Time for another useless µbenchmark! This time, the overhead of trapping integer overflow!
So, inspired by this post about trapping integer overflow [1], I thought it might be interesting to see how bad the overhead is when using the x86 [2] instruction INTO [3] to catch integer overflow. To do this, I'm using DynASM [4] to generate code from an expression, with an INTO after every operation. There are other ways of doing this, but the simplest is to use INTO. I'm also using 16-bit operations, as the numbers involved (between -32,768 and 32,767) are reasonable (for a human) to deal with (unlike the 32-bit range of -2,147,483,648 to 2,147,483,647 or the insane 64-bit range of -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807).
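For reference, INTO traps only when the overflow flag (OF) is set. For a 16-bit ADD, OF is set when both operands have the same sign but the result's sign differs. Here's a small C sketch of that rule, using unsigned 16-bit wraparound to mimic `add ax,bx`; the function name `add16_sets_of()` is hypothetical, and this emulates the flag rather than reading the real one:

```c
#include <stdbool.h>
#include <stdint.h>

/* Emulate a 16-bit x86 ADD: store the wrapped sum in *sum and return
   whether the overflow flag (OF) would be set, which is what INTO tests. */
static bool add16_sets_of(uint16_t a, uint16_t b, uint16_t *sum)
{
  uint16_t r = (uint16_t)(a + b);   /* wraps modulo 65536, like the CPU */

  *sum = r;
  /* OF: the operands agree in bit 15 (the sign bit) but the result does not. */
  return ((~(a ^ b) & (a ^ r)) & 0x8000u) != 0;
}
```

For example, `add16_sets_of(0x7FFF, 1, &s)` returns true with `s == 0x8000` (32,767 + 1 wraps to -32,768), while -1 + 1 (`0xFFFF + 1`) wraps to 0 without setting OF.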
The one surprising result was that Linux treats the INTO trap as a segfault! Even requesting additional information (passing the SA_SIGINFO flag to sigaction()) doesn't tell you anything useful. But that in itself tells you it's not a real segfault, as a real segfault reports a memory mapping or access error along with the faulting address. Personally, I would have expected a floating point exception (SIGFPE), even though it's not a floating point operation, because on Linux, integer division by 0 results in a floating point exception (and oddly enough, a floating point division by 0 results in ∞ but no fault, since floating point exceptions are masked by default)!
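A minimal sketch of how to look at that extra information, assuming Linux and glibc: install a handler with SA_SIGINFO and inspect `si_code`. The helper names (`record_fault()`, `install_fault_handler()`, `segv_reason()`) are hypothetical. A genuine bad memory access reports `SEGV_MAPERR` or `SEGV_ACCERR`; per the observation above, the INTO trap shows neither.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t last_signo = 0;
static volatile sig_atomic_t last_code  = 0;

/* SA_SIGINFO-style handler: remember which signal arrived and its si_code.
   A real handler for a hardware fault should siglongjmp() out or _exit()
   instead of returning, since returning from a memory-fault handler
   re-runs the faulting instruction. */
static void record_fault(int signo, siginfo_t *info, void *context)
{
  (void)context;
  last_signo = signo;
  last_code  = info->si_code;
}

/* Install record_fault() for signo. Returns 0 on success, -1 on error. */
static int install_fault_handler(int signo)
{
  struct sigaction sa;

  memset(&sa, 0, sizeof sa);
  sigemptyset(&sa.sa_mask);
  sa.sa_sigaction = record_fault;
  sa.sa_flags     = SA_SIGINFO;   /* ask for the siginfo_t argument */
  return sigaction(signo, &sa, NULL);
}

/* A genuine bad memory access reports one of these si_code values (plus the
   address in si_addr); anything else suggests the SIGSEGV wasn't a memory fault. */
static const char *segv_reason(int code)
{
  switch (code) {
    case SEGV_MAPERR: return "address not mapped";
    case SEGV_ACCERR: return "invalid permissions for mapped object";
    default:          return "not a memory fault";
  }
}
```

Calling `raise(SIGSEGV)` after `install_fault_handler(SIGSEGV)` is a safe way to check the wiring; that signal comes from user space, so its `si_code` isn't `SEGV_MAPERR` or `SEGV_ACCERR` either.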
But, aside from that, some results. I basically run the expression one million times and simply record how long it takes. The first is just setting a variable to a fixed value (and the “- 0” bit is there just to ensure an overflow check is included):
Table: x = 1 - 0

| overflow check | time        | result |
| -------------- | ----------- | ------ |
| true           | 0.009080000 | 1      |
| false          | 0.006820000 | 1      |
Okay, not terribly bad. But how about a longer expression? (And remember, the expression isn't optimized.)
Table: x = 1 + 1 + 1 + 1 + 1 + 1 * 100 / 13

| overflow check | time        | result |
| -------------- | ----------- | ------ |
| true           | 0.079528000 | 46     |
| false          | 0.030125000 | 46     |
Yikes! (Though this also includes the function call overhead.) For the curious, the last example compiled down to:
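```
xor eax,eax
mov ax,1
add ax,1
into              ; raise interrupt 4 (#OF) if the overflow flag is set
add ax,1
into
add ax,1
into
add ax,1
into
add ax,1
into
imul ax,ax,100    ; signed 16-bit multiply; sets OF if the product doesn't fit
into
mov bx,13
cwd               ; sign-extend AX into DX:AX for the divide
idiv bx           ; AX = DX:AX / BX; OF is undefined, quotient overflow raises #DE
into
mov [$0804f50E],ax ; store the result
ret
```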
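The non-overflow version was the same code, just without the INTO instructions. (Note that the expression is evaluated strictly left to right with no operator precedence, so it computes (1 + 1 + 1 + 1 + 1 + 1) * 100 / 13 = 600 / 13 = 46, not 1 + 1 + 1 + 1 + 1 + (1 * 100 / 13) = 12.)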
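The same sequence can be sketched in portable C, checking after every step in the same left-to-right order as the listing. Instead of reading OF, this sketch widens each operation to 32 bits and checks whether the true result fits in 16 bits; the names (`checked16`, `fit16()`, `add16()`, `mul16()`, `div16()`, `eval_example()`) are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
  int16_t value;     /* valid only when overflow is false */
  bool    overflow;  /* true where INTO would have trapped */
} checked16;

/* Does the exact (widened) result fit in a signed 16-bit register? */
static checked16 fit16(int32_t wide)
{
  checked16 r = { 0, true };

  if (wide >= INT16_MIN && wide <= INT16_MAX) {
    r.value    = (int16_t)wide;
    r.overflow = false;
  }
  return r;
}

static checked16 add16(int16_t a, int16_t b) { return fit16((int32_t)a + b); }
static checked16 mul16(int16_t a, int16_t b) { return fit16((int32_t)a * b); }

/* b must be nonzero. INT16_MIN / -1 is the only overflowing case; on x86,
   IDIV raises #DE for it rather than setting OF. */
static checked16 div16(int16_t a, int16_t b) { return fit16((int32_t)a / b); }

/* Mirror of the listing: start at 1, five adds, a multiply, a divide. */
static bool eval_example(int16_t *out)
{
  checked16 r = { 1, false };

  for (int i = 0; i < 5; i++) {   /* the five "add ax,1" steps */
    r = add16(r.value, 1);
    if (r.overflow) return false; /* INTO would trap here */
  }
  r = mul16(r.value, 100);        /* imul */
  if (r.overflow) return false;
  r = div16(r.value, 13);         /* cwd / idiv */
  if (r.overflow) return false;
  *out = r.value;
  return true;
}
```

`eval_example()` stores 46 and returns true; a step like `mul16(300, 200)` (60,000) reports overflow instead.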
What surprises me most is how expensive this is, given that the INTO instruction just checks the overflow flag and only traps if it's set. The timings I have (and I'll admit, the figures I have are old and for the 80486) show that INTO only has a three-cycle overhead if not taken. I'm guessing things are worse with the deeply pipelined, superscalar, multicore monstrosities we use these days.
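Assuming the times are in seconds for the one million runs (the tables don't give units), subtracting the no-check time and dividing by the number of INTO instructions (one for x = 1 - 0, seven in the listing above) gives a rough per-INTO cost:

```latex
\begin{aligned}
x = 1 - 0:\quad & \frac{(0.009080 - 0.006820)\,\text{s}}{10^{6} \times 1} = 2.26\,\text{ns per INTO}\\
\text{longer expression:}\quad & \frac{(0.079528 - 0.030125)\,\text{s}}{10^{6} \times 7} = \frac{49.403\,\text{ns}}{7} \approx 7.06\,\text{ns per INTO}
\end{aligned}
```

The two estimates differ by about a factor of three, so the extra cost isn't a fixed amount per INTO; with timing noise folded in, these are ballpark figures at best.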
Next I'll have to try using the JO instruction [5] and see how well that fares.
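As a rough preview of the JO approach (not something measured here): GCC and Clang provide `__builtin_add_overflow()`, which on x86 is generally compiled to the ADD followed by a test of the overflow flag (a JO/JNO branch or SETO) rather than a trap. The wrapper names below are hypothetical, as is the saturating recovery policy:

```c
#include <stdbool.h>
#include <stdint.h>

/* Checked 16-bit add via the GCC/Clang builtin: returns true on signed
   overflow, and stores the wrapped result in *sum either way. */
static bool add16_overflows(int16_t a, int16_t b, int16_t *sum)
{
  return __builtin_add_overflow(a, b, sum);
}

/* Branch on the overflow result instead of trapping. Saturating is just
   one hypothetical recovery policy. */
static int16_t sum_or_saturate(int16_t a, int16_t b)
{
  int16_t r;

  if (__builtin_add_overflow(a, b, &r))
    return a < 0 ? INT16_MIN : INT16_MAX; /* both operands share a's sign */
  return r;
}
```

The practical difference from INTO is that a JO-style check branches to code you control, instead of going through the kernel and a SIGSEGV handler.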
[1] http://blog.regehr.org/archives/1154
[2] https://en.wikipedia.org/wiki/X86
[3] http://x86.renejeschke.de/html/file_module_x86_id_142.html