Of course it's slower, but I didn't expect it to be quite that bad

Time for another useless µbenchmark! This time, the overhead of trapping integer overflow!

So, inspired by this post about trapping integer overflow [1], I thought it might be interesting to see how bad the overhead of using the x86 [2] INTO instruction [3] to catch integer overflow really is. To do this, I'm using DynASM [4] to generate code from an expression, inserting INTO after every operation. There are other ways of doing this, but the simplest is to use INTO. I'm also using 16-bit operations, as the numbers involved (between -32,768 and 32,767) are reasonable (for a human) to deal with (unlike the 32-bit range of -2,147,483,648 to 2,147,483,647, or the insane 64-bit range of -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807).
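
To make the idea concrete, here's a minimal hand-rolled sketch (not the actual DynASM generator; the names and layout are made up) of what "an INTO after every operation" boils down to: emit the bytes for a tiny 16-bit expression with an INTO after the add, drop them into an executable buffer, and call it. INTO only exists in 32-bit mode, so this has to be built with -m32:

```
/* Minimal sketch of the idea (not the actual DynASM generator):
 * emit "mov ax,1; add ax,1; into; ret" by hand and call it.
 * INTO (opcode 0xCE) only exists in 32-bit mode, so build with -m32. */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    unsigned char code[] = {
        0x66, 0xB8, 0x01, 0x00,  /* mov  ax, 1                        */
        0x66, 0x05, 0x01, 0x00,  /* add  ax, 1                        */
        0xCE,                    /* into -- trap if overflow flag set */
        0xC3                     /* ret                               */
    };

    void *buf = mmap(NULL, sizeof code, PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return 1;
    memcpy(buf, code, sizeof code);

    int (*expr)(void) = (int (*)(void))buf;
    printf("result = %d\n", expr() & 0xFFFF);  /* only AX is meaningful */
    return 0;
}
```

The real generator builds the full expression the same way, one INTO after each arithmetic instruction.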

The one surprising result was that Linux treats the INTO trap as a segfault! Even requesting additional information (passing the SA_SIGINFO flag to sigaction()) doesn't tell you anything useful. But that in itself tells you it's not a real segfault, as a real segfault will report a memory-mapping error. Personally, I would have expected a floating-point fault, even though it's not a floating-point operation, because on Linux, integer division by 0 results in a floating-point fault (SIGFPE) (and oddly enough, a floating-point division by 0 results in ∞ but no fault)!
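
Here's a rough sketch (not the actual test harness) of how to poke at that from C: install a SIGSEGV handler with SA_SIGINFO, force the overflow flag on, and execute INTO. A genuine segfault arrives with si_code set to SEGV_MAPERR or SEGV_ACCERR; the INTO trap doesn't give you anything that useful. Again, 32-bit only:

```
/* Sketch (not the post's harness): see what Linux reports when INTO
 * traps.  Build 32-bit (-m32), since INTO doesn't exist in 64-bit mode. */
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

static void handler(int sig, siginfo_t *info, void *ctx)
{
    (void)ctx;
    /* printf() isn't async-signal-safe, but this is a throwaway test. */
    printf("signal %d, si_code %d, fault address %p\n",
           sig, info->si_code, info->si_addr);
    exit(1);
}

int main(void)
{
    struct sigaction sa = { 0 };
    sa.sa_sigaction = handler;
    sa.sa_flags     = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);

    /* 0x7FFF + 1 overflows a signed 16-bit value, setting OF. */
    asm volatile("movw $0x7FFF, %%ax\n\t"
                 "addw $1, %%ax\n\t"
                 "into"
                 ::: "eax", "cc");

    puts("no trap?");
    return 0;
}
```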

But, aside from that, some results. I basically ran the expression one million times and recorded how long it took. The first is just setting a variable to a fixed value (and the “- 0” bit is there just to ensure an overflow check is included):
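
(For reference, the timing harness is nothing more elaborate than this sketch, where compiled_expr() is a hypothetical stand-in for the DynASM-generated function.)

```
/* Rough sketch of the timing loop; compiled_expr() is a hypothetical
 * stand-in for the generated code being benchmarked. */
#include <stdio.h>
#include <time.h>

extern int compiled_expr(void);

int main(void)
{
    struct timespec start, end;
    int result = 0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < 1000000; i++)
        result = compiled_expr();
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed = (end.tv_sec - start.tv_sec)
                   + (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("%.9f\t%d\n", elapsed, result);
    return 0;
}
```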

Table: x = 1 - 0
overflow    time (seconds)    expression result
-----------------------------------------------
true        0.009080000       1
false       0.006820000       1

Okay, not terribly bad. But how about a longer expression? (and remember, the expression isn't optimized)

Table: x = 1 + 1 + 1 + 1 + 1 + 1 * 100 / 13
overflow    time (seconds)    expression result
-----------------------------------------------
true        0.079528000       46
false       0.030125000       46

Yikes! (But this also includes the function call overhead.) For the curious, the last example compiled down to:

```
xor eax,eax          ; clear EAX so the upper half starts at zero
mov ax,1             ; x = 1
add ax,1             ; + 1
into                 ; trap if the overflow flag is set
add ax,1             ; + 1
into
add ax,1             ; + 1
into
add ax,1             ; + 1
into
add ax,1             ; + 1
into
imul 100             ; * 100
into
mov bx,13            ; divisor
cwd                  ; sign-extend AX into DX:AX for IDIV
idiv bx              ; / 13, quotient in AX
into
mov [$0804f50E],ax   ; store the result into x
ret
```

The non-overflow version just had the INTO instructions missing—otherwise it was the same code.

I think what's most surprising here is that the INTO instruction just checks the overflow flag and causes a trap only if it's set. The timings I have (and I'll admit, the figures are old and for the 80486) show that INTO only has a three-cycle overhead when the trap isn't taken. I'm guessing things are worse on the newer multipipelined multiscalar multiprocessor monstrosities we use these days.

Next I'll have to try using the JO instruction [5] and see how well that fares.

[1] http://blog.regehr.org/archives/1154

[2] https://en.wikipedia.org/wiki/X86

[3] http://x86.renejeschke.de/html/file_module_x86_id_142.html

[4] /boston/2015/09/05.1

[5] /boston/2015/09/07.1
