Dan Lyke ran a stupid benchmark [1]—just incrementing a variable to a billion, one just flat out, and one between some pthreads locking primitives.
I wanted to see what stupid results I would get if I used spin locks [2]. First, the relevant bit of code (used nasm [3] to compile it under a 2.6GHz (gigaHertz) dual-core Pentium running Linux 2.6):
>
```
bits 32
global t1
global t2
section .data
gv dd 0
glock dd 0
section .text
align 16
t1: mov eax,[gv]
inc eax
mov [gv],eax
cmp eax,1000000000
jl t1
; ret
; the following implements _exit()
; for the multithreaded version
xor ebx,ebx
mov eax,252
int $80
mov eax,1
int $80
hlt
align 16
t2: mov al,1
t2.wait: xchg al,[glock]
or al,al
jne t2.wait
t2.here: mov eax,[gv]
inc eax
mov [gv],eax
mov byte [glock],0
cmp eax,1000000000
jl t2
;ret
xor ebx,ebx
mov eax,252
int $80
mov eax,1
int $80
hlt
```
Straightforward implementations here. t1() is the straight through counting routine, while t2() is the one with the spin lock. Running single threaded yielded these results:
Table: Counting, single threaded routine time to execute ------------------------------ t1() 2.454s t2() 39.752s
While I expected the spin lock to be faster than the pthread locking, I wasn't expecting it to be this slow. But maybe, just maybe, I'll get some of that speed back by running dual threads. At the very least, it should be a bit faster than single core, right?
Right?
Bueller? [4] Bueller? [5]
Table: Counting, dual-threaded routine time to execute ------------------------------ t1() 0m10.334s t2() 2m31.307s
Um …
Wow.
I didn't expect spinlocks to be so expensive.
Ouch.
[1] http://www.flutterby.com/archives/comments/10723.html
[3] http://nasm.sourceforge.net/
[4] http://www.imdb.com/title/tt0091042/