Stupid multithreaded benchmarks

Dan Lyke ran a stupid benchmark [1]—just incrementing a variable to a billion, one just flat out, and one between some pthreads locking primitives.

I wanted to see what stupid results I would get if I used spin locks [2]. First, the relevant bit of code (used nasm [3] to compile it under a 2.6GHz (gigaHertz) dual-core Pentium running Linux 2.6):

>
```
bits 32
global t1
global t2
section .data
gv dd 0
glock dd 0
section .text
align 16
t1: mov eax,[gv]
inc eax
mov [gv],eax
cmp eax,1000000000
jl t1
; ret
; the following implements _exit()
; for the multithreaded version
xor ebx,ebx
mov eax,252
int $80
mov eax,1
int $80
hlt
align 16
t2: mov al,1
t2.wait: xchg al,[glock]
or al,al
jne t2.wait
t2.here: mov eax,[gv]
inc eax
mov [gv],eax
mov byte [glock],0
cmp eax,1000000000
jl t2
;ret
xor ebx,ebx
mov eax,252
int $80
mov eax,1
int $80
hlt
```

Straightforward implementations here. t1() is the straight through counting routine, while t2() is the one with the spin lock. Running single threaded yielded these results:

Table: Counting, single threaded
routine	time to execute
------------------------------
t1()	2.454s
t2()	39.752s

While I expected the spin lock to be faster than the pthread locking, I wasn't expecting it to be this slow. But maybe, just maybe, I'll get some of that speed back by running dual threads. At the very least, it should be a bit faster than single core, right?

Right?

Bueller? [4] Bueller? [5]

Table: Counting, dual-threaded
routine	time to execute
------------------------------
t1()	0m10.334s
t2()	2m31.307s

Um …

Wow.

I didn't expect spinlocks to be so expensive.

Ouch.

[1] http://www.flutterby.com/archives/comments/10723.html

[2] /boston/1999/12/08.2

[3] http://nasm.sourceforge.net/

[4] http://www.imdb.com/title/tt0091042/

[5] http://www.idiotsavant.com/ftp/sounds/bueller.wav

Gemini Mention this post

Contact the author