Stupid multithreaded benchmarks

Dan Lyke ran a stupid benchmark [1]: incrementing a variable to a billion, once flat out and once between some pthreads locking primitives.

I wanted to see what stupid results I would get if I used spin locks [2]. First, the relevant bit of code (assembled with nasm [3] and run on a 2.6GHz dual-core Pentium under Linux 2.6):

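```
        bits    32
        global  t1
        global  t2

        section .data
gv      dd      0               ; shared counter
glock   dd      0               ; spin lock: 0 = free, 1 = held

        section .text

        align   16
t1:     mov     eax,[gv]        ; load, increment, store
        inc     eax
        mov     [gv],eax
        cmp     eax,1000000000
        jl      t1
;       ret
        ; the following implements _exit()
        ; for the multithreaded version
        xor     ebx,ebx         ; exit status 0
        mov     eax,252         ; exit_group
        int     $80
        mov     eax,1           ; exit
        int     $80
        hlt

        align   16
t2:     mov     al,1
t2.wait:
        xchg    al,[glock]      ; swap 1 into the lock, get the old value
        or      al,al
        jne     t2.wait         ; old value was 1: still held, keep spinning
t2.here:
        mov     eax,[gv]        ; lock held: load, increment, store
        inc     eax
        mov     [gv],eax
        mov     byte [glock],0  ; release the lock
        cmp     eax,1000000000
        jl      t2
;       ret
        xor     ebx,ebx         ; exit status 0
        mov     eax,252         ; exit_group
        int     $80
        mov     eax,1           ; exit
        int     $80
        hlt
```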

Straightforward implementations here. t1() is the straight-through counting routine, while t2() is the one with the spin lock. Running single-threaded yielded these results:
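For anyone who doesn't read x86 assembly, here is a rough C11 rendering of the two routines. It is only a sketch, not the code that was timed: the names count_plain() and count_spin(), the limit parameter, and the reset of gv at the top of each function are made up for illustration, and a compiler won't necessarily emit the same instructions as the hand-written version.

```c
#include <assert.h>
#include <stdatomic.h>

static volatile int gv;     /* shared counter, like gv in the assembly */
static atomic_uchar glock;  /* spin lock byte: 0 = free, 1 = held */

/* Rough equivalent of t1: load, increment, store, compare.
   (limit and the reset of gv are additions for illustration.) */
int count_plain(int limit)
{
    int v;

    assert(limit > 0);
    gv = 0;
    do {
        v = gv;
        v++;
        gv = v;
    } while (v < limit);
    return gv;
}

/* Rough equivalent of t2: the same loop, with every increment
   wrapped in a spin lock built on an atomic exchange. */
int count_spin(int limit)
{
    int v;

    assert(limit > 0);
    gv = 0;
    do {
        /* like xchg al,[glock]: swap in 1 until the old value was 0 */
        while (atomic_exchange_explicit(&glock, 1, memory_order_acquire) != 0)
            ;
        v = gv;
        v++;
        gv = v;
        /* like mov byte [glock],0: release the lock */
        atomic_store_explicit(&glock, 0, memory_order_release);
    } while (v < limit);
    return gv;
}
```

Calling count_plain(1000000000) and count_spin(1000000000) from a small main() under time(1) would be one way to repeat the single-threaded comparison.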

Table: Counting, single threaded
routine	time to execute
------------------------------
t1()	0m2.454s
t2()	0m39.752s

I expected the spin lock to be faster than the pthread locking, but I wasn't expecting it to be this slow. Maybe, just maybe, I'll get some of that speed back by running two threads. At the very least, it should be a bit faster than running on a single core, right?

Right?

Bueller? [4] Bueller? [5]

Table: Counting, dual-threaded
routine	time to execute
------------------------------
t1()	0m10.334s
t2()	2m31.307s

Um …

Wow.

I didn't expect spinlocks to be so expensive.

Ouch.
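The driver that started the two threads isn't shown above. A minimal sketch of what one could look like, assuming pthreads and the same C11 rendering of the spin lock: spin_worker() and run_two_spinners() are hypothetical names, and unlike the assembly (where the first thread to finish ends the whole process), this version joins both threads.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static volatile int gv;     /* shared counter */
static atomic_uchar glock;  /* spin lock byte: 0 = free, 1 = held */

/* Each thread runs the t2-style loop against the same counter. */
static void *spin_worker(void *arg)
{
    int limit = *(int *)arg;
    int v;

    do {
        /* spin until the exchange hands back 0 (lock was free) */
        while (atomic_exchange_explicit(&glock, 1, memory_order_acquire) != 0)
            ;
        v = gv + 1;
        gv = v;
        atomic_store_explicit(&glock, 0, memory_order_release);
    } while (v < limit);
    return NULL;
}

/* Start two spinning threads, wait for both, return the final count.
   The result is limit + 1: after one thread reaches limit, the other
   still takes one more turn before it sees a value >= limit. */
int run_two_spinners(int limit)
{
    pthread_t th[2];
    int i;

    gv = 0;
    for (i = 0; i < 2; i++) {
        int rc = pthread_create(&th[i], NULL, spin_worker, &limit);
        assert(rc == 0);
        (void)rc;
    }
    for (i = 0; i < 2; i++)
        pthread_join(th[i], NULL);
    return gv;
}
```

Build with -pthread. With both threads fighting over the one lock byte, every increment is serialized anyway, so a second core has nothing extra to do except contend for the lock.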

[1] http://www.flutterby.com/archives/comments/10723.html

[2] /boston/1999/12/08.2

[3] http://nasm.sourceforge.net/

[4] http://www.imdb.com/title/tt0091042/

[5] http://www.idiotsavant.com/ftp/sounds/bueller.wav
