The other interesting thing to think about: at what point do we beat the same algorithm in C, and how hard would it be to parallelise the algorithm in C with pthreads …
“Use those extra cores and beat C today! (Parallel Haskell redux) [1]”
Since I'm doing stupid benchmarks anyway [2], why not this? I didn't use pthreads though—I haven't used that particular API (Application Programming Interface) in about a decade, and even then, I wasn't thrilled with it.
Nope. Instead I used a Linux specific call—clone(), wrote a small function to wait for all the threads that were created, and basically spent about five minutes making a parallelized C-version [3] of the Fibonacci sequence [4]. Granted, I had to do the parallelization explicitly, but it wasn't that difficult to do.
The hard part was finding a quad-core box to test this on. Fortunately, I know we have one at The Company (shhh—don't tell Smirk) and that's where I spent most of my time on this—locating our single quad-core machine I could test this on.
But find it I did. And yes, the program does run faster the more cores that are thrown at it, but it's not a linear speed up:
Table: A Parallelized Fibonacci Sequence calculation program # cores Runtime ------------------------------ 1 0m31.468s 2 0m19.521s 4 0m17.234s
It's interesting that it appears to be bottoming out rather quickly (and these are average times over a few runs).
[1] http://cgi.cse.unsw.edu.au/~dons/blog/2007/11/29#smoking-4core
[3] /boston/2007/11/29/parfib.c