Yet even more stupid benchmarks

Yet another silly optimization problem [1]. This time, from a silly coding challenge [2] to find the number of integers expressible with unique digits (that is, no single digit repeats) in a base-10 representation up to the value 10,000,000,000 (there are 8,877,690 such numbers, by the way).

The neatest and fastest solution was final program on this page [3], written in C#. It generates only such numbers; it doesn't try to test each number. Since I don't use C#, I decided to translate the code in C to play around with it. Wasn't all that hard:

>
```
#include <stdio.h>
#include <stdlib.h>
int total = 0;
const int pre[(1 << 10) + 1] /* = { ... } */ ;
void generate2(
int maxlen,
int currentlen,
int availabledigits,
int currentvalue
)
{
int last = (currentlen == maxlen - 1);
int x = availabledigits;
while(x != 0)
{
int digit = pre[x ^ (x & (x - 1))];
x &= (x - 1);
if (digit == 0 && currentvalue == 0)
continue;
if (last)
++total;
else
generate2(
maxlen,
currentlen + 1,
availabledigits & ~(1 << digit),
(currentvalue * 10) + digit
);
}
}
int main(int argc,char *argv[])
{
int len;
for (len = 1 ; len <= 10 ; len++)
generate2(len,0,0xFFF >> 2,0);
printf("total: %d\n",total);
return EXIT_SUCCESS;
}
```

I pregenerated the pre[] array since I wanted this to run as fast as possible. The code used to generate the array:

>
```
for (i = 0 ; i <= 10 ; i++)
pre[1 << i] = i;
```

Anyway, once written and compiled (gcc -O4 -fomit-frame-pointer f.c) it ran in about 0.2 seconds (average run) on a 2.6GHz (gigaHertz) machine. Fast, but I could go faster by running it across the two CPU (Central Processing Unit)s in the box. I was expecting about half the runtime, since this is easily parallelizable.

It ran in about 0.16 seconds, a rather disappointing ¾ time. I commented out the code in generate2() just to test the overhead of threading and syncronization and that isn't a factor (program ran in 0.001 seconds).

Undaunted, I decided to try one of the quad-core boxes at The Office. Reworked the code a bit to split the load between four CPUs as evenly as possible, and ran some tests.

0.13 seconds on average. Still not quite half the speed.

Hmmm …

[1] /boston/2008/08/07.2

[2] http://beust.com/weblog/archives/000491.html

[3] http://www.indiangeek.net/2008/08/29/a-case-study-in-micro-optimization/

Gemini Mention this post

Contact the author