I've now implemented the binary search version; I was right, the factor 10 was about correct, and even with 20 mio buckets the algorithm is still faster than the FPGA and more than 3 times faster than his faster software algorithm even though he's got a newer CPU (I'm running it on a Core 2 duo (single-threaded) at 2.5 Ghz), and again, he's really using something between about 1000..100000 buckets, which would make my program on my machine between 17 to 32 times faster than his:
buckets gcc -O1 gcc -O3
1000 75 ms 72 ms
10000 105 ms 102 ms
100000 135 ms 136 ms
1000000 243 ms 240 ms
10000000 580 ms
20000000 693 ms
(Results in my previous post were with gcc -O1.)
All results so far were with random numbers with an uneven distribution: vec[i]= (i / len) * random_between(0.,1.). Here are the results for the binary search with vec[i]= random_between(0.,1.):
buckets gcc -O1
1000 82 ms
10000 115 ms
100000 148 ms
1000000 360 ms
10000000 735 ms
20000000 860 ms
(He's using a Gaussian distribution, not even.)
Here's the code:
int nvals= ..;
double* vals= ..;
int nbuckets= ..;
double* buckets= ..;
unsigned int* res= ..;
int i;
for (i=0; i<nvals; i++) {
double val= vals[i];
int bi;
{
int lo=0, hi=nbuckets;
while (lo < hi) {
int mid= (lo+hi) / 2;
if (val < buckets[mid]) {
hi= mid;
} else {
lo= mid+1;
}
}
bi= lo;
}
res[bi]++;
}
All results so far were with random numbers with an uneven distribution: vec[i]= (i / len) * random_between(0.,1.). Here are the results for the binary search with vec[i]= random_between(0.,1.):
(He's using a Gaussian distribution, not even.)Here's the code:
(Embedded within a Scheme driver program: http://christianjaeger.ch/scratch/binning.scm)