[ODE] Sweep-and-Prune space for ODE
Pierre Terdiman
pierre.terdiman at novodex.com
Fri Nov 26 16:10:25 MST 2004
> Okay, now THAT is a nice implementation of radix sort. For floats, it
> is 2.3 times faster than std::sort! That means it's 5 times faster
> than Pierre's code. Five. I would strongly suggest that anyone using
> Pierre's code should switch to the stereopsis code.
BTW, if you really want relevant numbers you should make sure all versions
do the same things. For example the memory buffers for the radix sort are
allocated outside the sort routine in Michael's code snippet, while it's
done within the RadixSort class in mine (that is, it's usually profiled
too!). Allocating memory for 50,000,000 elements might take some significant
time, for example.
The prefetch instructions is another thing you might want to consider. As
far as I'm concerned, I didn't want to use them in sake of compatibility
with non-SSE machines (at the time, I didn't even have one). So, *of course*
taking advantage of this gives the 25% speedup Michael mentions. But there's
a price to pay for this.
Over the years, I've been sent quite a few "faster" sort routines by
different people. The most common mistake is the problem I already mentioned
: not providing ranks as output. This is quite a significant speed boost !
Adding ranks immediately makes you touch twice as much memory, and it's
often enough to double the CPU time as well.
Overall I found that radix sort to be quite ok, often faster than the
alternatives.
- Pierre
More information about the ODE
mailing list