[ODE] Report: performances, OPCODE, x86-64, GCC 4.0, etc.

Tue Apr 5 11:15:06 MST 2005

Hi All,

Before committing the x86-64 patch for OPCODE, I wanted to test it a bit
and compare its performance against a regular 32-bit build.

The test setup: dual Opteron 240 (1.4 GHz), 1 GB RAM, Debian unstable
(the official 32-bit x86 release and the unofficial AMD64 port).

On 32-bit Debian, I always used the libode-dev package that ships with
Debian. Both Debian installs were using GCC 3.3.

For OPCODE I used an articulated robot arm and a truck model (roughly
2000 triangles) colliding with each other, running 1000 ODE iterations
per frame (in order to remove the influence of the graphics card).
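
For anyone who wants to reproduce this kind of measurement, here is a
minimal sketch of such a timing loop in C++. It is not the code used for
the numbers in this mail: measure_fps and the step callback are
hypothetical names, and step stands in for one ODE iteration (collision
detection plus a world step).

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper: runs `frames` frames, each made of
// `steps_per_frame` calls to `step`, and returns the average number of
// frames per second. Nothing is rendered, so only `step` is timed.
double measure_fps(const std::function<void()>& step,
                   int steps_per_frame, int frames)
{
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int f = 0; f < frames; ++f)
        for (int s = 0; s < steps_per_frame; ++s)
            step();
    const std::chrono::duration<double> elapsed = clock::now() - start;
    // Guard against a zero reading when `step` does almost no work.
    return elapsed.count() > 0.0 ? frames / elapsed.count() : 0.0;
}
```

Calling measure_fps(step, 1000, frames) with the same step on both
builds gives directly comparable fps figures.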

Results:

32 bits: 4.2 fps
64 bits: 2.2 fps

Preliminary conclusions: the problem mentioned by the author of OPCODE
was that using 64-bit integers (instead of 32-bit ones) to hold pointer
values (note: this is what the aforementioned patch does) would slow
OPCODE down. For this reason, his opinion was that the patch should not
be applied to the main OPCODE code (the official one, not the ODE one).
At first glance, these results seem to prove him right.
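
To make the concern concrete, here is a small C++ sketch of how widening
that integer changes the size of a tree node. The Node layout is a
hypothetical simplification (a bounding box plus one integer word
holding a pointer value), not OPCODE's actual class.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified collision-tree node: an axis-aligned box
// (center + extents) and one integer word used to hold a pointer value.
template <typename Word>
struct Node {
    float center[3];
    float extents[3];
    Word  data;  // pointer to the children, stored as an integer
};

using Node32 = Node<std::uint32_t>;  // only wide enough for a 32-bit pointer
using Node64 = Node<std::uint64_t>;  // wide enough for an x86-64 pointer

// Extra bytes per node once the word is widened from 32 to 64 bits.
constexpr std::size_t extra_bytes_per_node()
{
    return sizeof(Node64) - sizeof(Node32);
}
```

The wider word makes each node larger, so fewer nodes fit in a cache
line, which is one plausible way such a change could cost performance.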

To verify this further, I tested ODE on its own with a rope model made
of 100 spheres: the x86-64 build was about 20% faster, which again
suggests that the problem lies within the OPCODE patch.

Because the patch is what allows ODE/OPCODE to be compiled on x86-64,
there is no earlier version of ODE/OPCODE I can test it against on
x86-64. So objectively, there is no way to be sure the performance loss
is caused by the OPCODE x86-64 modification. After all, it could be
caused by the immaturity of GCC on x86-64 platforms.

Hence, I decided to redo the tests, this time using GCC 4.0 (snapshot of
26/03/2005) for the 64-bit build. Along the way, I was reminded that
OPCODE is compiled with -O1 instead of -O2 because of code generation
issues, which hints that something might be rotten there too.

So here are the new results:

32 bits GCC 3.3: 4.2 fps
64 bits GCC 3.3: 2.2 fps
64 bits GCC 4.0: 7.0 fps

I think it's clear that the main performance problem with OPCODE on
Linux comes from the compiler.
The performance hit (if it even exists) from using 64-bit integers
instead of 32-bit ones is insignificant in light of these tests.

Now, the last step was to see whether -O2 still generates bad results
with OPCODE using GCC 4.0.

...

Well it does :-(

Considering that between GCC 3.3 and GCC 4.0 the C++ front end has been
completely rewritten, and that the optimizing back end has also been
completely rewritten (using the famous Tree SSA framework), something
bad is going on here.

Does anyone have an idea why GCC is generating bad code for OPCODE?
Is it a bug in GCC, or in OPCODE?
Or is it a GCC optimization being too aggressive with floating-point
math?

When I get some time, I'll try to pin down the exact optimization flag
that causes this bad code generation, and see if I can get more
performance out of OPCODE with GCC.

PS: some of the ODE test examples do not compile with GCC 4.0 on x86-64
because pointer-to-integer conversions are now treated as errors
(previously they only generated warnings).
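
Errors of this kind typically come from casting a pointer to a 32-bit
integer type, which cannot hold a 64-bit address. A minimal sketch of
the portable alternative using std::uintptr_t follows; the function
names are hypothetical and not taken from the ODE examples.

```cpp
#include <cstdint>

// On x86-64, `(int)ptr` or `(unsigned int)ptr` would drop the upper
// 32 bits of the address, so a C++ compiler may reject the cast.
// std::uintptr_t is an integer type wide enough to hold a void pointer.
inline std::uintptr_t pointer_to_integer(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p);
}

// Converts the integer back to the original pointer value.
inline const void* integer_to_pointer(std::uintptr_t v)
{
    return reinterpret_cast<const void*>(v);
}
```

A pointer passed through pointer_to_integer and back through
integer_to_pointer compares equal to the original on both 32-bit and
64-bit builds.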

Regards,

Tanguy