[ODE] Some SSE in Quick step

GARY VANSICKLE g.r.vansickle at worldnet.att.net
Sun May 30 23:30:13 MST 2004


> What D3DX does is use a jump table. At program start, all of its entries
> point to the same 'Setup' function, so whichever function you call first
> ends up in that setup function. The setup function then sets up the real
> function pointers depending on what CPU you have.
> 

Same way they used to do FP-coprocessor emulation, that's what I figured.
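A minimal C++ sketch of the lazy-dispatch jump table described above, for illustration only. It is not D3DX's code: `Mat4`, `Mat4Multiply`, `g_mat4_mul`, `mat4_mul_setup` and the kernels are hypothetical names, the runtime check uses the GCC/Clang builtin `__builtin_cpu_supports` (x86 only), and the SSE kernel is only compiled when the compiler targets SSE (`__SSE__`).

```cpp
#include <cassert>

#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Hypothetical row-major 4x4 matrix type (not D3DX's D3DXMATRIX).
struct Mat4 { float m[16]; };

// Plain C++ kernel: out = a * b. Uses a temporary so out may alias a or b.
static void mat4_mul_generic(Mat4* out, const Mat4* a, const Mat4* b) {
    Mat4 r;
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) {
            float s = 0.0f;
            for (int k = 0; k < 4; ++k) s += a->m[i * 4 + k] * b->m[k * 4 + j];
            r.m[i * 4 + j] = s;
        }
    *out = r;
}

#if defined(__SSE__)
// SSE kernel: row i of the result is sum over k of a[i][k] * (row k of b).
static void mat4_mul_sse(Mat4* out, const Mat4* a, const Mat4* b) {
    const __m128 b0 = _mm_loadu_ps(&b->m[0]);
    const __m128 b1 = _mm_loadu_ps(&b->m[4]);
    const __m128 b2 = _mm_loadu_ps(&b->m[8]);
    const __m128 b3 = _mm_loadu_ps(&b->m[12]);
    Mat4 r;  // temporary, so out may alias a or b
    for (int i = 0; i < 4; ++i) {
        const float* ai = &a->m[i * 4];
        __m128 row = _mm_mul_ps(_mm_set1_ps(ai[0]), b0);
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(ai[1]), b1));
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(ai[2]), b2));
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(ai[3]), b3));
        _mm_storeu_ps(&r.m[i * 4], row);
    }
    *out = r;
}

// Runtime CPU check (GCC/Clang builtin, x86 only).
static bool cpu_has_sse() { return __builtin_cpu_supports("sse"); }
#endif

using MulFn = void (*)(Mat4*, const Mat4*, const Mat4*);

static void mat4_mul_setup(Mat4* out, const Mat4* a, const Mat4* b);

// One-entry "jump table": starts out pointing at the setup stub.
static MulFn g_mat4_mul = mat4_mul_setup;

// The first call lands here: pick the real kernel, patch the table, finish the call.
static void mat4_mul_setup(Mat4* out, const Mat4* a, const Mat4* b) {
#if defined(__SSE__)
    g_mat4_mul = cpu_has_sse() ? mat4_mul_sse : mat4_mul_generic;
#else
    g_mat4_mul = mat4_mul_generic;
#endif
    assert(g_mat4_mul != mat4_mul_setup);
    g_mat4_mul(out, a, b);
}

// Public entry point: one indirect call, no if(SSE) test at the call site.
inline void Mat4Multiply(Mat4* out, const Mat4* a, const Mat4* b) {
    g_mat4_mul(out, a, b);
}
```

Every call still pays one indirect call through `g_mat4_mul`, which is the overhead Joakim mentions next; for a very small function the extra call can be a noticeable fraction of the total. The sketch is also not thread-safe: two threads making the first call at the same time race on the pointer.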

> It's a clever system because it's clean and you don't have to do an
> 'if(SSE)' test at each call. However, every call has to go through this
> jump table, so only large functions are worth optimizing with this system,
> and even they pay a performance penalty. This is the main reason why so
> few functions are actually optimized. The rest are placed in the
> standard D3DX header files.
> 
> Just to show some performance figures from our engine here (do keep in
> mind that this is a synthetic performance test, so it might not be
> completely accurate; the main loop just performs the operation and
> then cycles through different matrices to get a more realistic result):
> 
> ------------------------+----------+-----------------+-----------------+
> Function                | Original |    SSE (Speedup)|    D3D (Speedup)|
> ------------------------+----------+-----------------+-----------------+
> m3.Identity()           |     15.4 |     9.2 ( 1.67) |    25.3 ( 0.61) |
> m3 *= m3                |    167.5 |    70.1 ( 2.39) |   289.4 ( 0.58) |
> m3 = m3 * m3            |    131.4 |    81.5 ( 1.61) |   345.6 ( 0.38) |
> m4.Identity()           |     25.2 |    12.1 ( 2.07) |    25.2 ( 1.00) |
> m4 *= m4                |    272.8 |   128.1 ( 2.13) |   286.2 ( 0.95) |
> m4 = m4 * m4            |    249.8 |   137.9 ( 1.81) |   336.0 ( 0.74) |
> Transpose()             |     77.2 |    25.6 ( 3.02) |    27.9 ( 2.77) |
> ------------------------+----------+-----------------+-----------------+
> 
> All times are in CPU cycles.
> Original - Our unoptimized code. Inlined.
> SSE      - Our optimized SSE code. Inlined.
> D3D      - D3DX functions. Some are inlined, some use the jump table.
> 
> Cheers
>  Joakim E. - http://www.snowcode.com

Do you still have this test code around?  One thing I find hard to
figure out here is why your D3D number for m4.Identity() is the same speed
as your unoptimized code, while for m3.Identity() it runs at only 61% of the
speed of your unoptimized code.  Which version of D3D is this, BTW?
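The Speedup figures appear to be Original cycles divided by the variant's cycles. A short C++ sketch that recomputes them from the table (the names `Row`, `speedup`, `matches_printed` and `print_speedups` are made up for illustration). Computed this way, every ratio matches the printed value except m4.Identity() under SSE, which comes out at 25.2 / 12.1 ≈ 2.08 rather than 2.07, possibly because the table was produced from unrounded cycle counts. The table itself shows that the D3DX Identity() cost is about the same for m3 and m4 (25.3 and 25.2 cycles); the two ratios differ mainly because the unoptimized m3.Identity() is cheaper (15.4 cycles) than the unoptimized m4.Identity() (25.2 cycles).

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>

// Cycle counts copied from Joakim's table: Original, SSE, D3D.
struct Row { const char* name; double original, sse, d3d; };

static const Row kRows[] = {
    {"m3.Identity()", 15.4, 9.2, 25.3},
    {"m3 *= m3", 167.5, 70.1, 289.4},
    {"m3 = m3 * m3", 131.4, 81.5, 345.6},
    {"m4.Identity()", 25.2, 12.1, 25.2},
    {"m4 *= m4", 272.8, 128.1, 286.2},
    {"m4 = m4 * m4", 249.8, 137.9, 336.0},
    {"Transpose()", 77.2, 25.6, 27.9},
};

// Speedup = original cycles / variant cycles; above 1.0 means the variant is faster.
static double speedup(double original_cycles, double variant_cycles) {
    assert(variant_cycles > 0.0);
    return original_cycles / variant_cycles;
}

// True if x rounds to the two-decimal figure printed in the table.
static bool matches_printed(double x, double printed) {
    return std::fabs(x - printed) <= 0.005 + 1e-9;
}

// Prints the recomputed SSE and D3D speedup columns.
static void print_speedups() {
    for (const Row& r : kRows)
        std::printf("%-16s SSE %.2f  D3D %.2f\n", r.name,
                    speedup(r.original, r.sse), speedup(r.original, r.d3d));
}
```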

-- 
Gary R. Van Sickle


