[ODE] Some SSE in Quick step

Shaul Kedem shaul_kedem at yahoo.com
Sun May 30 04:12:16 MST 2004


Joakim,
 Are you basically saying that D3D is the least
efficient one?

 Strange, considering how much has been discussed
about it...

Shaul

--- Joakim Eriksson <jme at snowcode.com> wrote:
> > > >  It's certainly not doing anything like:
> > > >
> > > > if (SSE2)
> > > > {
> > > > 	SSE2MatrixMultiply();
> > > > }
> > > > else if (SSE)
> > > > {
> > > > 	SSEMatrixMultiply();
> > > > }
> > > > else if...
> > > >
> > > > if that's what you're getting at.
> > > No. Read what I've said again. "jumptable chachacha" has nothing
> > > to do with the code you've posted.
> > So you know that D3DX is doing this mysteriously-named "jumptable
> > chachacha"?
> 
> What D3DX does is keep a jump table. At the start of a program, all
> entries point to the same 'Setup' function, so whatever you call first
> ends up in that setup function. The setup function then sets up the
> real function pointers depending on what CPU you have.
>
> It's a clever system because it's clean and you don't have to do any
> 'if(SSE)' check at each call. However, every call pays for the access
> to this jump table, so only large functions can be optimized using
> this system, and even they take a performance penalty. This is the
> main reason why you see that so few functions are actually optimized.
> The rest are placed in the standard D3DX header files.
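
To make the jump-table idea above concrete, here is a minimal C++ sketch of
the pattern as described. It is not the D3DX source: every name in it
(Mat4Mul, Mat4MulSetup, JumpTable, CpuHasSse, and so on) is made up for
illustration, the "SSE" routine is only a stand-in that reuses the scalar
code, and the CPU check is a placeholder where real code would query CPUID.

```cpp
#include <cassert>

// Hypothetical names throughout; this is not the D3DX source.
using Mat4Fn = void (*)(float* out, const float* a, const float* b);

// Plain C++ fallback: out = a * b for row-major 4x4 matrices.
static void Mat4MulScalar(float* out, const float* a, const float* b) {
    float tmp[16];  // temporary, so out may alias a or b
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c) {
            float s = 0.0f;
            for (int k = 0; k < 4; ++k) s += a[r * 4 + k] * b[k * 4 + c];
            tmp[r * 4 + c] = s;
        }
    for (int i = 0; i < 16; ++i) out[i] = tmp[i];
}

// Stand-in for a hand-written SSE routine; here it just reuses the scalar code.
static void Mat4MulSse(float* out, const float* a, const float* b) {
    Mat4MulScalar(out, a, b);
}

// Placeholder CPU check (assumption); a real one would query CPUID.
static bool CpuHasSse() { return false; }

static void Mat4MulSetup(float* out, const float* a, const float* b);

// The jump table: every entry starts out pointing at the setup function.
struct JumpTable {
    Mat4Fn mat4Mul;
};
static JumpTable g_table = { Mat4MulSetup };

// The first call lands here: patch the table once, then forward the call.
static void Mat4MulSetup(float* out, const float* a, const float* b) {
    g_table.mat4Mul = CpuHasSse() ? Mat4MulSse : Mat4MulScalar;
    g_table.mat4Mul(out, a, b);
}

// Public entry point: always one indirect call, never an if(SSE) test.
inline void Mat4Mul(float* out, const float* a, const float* b) {
    g_table.mat4Mul(out, a, b);
}
```

The indirect call through g_table is the per-call cost mentioned above: the
compiler cannot inline through it, which is presumably why only the larger
routines are worth routing through such a table.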
> 
> Just to show some performance figures from our engine here. (Do keep
> in mind that these are synthetic performance tests, so they might not
> be completely accurate. The main loop just performs the operation and
> then cycles through the different matrices to get a more realistic
> result.)
> 
> ------------------------+----------+-----------------+-----------------+
> Function                | Original |     SSE(Speedup)|     D3D(Speedup)|
> m3.Identity()           |     15.4 |     9.2 ( 1.67) |    25.3 ( 0.61) |
> m3 *= m3                |    167.5 |    70.1 ( 2.39) |   289.4 ( 0.58) |
> m3 = m3 * m3            |    131.4 |    81.5 ( 1.61) |   345.6 ( 0.38) |
> m4.Identity()           |     25.2 |    12.1 ( 2.07) |    25.2 ( 1.00) |
> m4 *= m4                |    272.8 |   128.1 ( 2.13) |   286.2 ( 0.95) |
> m4 = m4 * m4            |    249.8 |   137.9 ( 1.81) |   336.0 ( 0.74) |
> Transpose()             |     77.2 |    25.6 ( 3.02) |    27.9 ( 2.77) |
> ------------------------+----------+-----------------+-----------------+
> 
> All times are in cycles.
> Original - Our unoptimized code. Inlined.
> SSE      - Our optimized SSE code. Inlined.
> D3D      - D3DX functions. Some are inlined, some use the jump table.
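
For reading the table: the Speedup figures appear to be the Original cycle
count divided by the variant's cycle count, so values below 1.00 mean the
variant is slower than the unoptimized code. A small C++ sketch of that
arithmetic, with a hypothetical helper name (Speedup) and inputs copied from
the table:

```cpp
#include <cassert>
#include <cmath>

// Speedup relative to the unoptimized baseline (assumed definition):
// baseline cycles divided by the variant's cycles.
double Speedup(double originalCycles, double variantCycles) {
    return originalCycles / variantCycles;
}

// Example inputs taken from the "m3 *= m3" row of the table.
const double kSseSpeedup = Speedup(167.5, 70.1);   // SSE column
const double kD3dSpeedup = Speedup(167.5, 289.4);  // D3D column
```

For that row this gives roughly 2.39 for SSE and 0.58 for D3D, matching the
printed figures.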
> 
> Cheers
>  Joakim E. - http://www.snowcode.com



	
		