[ODE] Some SSE in Quick step
Nguyen Binh
ngbinh at glassegg.com
Tue May 25 10:57:50 MST 2004
Hi Russ,
RS> now, i'm assuming that these numbers are "time to compute physics per
RS> frame". so the SSE version is actually slower in most cases?
Yes, you are right! Profiling shows that the SSE code is
slower. The problem is that J and iMJ are not 16-byte (4-float)
aligned, so I have to use the _mm_set_ps() intrinsic, which is not
efficient. I modified fc slightly so that it is 16-byte aligned, but
changing J to be 16-byte aligned is not so easy...
I'll keep investigating this...
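
To illustrate the alignment issue, here is a minimal sketch in C. It is not the actual quickstep code: the dot4_* helpers and hsum4 are hypothetical names made up for this example. It shows three ways to get four floats into an SSE register: gathering them one by one with _mm_set_ps(), an unaligned load with _mm_loadu_ps(), and an aligned load with _mm_load_ps(), which requires the pointer to be 16-byte aligned.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Add up the four lanes of an SSE register. */
static float hsum4(__m128 v)
{
    float lanes[4];
    _mm_storeu_ps(lanes, v);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}

/* Works with any alignment: the register is built element by element.
   Note that _mm_set_ps() takes its arguments from highest lane to lowest. */
static float dot4_set(const float *a, const float *b)
{
    __m128 va = _mm_set_ps(a[3], a[2], a[1], a[0]);
    __m128 vb = _mm_set_ps(b[3], b[2], b[1], b[0]);
    return hsum4(_mm_mul_ps(va, vb));
}

/* Works with any alignment: one unaligned load per operand. */
static float dot4_loadu(const float *a, const float *b)
{
    return hsum4(_mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}

/* Requires a and b to be 16-byte aligned; _mm_load_ps() faults otherwise. */
static float dot4_load(const float *a, const float *b)
{
    assert(((uintptr_t)a % 16) == 0 && ((uintptr_t)b % 16) == 0);
    return hsum4(_mm_mul_ps(_mm_load_ps(a), _mm_load_ps(b)));
}
```

All three return the same result for the same inputs. The aligned version is only usable if the rows of the data are allocated and strided so that each row starts on a 16-byte boundary, which is the part that is hard to arrange for J.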
--
Best regards,
---------------------------------------------------------------------
Nguyen Binh
Software Engineer
Glass Egg Digital Media
E.Town Building
7th Floor, 364 CongHoa Street
Tan Binh District,
HoChiMinh City,
VietNam,
Phone : +84 8 8109018
Fax : +84 8 8109013
www.glassegg.com
---------------------------------------------------------------------