[ODE] Faster ODE

Niko Nevatie nnevatie at welho.com
Thu Nov 21 22:16:01 2002


I benchmarked 'test_ldlt'; here are the results.

Configuration:
- AMD Athlon TB 800MHz, 384MB RAM, Windows XP
- ODE 0.03 built with Borland C++ Builder 6.0 (all optimizations on)

Test:
- ODE was built twice: first with the original 'fastldlt.c', then with
'fastldlt_henri.c'
- test_ldlt was executed using all available parameters (f, s, t)

Results:

with 'fastldlt.c':
----
3449

error = 1.625478e-03, size = 71
error = 2.011657e-04, size = 79
error = 4.785806e-04, size = 83
error = 5.344188e-02, size = 89
error = 3.189385e-03, size = 97
error = 2.305180e-03, size = 101
75

error = 4.673339e-04, size = 71
error = 2.476573e-04, size = 73
error = 1.307763e-03, size = 79
error = 1.248479e-03, size = 83
error = 1.030391e-02, size = 89
error = 1.046956e-03, size = 97
error = 7.226467e-04, size = 101
89
----


with 'fastldlt_henri.c':
----
2046

error = 1.625478e-03, size = 71
error = 2.011657e-04, size = 79
error = 4.785806e-04, size = 83
error = 5.344188e-02, size = 89
error = 3.189385e-03, size = 97
error = 2.305180e-03, size = 101
75

error = 4.673339e-04, size = 71
error = 2.476573e-04, size = 73
error = 1.307763e-03, size = 79
error = 1.248479e-03, size = 83
error = 1.030391e-02, size = 89
error = 1.046956e-03, size = 97
error = 7.226467e-04, size = 101
89
----


Conclusions:
- The outputs of the tests are identical.
- 'fastldlt_henri.c' took ~59% of the time taken by 'fastldlt.c' (2046 vs.
3449, the first figure in each run) on the described test configuration.
- As mentioned earlier, the results may vary depending on the CPU and cache
types.
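
For anyone comparing their own runs, the arithmetic behind the ~59% figure
can be sketched as below. This assumes the first number printed in each run
(3449 and 2046) is a raw time on some unspecified scale; the function names
are made up for illustration and are not part of ODE.

```c
#include <assert.h>

/* Hypothetical helper: fraction of the baseline time that the candidate
   needs (lower is better). Inputs are raw timings in the same unit. */
double time_ratio(double baseline, double candidate)
{
    assert(baseline > 0.0 && candidate > 0.0);
    return candidate / baseline;
}

/* Hypothetical helper: speedup factor of the candidate over the baseline
   (higher is better). */
double speedup(double baseline, double candidate)
{
    assert(baseline > 0.0 && candidate > 0.0);
    return baseline / candidate;
}
```

With the numbers above, time_ratio(3449, 2046) is about 0.593 and
speedup(3449, 2046) is about 1.69, i.e. roughly 41% less time.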


Cheers

----- Original Message -----
From: "Peter Amstutz" <tetron@interreality.org>
To: "Daniel Duhprey" <duhprey@yahoo.com>
Cc: <ode@q12.org>
Sent: Thursday, November 21, 2002 11:59 PM
Subject: Re: [ODE] Faster ODE


> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> I haven't looked at any of the code involved, but there is one possible
> explanation for it being faster on one CPU and slower on another :-)  It
> sounds like a cache-size issue.  If the Athlon has a bigger L1/L2 cache
> then the algorithm might be faster overall but if it uses more instructions
> or space than the current algorithm it could be too big and incur a large
> cache-miss penalty on the Celeron (which if I recall correctly was given a
> really wimpy L1/L2 cache to keep costs down).  Such are the joys of modern
> CPU architectures...  There's been some work on "cache-oblivious"
> algorithms, which is essentially a technique of designing the algorithm to
> work on small, localized and usually recursive subsets of the total
> problem (this is especially useful for divide-and-conquer algorithms) so
> that once you have a subproblem which fits in cache, solving that
> subproblem is efficient without having to know a priori the size of the
> processor cache.
>
> Just something to think about -- as I said, I haven't looked at the code
> involved so I could be completely off base :-)
>
> On Thu, 21 Nov 2002, Daniel Duhprey wrote:
>
> > On Thu, 21 Nov 2002, Henri Hakl wrote:
> >
> > -->Please check the accuracy and speed using the testsuite provided with ODE.
> >
> > If I'm using the numbers from the test_ldlt correctly (as a raw time on
> > some scale) then on my Athlon it's about 38% faster and on my Celeron it's
> > roughly twice as slow :).
>
> [   Peter Amstutz   ][ amstutz@cs.umass.edu ][ tetron@interreality.org  ]
> [Lead Programmer][Interreality Project][Virtual Reality for the Internet]
> [ VOS: Next Generation Internet Communication][ http://interreality.org ]
> [ http://interreality.org/~tetron ][ pgpkey:  pgpkeys.mit.edu  18C21DF7 ]
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.0.7 (GNU/Linux)
>
> iD8DBQE93VdXaeHUyhjCHfcRAiRMAJ9DdwekLZIYJk0n/fAjtcd3aDG0vACfRze6
> mcOSJYkj8/NzQXeW/qoia+k=
> =SJZQ
> -----END PGP SIGNATURE-----
>
>
> _______________________________________________
> ODE mailing list
> ODE@q12.org
> http://q12.org/mailman/listinfo/ode
>
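
To make the "cache-oblivious" idea from Peter's message concrete, here is a
minimal sketch of a recursive matrix transpose in C. It is not taken from
ODE and is not what either fastldlt variant does; it only illustrates
splitting a problem until the pieces are small enough to fit in whatever
cache the CPU has. The names transpose, transpose_rec and BASE_CASE are
hypothetical, and the cutoff value is an arbitrary assumption that only
limits recursion overhead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cutoff: blocks this small are copied with plain loops.
   It is not tuned to any cache size. */
#define BASE_CASE 16

/* Transpose the rows x cols block of src whose top-left corner is
   (r0, c0) into dst. Both matrices are row-major; src_stride and
   dst_stride are their row lengths. */
static void transpose_rec(const double *src, double *dst,
                          size_t r0, size_t c0, size_t rows, size_t cols,
                          size_t src_stride, size_t dst_stride)
{
    if (rows <= BASE_CASE && cols <= BASE_CASE) {
        for (size_t r = r0; r < r0 + rows; r++)
            for (size_t c = c0; c < c0 + cols; c++)
                dst[c * dst_stride + r] = src[r * src_stride + c];
    } else if (rows >= cols) {
        /* Split the longer dimension in half and recurse on each part. */
        size_t half = rows / 2;
        transpose_rec(src, dst, r0, c0, half, cols, src_stride, dst_stride);
        transpose_rec(src, dst, r0 + half, c0, rows - half, cols,
                      src_stride, dst_stride);
    } else {
        size_t half = cols / 2;
        transpose_rec(src, dst, r0, c0, rows, half, src_stride, dst_stride);
        transpose_rec(src, dst, r0, c0 + half, rows, cols - half,
                      src_stride, dst_stride);
    }
}

/* src is n_rows x n_cols, dst is n_cols x n_rows; they must not overlap. */
void transpose(const double *src, double *dst, size_t n_rows, size_t n_cols)
{
    assert(src != NULL && dst != NULL);
    transpose_rec(src, dst, 0, 0, n_rows, n_cols, n_cols, n_rows);
}
```

The recursion never asks how large the cache is: at some depth the block
being handled fits, and from there on the accesses stay local. Whether a
similar restructuring would help the LDLT factorization on the Celeron is
an open question that only measurement could answer.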