[ODE] Faster ODE

Henri Hakl henri at cs.sun.ac.za
Fri Nov 22 06:34:02 2002


Hmm... the results are quite interesting.

I can understand that in Nguyen Binh's case there is no difference in the
resulting speed. This is likely because the compiler in that case is
intelligent enough to produce all the optimizations I applied by hand. In
essence, I just realized that there are a lot of code redundancies that aren't
guaranteed to be compiled away with optimal efficiency by most compilers.
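
A minimal sketch of the kind of redundancy meant here. It is illustrative
only and not taken from fastldlt.c; the function names and the row-major
layout are assumptions. The first version recomputes the row offsets on
every iteration unless the compiler hoists them, the second does the
hoisting by hand:

```c
#include <assert.h>
#include <stddef.h>

/* Naive version: i*n and j*n are recomputed in every iteration
   unless the compiler moves them out of the loop. */
double row_dot_naive(const double *A, size_t n, size_t i, size_t j, size_t len)
{
    assert(A != NULL);
    double sum = 0.0;
    for (size_t k = 0; k < len; k++)
        sum += A[i * n + k] * A[j * n + k];
    return sum;
}

/* Hand-hoisted version: the two row base pointers are computed once,
   so the loop body is only a load, a multiply and an add. */
double row_dot_hoisted(const double *A, size_t n, size_t i, size_t j, size_t len)
{
    assert(A != NULL);
    const double *ri = A + i * n;   /* start of row i */
    const double *rj = A + j * n;   /* start of row j */
    double sum = 0.0;
    for (size_t k = 0; k < len; k++)
        sum += ri[k] * rj[k];
    return sum;
}
```

Both functions return the same value; a good optimizer may well emit the
same machine code for the two, which would explain seeing no difference
with some compilers.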

VS.NET has a pretty thorough compiler as far as I understand... ;)

I have no idea why the Celeron results should be slower - but the 38-59%
speed improvements reported by Niko and Daniel are what I expect from
average systems.
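
For reference, a small sketch of the arithmetic behind such percentages,
assuming the leading numbers in Niko's output below (3449 and 2046) are
run times on test_ldlt's own scale. The helper names are made up for this
example:

```c
#include <assert.h>

/* Fraction of the original run time that the new code needs. */
double time_fraction(double t_old, double t_new)
{
    assert(t_old > 0.0);
    return t_new / t_old;
}

/* Speedup factor: how many times faster the new code runs. */
double speedup(double t_old, double t_new)
{
    assert(t_new > 0.0);
    return t_old / t_new;
}
```

With those two timings, time_fraction(3449, 2046) is about 0.59 and
speedup(3449, 2046) is about 1.69, so "59%" is the share of the original
time that is still needed, not the gain itself.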

I'm quite happy with the results up to now then... ;)
  Henri


----- Original Message -----
From: "Niko Nevatie" <nnevatie@welho.com>
To: <ode@q12.org>
Sent: Friday, November 22, 2002 7:15 AM
Subject: Re: [ODE] Faster ODE


> I benchmarked 'test_ldlt', here are the results.
>
> Configuration:
> - AMD Athlon TB 800MHz, 384MB RAM, Windows XP
> - ODE 0.03 built with Borland C++ Builder 6.0 (all optimizations on)
>
> Test:
> - ODE was built including first the original 'fastldlt.c' and then
> 'fastldlt_henri.c'
> - test_ldlt was executed using all available parameters (f, s, t)
>
> Results:
>
> with 'fastldlt.c':
> ----
> 3449
>
> error = 1.625478e-03, size = 71
> error = 2.011657e-04, size = 79
> error = 4.785806e-04, size = 83
> error = 5.344188e-02, size = 89
> error = 3.189385e-03, size = 97
> error = 2.305180e-03, size = 101
> 75
>
> error = 4.673339e-04, size = 71
> error = 2.476573e-04, size = 73
> error = 1.307763e-03, size = 79
> error = 1.248479e-03, size = 83
> error = 1.030391e-02, size = 89
> error = 1.046956e-03, size = 97
> error = 7.226467e-04, size = 101
> 89
> ----
>
>
> with 'fastldlt_henri.c':
> ----
> 2046
>
> error = 1.625478e-03, size = 71
> error = 2.011657e-04, size = 79
> error = 4.785806e-04, size = 83
> error = 5.344188e-02, size = 89
> error = 3.189385e-03, size = 97
> error = 2.305180e-03, size = 101
> 75
>
> error = 4.673339e-04, size = 71
> error = 2.476573e-04, size = 73
> error = 1.307763e-03, size = 79
> error = 1.248479e-03, size = 83
> error = 1.030391e-02, size = 89
> error = 1.046956e-03, size = 97
> error = 7.226467e-04, size = 101
> 89
> ----
>
>
> Conclusions:
> - The outputs of the tests are identical.
> - 'fastldlt_henri.c' consumed ~59% of the time taken by 'fastldlt.c', on the
> described test configuration.
> - As mentioned earlier, the results may vary depending on the CPU and cache
> types.
>
>
> Cheers
>
> ----- Original Message -----
> From: "Peter Amstutz" <tetron@interreality.org>
> To: "Daniel Duhprey" <duhprey@yahoo.com>
> Cc: <ode@q12.org>
> Sent: Thursday, November 21, 2002 11:59 PM
> Subject: Re: [ODE] Faster ODE
>
>
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA1
> >
> > I haven't looked at any of the code involved, but there is one possible
> > explanation for it being faster on one CPU and slower on another :-)  It
> > sounds like a cache-size issue.  If the Athlon has a bigger L1/L2 cache
> > then the algorithm might be faster overall, but if it uses more
> > instructions or space than the current algorithm it could be too big and
> > incur a large cache-miss penalty on the Celeron (which, if I recall
> > correctly, was given a really wimpy L1/L2 cache to keep costs down).  Such
> > are the joys of modern CPU architectures...  There's been some work on
> > "cache-oblivious" algorithms, which is essentially a technique of
> > designing the algorithm to work on small, localized and usually recursive
> > subsets of the total problem (this is especially useful for
> > divide-and-conquer algorithms) so that once you have a subproblem which
> > fits in cache, solving that subproblem is efficient without having to know
> > a priori the size of the processor cache.
> >
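
To make the cache-oblivious idea above concrete, here is a minimal sketch
using a recursive matrix transpose. This is a standard textbook example and
not ODE's factorization code; the function names and the BASE cutoff are
assumptions for illustration. The block is split in half along its longer
side until it is small, so at some recursion depth the working set fits in
cache regardless of the cache's actual size:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cutoff: it only limits recursion overhead and is not
   tuned to any cache size. A strict version would recurse down to 1x1. */
#define BASE 16

/* Transpose the rows x cols block of src starting at (r0, c0) into dst.
   Both matrices are row-major with leading dimensions src_ld and dst_ld. */
static void transpose_rec(const double *src, double *dst,
                          size_t r0, size_t c0, size_t rows, size_t cols,
                          size_t src_ld, size_t dst_ld)
{
    if (rows <= BASE && cols <= BASE) {
        /* Small block: copy it directly. */
        for (size_t i = r0; i < r0 + rows; i++)
            for (size_t j = c0; j < c0 + cols; j++)
                dst[j * dst_ld + i] = src[i * src_ld + j];
    } else if (rows >= cols) {
        /* Split the longer dimension (rows) into two halves. */
        size_t half = rows / 2;
        transpose_rec(src, dst, r0, c0, half, cols, src_ld, dst_ld);
        transpose_rec(src, dst, r0 + half, c0, rows - half, cols,
                      src_ld, dst_ld);
    } else {
        /* Split the longer dimension (cols) into two halves. */
        size_t half = cols / 2;
        transpose_rec(src, dst, r0, c0, rows, half, src_ld, dst_ld);
        transpose_rec(src, dst, r0, c0 + half, rows, cols - half,
                      src_ld, dst_ld);
    }
}

/* Out-of-place transpose: dst (cols x rows) receives src (rows x cols). */
void transpose(const double *src, double *dst, size_t rows, size_t cols)
{
    assert(src != NULL && dst != NULL && src != dst);
    transpose_rec(src, dst, 0, 0, rows, cols, cols, rows);
}
```

Note that no cache parameter appears anywhere in the code; that is the
point of the technique, in contrast to blocking with a tile size picked
for one particular CPU.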
> > Just something to think about -- as I said, I haven't looked at the code
> > involved so I could be completely off base :-)
> >
> > On Thu, 21 Nov 2002, Daniel Duhprey wrote:
> >
> > > On Thu, 21 Nov 2002, Henri Hakl wrote:
> > >
> > > -->Please check the accuracy and speed using the testsuite provided
> > > -->with ODE.
> > >
> > > If I'm using the numbers from the test_ldlt correctly (as a raw time on
> > > some scale) then on my Athlon it's about 38% faster and on my Celeron
> > > it's roughly twice as slow :).
> >
> > [   Peter Amstutz   ][ amstutz@cs.umass.edu ][ tetron@interreality.org  ]
> > [Lead Programmer][Interreality Project][Virtual Reality for the Internet]
> > [ VOS: Next Generation Internet Communication][ http://interreality.org ]
> > [ http://interreality.org/~tetron ][ pgpkey:  pgpkeys.mit.edu  18C21DF7 ]
> > -----BEGIN PGP SIGNATURE-----
> > Version: GnuPG v1.0.7 (GNU/Linux)
> >
> > iD8DBQE93VdXaeHUyhjCHfcRAiRMAJ9DdwekLZIYJk0n/fAjtcd3aDG0vACfRze6
> > mcOSJYkj8/NzQXeW/qoia+k=
> > =SJZQ
> > -----END PGP SIGNATURE-----
> >
> >
> > _______________________________________________
> > ODE mailing list
> > ODE@q12.org
> > http://q12.org/mailman/listinfo/ode
> >