[ODE] Record and Playback

Steve Baker sjbaker1 at airmail.net
Tue Jan 27 17:55:00 MST 2004


Martin C. Martin wrote:
> Marty Rabens wrote:
> 
>>> Why?  All Intel CPUs and clones need to have 100% same behaviour, down
>>> to the roundoff of the floating point.  RTS games typically only send
>>> user commands between computers, and rely on everything happening in
>>> lockstep.

Most games don't rely on things happening in lockstep.  The reason is that
they want to work over the Internet - and they want to work when one computer
is a 1GHz CPU with a GeForce2 and the other is a 3.4GHz with a GeForce FX.

If every computer had to wait for every other so they could all stay in
lockstep, the game would run at the pace of the slowest machine and the
slowest connection - think 1Hz frame rate.

>> I've done a little research, and it turns out that fp operations on
>> different CPUs are IEEE compliant, but the IEEE standard allows CPUs to
>> have differences in the precision of internal representation.

Plus (under Windows at least) you can switch the rounding mode of the
FPU - and some OpenGL drivers do indeed do that...so you may not get
the same results on two otherwise identical computers that are running
with different graphics cards!
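
To give a feel for how much intermediate precision matters, here is a rough
Python sketch.  It does not touch the real FPU control word; it only emulates
a narrower intermediate by rounding a running position to IEEE single
precision after every step, and compares that with leaving it in double.
The names to_single and integrate, the step count and the timestep are all
made up for this illustration.

```python
import struct


def to_single(x: float) -> float:
    """Round a Python float (an IEEE double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]


def integrate(steps: int, dt: float, round_each_step: bool) -> float:
    """Accumulate pos += vel * dt, optionally narrowing pos after every step."""
    pos = 0.0
    vel = 1.0
    for _ in range(steps):
        pos = pos + vel * dt
        if round_each_step:
            # Emulates storing the intermediate result at single precision.
            pos = to_single(pos)
    return pos


# Same arithmetic, same inputs; only the width of the intermediates differs.
wide = integrate(10_000, 0.01, round_each_step=False)
narrow = integrate(10_000, 0.01, round_each_step=True)
print("double intermediates:", repr(wide))
print("single intermediates:", repr(narrow))
print("bitwise identical:   ", wide == narrow)
```

Both runs are "correct" to within their own precision, but they are not the
same number, which is all it takes for a lockstep simulation to diverge.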

> On the other hand, compilers offer options to save the results of all FP 
> calculations to RAM, which forces every computation to its minimum 
> number of bits, so that you don't get this.  In gcc it's called 
> "-ffloat-store," in MSVC it's called "improve floating point consistency."

Fine if you don't mind crappy floating-point performance.  But in any case,
results can vary between compilers and revisions of compilers because the
order of operations can change under optimisation.   This may not be a
problem if you have a binary-only distribution of your game that only
runs on one platform...but it's a dangerous situation.
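
The order-of-operations point is easy to demonstrate, because floating-point
addition is not associative.  A minimal Python sketch follows (the helper
name sum_in_order is made up here; the values are chosen so that the rounding
is easy to follow by hand):

```python
def sum_in_order(values):
    """Add the values strictly left to right, as written."""
    total = 0.0
    for v in values:
        total += v
    return total


# The same four numbers, summed in two different orders.
# Near 1e16 adjacent doubles are 2 apart, so adding 1.0 to 1e16 is lost
# to rounding; cancelling the two large terms first keeps both 1.0s.
order_a = sum_in_order([1e16, 1.0, -1e16, 1.0])
order_b = sum_in_order([1e16, -1e16, 1.0, 1.0])

print("order A:", order_a)
print("order B:", order_b)
print("identical:", order_a == order_b)
```

A compiler that rearranges an expression like this between one build and the
next changes the answer without changing a line of source.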

> In practice, relying on FP reproducibility & lockstep between machines 
> makes for a lot of headaches.  Everything must depend only on state 
> that's shared between the two machines.  Our network programmer spent 
> untold weeks (months?) sifting through logs, finding places where things 
> got out of sync.  You have to be very careful about using "float store," 
> and the fp precision/rounding modes, and saying the system requirements 
> are "Intel CPUs and 100% compatibles" rather than "Intel and AMD CPUs," 
> and going at the speed of the slowest machine, etc.  But as far as I 
> know, all RTSes do it, because you simply can't replicate the position & 
> orientation of 100 units to 3 other computers over a 56k phone line. Not 
> to mention that you'd open yourself up to all kinds of cheating.

Absolutely they DO NOT *all* rely on lockstep replication.

Many rely on a central server.  All of the important calculations can
be done there and broadcast to the client machines.  The clients may be
doing some of those calculations too - in order to keep the system running
smoothly you can't wait for a S-L-O-W network connection.  The idea is to
'fill in' data until an authoritative update comes from the server.

Whilst determinism isn't guaranteed, you don't get enough 'drift' over
the short periods between network updates to cause serious problems.

Since the server is authoritative, any glitch can only last until the
next network update arrives.
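
A toy version of that scheme, as a hedged Python sketch rather than how any
particular game does it: the client predicts locally every tick, and every
so often a server snapshot overwrites whatever the client had.  Unit,
predict, apply_server_update, UPDATE_EVERY and DRIFT_PER_TICK are all
hypothetical names and numbers, and the "drift" is injected by hand to stand
in for FP differences between machines.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    x: float   # position
    vx: float  # velocity


def predict(unit: Unit, dt: float) -> None:
    """Client-side 'fill in': advance the unit locally between updates."""
    unit.x += unit.vx * dt


def apply_server_update(unit: Unit, server_x: float, server_vx: float) -> None:
    """Authoritative snapshot: overwrite local state, discarding any drift."""
    unit.x = server_x
    unit.vx = server_vx


DT = 0.01
UPDATE_EVERY = 10        # ticks between server snapshots (assumed)
DRIFT_PER_TICK = 1e-6    # artificial per-tick discrepancy on the client

server = Unit(0.0, 1.0)
client = Unit(0.0, 1.0)
worst_error = 0.0

for tick in range(1, 1001):
    server.x += server.vx * DT      # the server's own simulation
    predict(client, DT)             # the client's local guess
    client.x += DRIFT_PER_TICK      # stand-in for non-determinism
    worst_error = max(worst_error, abs(client.x - server.x))
    if tick % UPDATE_EVERY == 0:
        apply_server_update(client, server.x, server.vx)

print("worst client error:", worst_error)
print("uncorrected drift would be about:", 1000 * DRIFT_PER_TICK)
```

The error never grows past what can accumulate in one update interval,
however long the game runs.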

---------------------------- Steve Baker -------------------------
HomeEmail: <sjbaker1 at airmail.net>    WorkEmail: <sjbaker at link.com>
HomePage : http://www.sjbaker.org
Projects : http://plib.sf.net    http://tuxaqfh.sf.net
            http://tuxkart.sf.net http://prettypoly.sf.net
-----BEGIN GEEK CODE BLOCK-----
GCS d-- s:+ a+ C++++$ UL+++$ P--- L++++$ E--- W+++ N o+ K? w--- !O M-
V-- PS++ PE- Y-- PGP-- t+ 5 X R+++ tv b++ DI++ D G+ e++ h--(-) r+++ y++++
-----END GEEK CODE BLOCK-----


