[ODE] PosR - a better way?

Thu Nov 11 14:22:51 MST 2004

> Actually, the whole PosR structure is one aspect I dislike about ode.  
> If geoms kept their PosR structure it would be easy to have geoms offset 
> from their bodies (rather than needing the extra GeomTransform), and it 
> would also save on allocs.  In summary, these are my thoughts on it:

But you'd have to copy the matrix around, every single time, and 
your cache traffic would be much higher. Paying the cost of the 
GeomTransform for the geoms that are actually offset is, to me, a 
valid trade-off for the cache, storage, and copy savings of 
sharing the matrix.

Allocating matrices should be very efficient with most allocators. 
If you really want to use a dumb malloc, then perhaps the best way 
to go is to wrap ODE allocations in a shell where you can write a 
custom block-slicing allocator, or whatever.
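
To make "block-slicing allocator" concrete, here is a minimal sketch of a 
fixed-size pool that hands out PosR-sized blocks from one static array. 
This is not ODE code: the PosR layout, POOL_CAPACITY, and the pool_* 
names are all hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a position/rotation record:
   a padded position plus a 3x4 rotation matrix. */
typedef struct PosR {
    float pos[4];
    float R[12];
} PosR;

#define POOL_CAPACITY 256  /* assumed upper bound on live PosR blocks */

/* A free slot reuses its own storage to hold the free-list link. */
typedef union PoolSlot {
    PosR posr;
    union PoolSlot *next_free;
} PoolSlot;

typedef struct PosRPool {
    PoolSlot slots[POOL_CAPACITY];
    PoolSlot *free_list;
    size_t live;  /* number of blocks currently handed out */
} PosRPool;

/* Thread every slot onto the free list; slot 0 ends up at the head. */
void pool_init(PosRPool *pool) {
    pool->free_list = NULL;
    for (size_t i = POOL_CAPACITY; i-- > 0; ) {
        pool->slots[i].next_free = pool->free_list;
        pool->free_list = &pool->slots[i];
    }
    pool->live = 0;
}

/* O(1) allocation: pop the head of the free list. */
PosR *pool_alloc(PosRPool *pool) {
    PoolSlot *slot = pool->free_list;
    if (slot == NULL) {
        return NULL;  /* pool exhausted; the caller decides the fallback */
    }
    pool->free_list = slot->next_free;
    pool->live++;
    return &slot->posr;
}

/* O(1) release: push the block back onto the free list. */
void pool_free(PosRPool *pool, PosR *p) {
    PoolSlot *slot = (PoolSlot *)p;
    assert(slot >= pool->slots && slot < pool->slots + POOL_CAPACITY);
    slot->next_free = pool->free_list;
    pool->free_list = slot;
    pool->live--;
}
```

Because every block has the same size and comes from one array, this 
cannot fragment, and a freed block is the next one reused. If I recall 
correctly, ODE's memory.h has allocation hooks (dSetAllocHandler and 
friends) where something like this could plug in, but the handler would 
have to route requests by size, and that wiring is not shown here.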

Also note that if a geom has a local offset, it's local to the body, 
so you have to do a matrix multiply to get the world-space position, 
just like the GeomTransform does. Meanwhile, for the common case 
where the geom's transform is identical to the body's (typical for 
characters, or those brave enough to use trimesh/trimesh), there are 
significant computational savings in not having to multiply by an 
identity matrix.
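
A rough sketch of the two cases, under simplifying assumptions: the 
structs and function names below are hypothetical (not ODE's actual 
declarations), and the rotation is a plain row-major 3x3. In the common 
case the geom just points at the body's PosR; only the offset case pays 
for a compose.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified position/rotation record. */
typedef struct PosR {
    float pos[3];
    float R[9];  /* row-major 3x3 rotation */
} PosR;

typedef struct Body {
    PosR posr;
} Body;

typedef struct Geom {
    const PosR *posr;  /* shared with the body: no copy, no multiply */
} Geom;

/* Common case: the geom aliases the body's PosR, so moving the body
   moves the geom for free. */
void geom_attach(Geom *g, const Body *b) {
    g->posr = &b->posr;
}

/* Offset case: world = body * local. This is the multiply that any
   body-relative offset has to pay. 'out' must not alias the inputs. */
void posr_compose(PosR *out, const PosR *body, const PosR *local) {
    assert(out != body && out != local);
    for (int i = 0; i < 3; i++) {
        /* world position = body position + body rotation * local position */
        out->pos[i] = body->pos[i];
        for (int k = 0; k < 3; k++) {
            out->pos[i] += body->R[3 * i + k] * local->pos[k];
        }
        /* world rotation = body rotation * local rotation */
        for (int j = 0; j < 3; j++) {
            float sum = 0.0f;
            for (int k = 0; k < 3; k++) {
                sum += body->R[3 * i + k] * local->R[3 * k + j];
            }
            out->R[3 * i + j] = sum;
        }
    }
}
```

With a per-geom PosR, every geom would go through something like 
posr_compose (or a copy) each step, even when the local transform is 
the identity.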

> - In particular the terrain's ray test has a PosR alloc and free every 
> single time its called, which can be multiple times per frame. 

That could easily be avoided.

> - There is an alloc/free every time a geom is attached or detached to a 
> body.

This typically happens only at entity creation time, which is usually 
infrequent. Getting cache traffic savings on every world step is 
a much bigger win. If you create geoms a lot, then a custom allocator 
would be necessary anyway (because the geom itself is also an allocation).

> - Its by and far the most common alloc I have in the game. 

Great, so you know what to optimize, and how to optimize it -- 
assuming your allocs are even close to being a problem, that is. I've 
found that, unless you're degenerate, they seldom are.

> - For previous projects we actually had to write a custom PosR 
> alloc/free handler to pull these things off a static buffer to avoid 
> fragmentation.

I would be happy with a patch for ODE that pulled everything out of 
a custom allocator, to avoid fragmentation and better manage memory. 

> 2.) Keeping a local PosR per geom may take up more memory. 
> - This structure could be reduced to 3 floats for position and 4 floats 
> for a quat.

At a significant additional computational cost. Memories aren't 
slow enough, yet, to make that trade-off worth it -- I've tried.
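
To illustrate where the extra cost comes from, here is a sketch 
assuming a unit quaternion stored as (w, x, y, z) and a row-major 3x3 
matrix; both function names are hypothetical. With position-plus-quat 
storage, something like the conversion below has to run whenever the 
collision code wants the rotation matrix, whereas a stored matrix only 
needs the second function.

```c
#include <assert.h>

/* Expand a unit quaternion q = (w, x, y, z) into a row-major 3x3
   rotation matrix. This is the work a quat-only PosR would add on
   top of every use of the matrix. */
void quat_to_matrix(const float q[4], float R[9]) {
    float w = q[0], x = q[1], y = q[2], z = q[3];
    float n = w * w + x * x + y * y + z * z;
    assert(n > 0.999f && n < 1.001f);  /* expects a normalized quaternion */

    R[0] = 1.0f - 2.0f * (y * y + z * z);
    R[1] = 2.0f * (x * y - w * z);
    R[2] = 2.0f * (x * z + w * y);

    R[3] = 2.0f * (x * y + w * z);
    R[4] = 1.0f - 2.0f * (x * x + z * z);
    R[5] = 2.0f * (y * z - w * x);

    R[6] = 2.0f * (x * z - w * y);
    R[7] = 2.0f * (y * z + w * x);
    R[8] = 1.0f - 2.0f * (x * x + y * y);
}

/* Rotate v by a stored matrix: nine multiplies, six adds, no setup. */
void matrix_rotate(const float R[9], const float v[3], float out[3]) {
    for (int i = 0; i < 3; i++) {
        out[i] = R[3 * i] * v[0] + R[3 * i + 1] * v[1] + R[3 * i + 2] * v[2];
    }
}
```

The storage drops from a full matrix to 7 floats, but the conversion 
adds a couple dozen arithmetic operations per use unless the result is 
cached, and caching it brings the matrix storage right back.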

> - Local data would save on cache misses.

But storing the same data in multiple places increases cache load and 
decreases residency. Typically, you run collision first, which pulls 
the matrix into the cache; then when you step, it's still there.

> - If we allowed every geom to have a transform from its body it may be 
> slower (an early out may avoid this).

A mis-predicted branch costs on the order of 40 cycles. Better to just 
always do the add, if that's the way you want to go. However, it's still 
a multiply-add, because the offset has to be expressed in the body's 
local space to be any good.

> - Basically the current limitation sucks.  GeomTransform is clunky.

That's a matter of opinion. You're giving some reasons for why you think 
so, and I'm giving some reasons for why I think Russ made a pretty good 
choice.

Cheers,

			/ h+