[ODE] Choleski factorization

Antonio_Martini@scee.net Antonio_Martini at scee.net
Tue Mar 25 10:00:02 2003


hi is this what you have implemented?
where "solve LCP problem for A" is is now an LDLT factorization
if you loop "foreach joint" effectively your are dealing with a single
constraint at time.

from a your previous post:

foreach joint
 get Jacobian info (J) and other LCP vectors.
next joint
for i = 0 to maxIterations
 foreach body
  compute global I and invI matrices from mass.I and rotation
  add Inertia to torque and force accumulators
  add gravity to force accumulators
 next body
 foreach joint
  calculate A = J * invM * J'  //30% of time here
  solve LCP problem for A   //20% of time here
  add resulting forces to each body's accumulators
 next joint
 foreach body
  adjust body's linear and angular velocity by force and torque accumulator
  adjust body's position and rotation by linear and angular velocity
  reset force and torque accumulators to 0
 next body
next i




<david@csworkbench.com>@q12.org on 25/03/2003 16:35:08

Sent by:    ode-admin@q12.org


To:    <ode@q12.org>
cc:
Subject:    RE: [ODE] Choleski factorization


That particular email is a few days old.  The LCP solver has been replaced
with a simpler algorithm:  Solve the system of equations using a standard
LDLT factorization and backsolve routine, for all joints outside the lcp
limits (lo and hi), clamp them to lo or hi.  Then you update the
factorization (only O(n^2) instead of O(n^3) to form the factorization) to
remove the row and column that were just solved for, and re-solve the new
smaller system.  More than twice as fast as direct LCP for the small
systems of single joints.  I had thought about doing the single constraint
at a time idea, but I have a feeling it would increase the number of
iterations required for a good solution to the point that it's not
helpful.  I might try a version along those lines at some point and see
how it acts, but I'm really pretty happy about how it's acting.  I'm also
almost positive that going to single constraints will make the system even
more sensitive to high mass ratios, forces, and velocities.

David

> ..just to be more precise i meant that if we split the problem so that
> each constraint is a separate problem as some else suggested in
> a previous post we will no need any direct LCP solver for the solution
> of each single LCP subproblem in the iterative method. I might
> not have been quite clear but do not have much time to write sorry.
> However the main reference is the Brian Mirtich's Phd.
>
> ---------------------- Forwarded by Antonio Martini/CAMD/UK/SCEE on
> 25/03/2003 16:11 ---------------------------
>
>
> Antonio Martini
> 25/03/2003 16:12
>
> To:    ode@q12.org
> cc:
>
> Subject:    RE: [ODE] Choleski factorization  (Document link: Antonio
>        Martini)
>
> still on the iterative solution, i remember from the work of Mirtich
> that if we have only one constraint(one contact normal)
> the problem is reduced to the solution of only a linear system. So no
> sub LCP problem must be solved, but just a linear
> set of equations for each single constraint. I tried it end i end up
> with something like f = fabs(a/m), im simulationg friction
> using a method described by Shoemer, otherwise for more accurate static
> friction the scalar equation will became
> a linear system of 3 equations in 3 unknowns.
>
>
>
>
> Sergio Valverde <svalverde@barcelona.ubisoft.es>@q12.org on 25/03/2003
> 15:58:12
>
> Sent by:    ode-admin@q12.org
>
>
> To:    david@csworkbench.com
> cc:    ode@q12.org
> Subject:    RE: [ODE] Choleski factorization
>
>
> Very good job, David!
>
> I can't wait to play with your new dWorldStepFast()!
>
> Why don't you  post the corresponding five files
> while the final (and "clean") version is
> integrated within ODE?
>
> Sergi
>
> -----Original Message-----
> From: david@csworkbench.com [mailto:david@csworkbench.com]
> Sent: domingo, 23 de marzo de 2003 9:33
> To: ode@q12.org
> Subject: Re: [ODE] Choleski factorization
>
>
> I know this is a long message, so I'll summarize my questions here: -
> Should I release the iterated solution in the same files as their
> current equivalents or separate them out?
> - After the iterated solution is done, would you rather see SIMD
> extensions or a verlet integrator for particles (cloth, water, etc)
> next? - If SIMD (slightly ahead of the verlet integrator in my mind),
> then what's the best way to get cross-platform/compiler assembly into
> the project?
> - Plus a few internals questions below.
>
> And now the long version.....
>
> I'm not sure how close what I came up with is to PP, but it's working
> great!  I switched over to LDLT and got immediate improvements from just
> that switch, I'm guessing because of all the loop unrolling and not
> having to do Sqrt's.  Then I clamped x's that are outside of lo and hi
> (calculated dynamically if findex > -1), remove their rows and columns
> from A with a slightly modified dRemoveLDLT (A didn't use row pointers),
> update rhs, and resolve until no more constraints go out of range
> (usually only once), and voila! a 150% speed increase over dSolveLCP
> (and should give the exact same results 95% of the time).  I know this
> algorithm will most likely slow to a crawl with larger systems where it
> may have to resolve several times, but it works great when it doesn't,
> and since I'm doing only one joint at a time, it will usually be either
> impossible or really weird to do more than one resolve.  One thing I
> need to do is change the contact joint so the hard contact constraint is
> at the end of J instead of the beginning, since it seems more likely
> that a contact would be limiting penetration rather than friction, and
> limits at the end are a lot easier to remove than limits at the
> beginning.  Would this do anything to mess up the current algorithm?
>
> Now the major bottleneck (takes 4 times as long as any other operation)
> is computing A.  I did find an optimization for that, but I was
> surprised it wasn't already implemented, since it would help the old
> version too.  So I wanted to ask about it to see if I should make this a
> change to the original function or make a new one.  Here it is:
>
> Correct me if I'm wrong, but I believe A is always symmetric (as well as
> positive definite).  This means that A's top right triangle mirrors it's
> lower left, and possibly (I don't know about this one) every block has
> mirrored top and bottom triangles...?  However, in the step where A is
> calculated (JinvM * J') a call is made to Multiply2_p8r for each block.
> Multiply2_p8r calculates every entry in the block, top and bottom half,
> separately. Shouldn't it calculate the top half and copy the results to
> the bottom half as it goes (or when it's done)? I know that would work
> for the diagonal blocks in the current matrix, but I'm not sure about
> the others.  Should I change the current function, or write another
> version that exploits the symmetry, and change the current algorithm to
> use that one for the diagonal?  The real question is whether every block
> in A is symmetric or only the ones along the diagonal.
>
> Another question: how do you want this integrated into ODE?  Right now I
> have it integrated into the current source files, so you just replace 4
> or 5 files and you can start calling dWorldStepFast(world, stepsize,
> iterations).  I've made a few changes to the body class, to store it's
> inertial tensor and Mass matrix in the global frame.  That way I don't
> have to recalculate it for each joint that is connected to it, only once
> each iteration.  So that file will have to be replaced (or at least
> those 2 lines).  Other than that, I can take the rest of the code out
> and put it in separate files (and figure out where to change the
> makefile) if you'd rather it be separated.  Just let me know.
>
> I guess my next bought of hacking will be SIMD :)  What's the most
> cross-platform/compiler way to get assembly into the project?  I guess I
> could just write it once for gcc inline and once for vc++ inline, but
> that leaves out the compilers I don't have or care to learn.  So what
> does that leave, GAS and NASM?  Anyone have a preference (I'll have to
> learn either one.... which is "best")?
>
> Then I could do a verlet integrator.... anybody interested in cloth and
> other particle effects (seen some cool water demos)?  There's only two
> constraints: distance (also known as stick constraints) and the angle
> between 2 sticks.  It shouldn't be too difficult to implement them as
> inequality constraints.  The integrator and the collision detection are
> really integrated, so a simloop would call normal collision and step
> routines, then the particle engine, and the particles would "flow"
> around the rest of the bodies.  There's a paper on the Hitman game and
> an article on Gammasutra that explain the concept better, I'll find the
> links if you're interested.  The basic concept is that you just make a
> particle at each of the vertices of a mesh, and then the mesh would act
> like cloth and drape over anything it came in contact with, or you could
> make it
> weightless and move one particle up and down to make a rippling water
> effect.  You could attach particles to a point on a rigid body.  The
> hard part is going to be getting the rigid bodies to interact with the
> particles (i.e. if you have a couple boxes sitting on a table cloth,
> they should move with it if you start dragging the cloth around).  The
> algorithm itself is an iterated one, but the less general constraints
> make the computations required to integrate a step much faster.
>
> So which do y'all want first, more optimization or more functionality?
>
> Thanks for all the help,
> David
>
>>
>> hi,
>>
>> it sounds like you're getting close to the "principal pivoting"
>> algorithm for LCP solving. the PP algorithm works like this:
>>
> <snip>
>>
>>> My question is, is a Cholesky Factorization also a factorization for
>>> all of its sub-matrices?
>>
>> if you have the factorization A=L*L', then you also have the
>> factorizations A(1:n,1:n)=L(1:n,1:n)*L(1:n,1:n)' for all n (in matlab
>> notation). other submatrix factorizations can be gotten incrementally,
>> see the ODE source for an incremantal factorization updater.
>>
>> russ.
>>
>> --
>> Russell Smith
>> http://www.q12.org
>> _______________________________________________
>> ODE mailing list
>> ODE@q12.org
>> http://q12.org/mailman/listinfo/ode
>
>
>
> _______________________________________________
> ODE mailing list
> ODE@q12.org
> http://q12.org/mailman/listinfo/ode
> _______________________________________________
> ODE mailing list
> ODE@q12.org
> http://q12.org/mailman/listinfo/ode
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> Antonio Martini
> Sony Computer Entertainment Europe
> http://www.scee.com
>
>
>
>
> **********************************************************************
> This email and any files transmitted with it are confidential and
> intended solely for the use of the individual or entity to whom they are
> addressed. If you have received this email in error please notify
> postmaster@scee.net
>
> This footnote also confirms that this email message has been checked for
> all known viruses.
>
> **********************************************************************
>  SCEE 2003
>
> _______________________________________________
> ODE mailing list
> ODE@q12.org
> http://q12.org/mailman/listinfo/ode



_______________________________________________
ODE mailing list
ODE@q12.org
http://q12.org/mailman/listinfo/ode










**********************************************************************
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom they
are addressed. If you have received this email in error please notify
postmaster@scee.net

This footnote also confirms that this email message has been checked
for all known viruses.

**********************************************************************
 SCEE 2003