[ODE] [Fwd: Re: Ode And Threads]

Wed Oct 3 07:32:14 MST 2007

Here are two mails from Patrik Enoch that could be a good starting
point for multi-threading in the ODE framework.

Cheers.
MZ

This might be a good place to share my experience with
multi-threading ODE:

Right now I have the collisions running in parallel in ODE without any
changes to ODE itself.
I am using the near-callback. Here is how:

- create "worker threads" (I have about 20). Basically, they are the
nearcallback function waiting on a multithreaded-shared-message queue
for the two object IDs
- a near-callback stub, being called from dSpaceCollide, that sends
the objectIDs to the queue
- after dSpaceCollide returns we need to wait until the message-queue is empty

It looks like this:

ODE_do_collide
{
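	// create msgq first, so the workers have something to wait on
	create msgq;

	// create worker threads
	for (i=0;i<20;i++) create_thread( ODE_nearcallback );

	// call AABB collider on the space; the stub only queues the pairs
	dSpaceCollide(space, 0, ODE_nearcallback_stub);

	// wait for empty q (busy-wait; a worker may still be colliding)
	while (!msgq.isempty()) {};

	// send "kill" message, one per worker
	for (i=0;i<20;i++) msgq.sendmsg( 0,0 );

	// wait for the workers to exit before stepping the world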
}

ODE_nearcallback_stub( o1, o2 )
{
	// can collide at all??
	if ( can_collide(o1,o2) )
	{
		// send "collide" message
		msgq.sendmsg( o1, o2 );
	}
}

ODE_nearcallback()
{
	while (1)
	{
		objectID o1,o2;
		msgq.getmessage( o1,o2 );

		if (o1==0 && o2==0)
		{
			suicide;
		}

		global_lock_acquire( o1, o2 );
		dCollide(o1,o2);
		global_lock_release( o1, o2 );

		global_lock_acquire( world);
		// create contacts
		global_lock_release(world);
	}
}
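
For readers who want something that compiles, here is a minimal sketch of the same worker scheme in portable C++11 (std::thread), not my original code. ObjectId, PairQueue and collide_in_parallel are made-up names, the real dCollide call is only marked by a comment, and the per-object locks are left out. One deliberate difference: instead of spinning on an empty queue it joins the workers, which also covers a pair that is still in flight.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for an ODE geom ID; 0 is reserved for the "kill" message.
typedef int ObjectId;
typedef std::pair<ObjectId, ObjectId> IdPair;

// FIFO queue shared by the stub (producer) and the workers (consumers).
class PairQueue {
public:
    void send(ObjectId o1, ObjectId o2) {
        {
            std::lock_guard<std::mutex> guard(lock_);
            pairs_.push_back(std::make_pair(o1, o2));
        }
        avail_.notify_one();
    }
    // Blocks until a message is available.
    IdPair get() {
        std::unique_lock<std::mutex> guard(lock_);
        avail_.wait(guard, [this] { return !pairs_.empty(); });
        IdPair p = pairs_.front();
        pairs_.pop_front();
        return p;
    }
private:
    std::mutex lock_;
    std::condition_variable avail_;
    std::deque<IdPair> pairs_;
};

// Pushes `pairs` through `workers` threads; returns how many pairs were handled.
int collide_in_parallel(const std::vector<IdPair>& pairs, int workers) {
    PairQueue q;
    std::mutex world_lock;  // plays the role of the world lock
    int contacts = 0;       // stand-in for "contacts added to the world"

    std::vector<std::thread> pool;
    for (int i = 0; i < workers; ++i) {
        pool.emplace_back([&q, &world_lock, &contacts] {
            for (;;) {
                IdPair p = q.get();
                if (p.first == 0 && p.second == 0) return;  // "kill" message
                // ... the narrow-phase test for p would run here ...
                std::lock_guard<std::mutex> guard(world_lock);
                ++contacts;  // ... and the contacts would be created here
            }
        });
    }

    // This loop plays the role of the near-callback stub.
    for (std::size_t i = 0; i < pairs.size(); ++i)
        q.send(pairs[i].first, pairs[i].second);

    // One "kill" message per worker, queued behind the real work (FIFO).
    for (int i = 0; i < workers; ++i) q.send(0, 0);

    // Joining guarantees that every pair has been fully processed.
    for (std::size_t i = 0; i < pool.size(); ++i) pool[i].join();
    return contacts;
}
```

Because the queue is FIFO, the kill messages can be sent right away; with a last-in-first-out queue they would overtake the pending work.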

You see, you need a lot of workers, because the messages sent to the
queue will have lots of objects in common, e.g.

- collide A,B (received by thread 1)
- collide A,C (received by thread 2, has to wait until A,B is done)
- ...
- collide D,E (received by thread N, can proceed right away)

You need to lock the world, because only 1 thread can add contacts to
the world at a time.
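
One thing to watch with per-object locks: if one thread locks A then B while another locks B then A, they can deadlock. A small sketch of one way around it, assuming each object carries its own mutex (Object, collide_pair and stress are made-up names, and the two objects must be different): std::lock takes both mutexes without deadlocking, whatever order they are named in.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical per-object record: each collidable object carries its own lock.
struct Object {
    std::mutex lock;
    int touched;  // stand-in for state that the collider modifies
    Object() : touched(0) {}
};

// Locks both objects (a and b must be distinct), then "collides" them.
void collide_pair(Object& a, Object& b) {
    std::lock(a.lock, b.lock);  // deadlock-free acquisition of both mutexes
    std::lock_guard<std::mutex> ga(a.lock, std::adopt_lock);
    std::lock_guard<std::mutex> gb(b.lock, std::adopt_lock);
    ++a.touched;
    ++b.touched;
}

// Two threads hammer the same pair, naming the objects in opposite order.
int stress(int rounds) {
    Object a, b;
    std::thread t1([&] { for (int i = 0; i < rounds; ++i) collide_pair(a, b); });
    std::thread t2([&] { for (int i = 0; i < rounds; ++i) collide_pair(b, a); });
    t1.join();
    t2.join();
    return a.touched + b.touched;
}
```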

If you want to use trimeshes, there is extra work, because the
trimesh colliders use static data (for speedup purposes). So you can
only collide one trimesh at a time, never in parallel, in the current
version of ODE.

The collision time is about halved on my dual-core. I guess the
indentation will get messed up when I send the above code to the list.

---

my next plan is to parallelize the handling of the islands. the data
structures do not overlap, which is great. however, this cannot be
done without changing ODE (like a process_islands_callback).
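
To illustrate why non-overlapping data makes this attractive, here is a purely hypothetical sketch; ODE has no such interface today, and Island, step_island and step_islands_parallel are invented for the example. Each thread touches only its own island, so no locks are needed.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical island: a group of bodies that only interact with each other.
struct Island {
    std::vector<double> velocities;
};

// Stand-in for stepping one island; it touches only that island's data.
void step_island(Island& island, double dt) {
    for (std::size_t i = 0; i < island.velocities.size(); ++i)
        island.velocities[i] += -9.81 * dt;
}

// One thread per island and no locks, because the data does not overlap.
void step_islands_parallel(std::vector<Island>& islands, double dt) {
    std::vector<std::thread> pool;
    for (std::size_t i = 0; i < islands.size(); ++i)
        pool.emplace_back(step_island, std::ref(islands[i]), dt);
    for (std::size_t i = 0; i < pool.size(); ++i) pool[i].join();
}
```

With many small islands a fixed pool of workers would be more sensible than one thread per island.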

best,
Patrick

Hi,

I cannot give away my sources, but these snippets should help you. I
assume you use Windows.

Create the worker threads (as many threads as there are processors
works best, otherwise Windows will 'choke')...
	dat->taskid = CreateThread( NULL, THREAD_STACKSIZE, _threadstarter,
		(void*)dat, 0, &dat->threadID );

...that start this function. Make sure to pass the world and
everything else that is important in dat!
static DWORD WINAPI _threadstarter( LPVOID param )
{
	_threaddata*dat = (_threaddata*)param;
	while( notdead )
	{
		get_message(&data,&o1,&o2);
		if ( collide_message )
		{
			collide();
			lock_world();
			add_contacts();
			unlock_world();
		}
		if ( barrier_message )
		{
			wait_on_collision_done_barrier();
		}
	}
	return 0;	// returning will kill this thread
}

A lock is a semaphore with inicount = 1. Child processes must be able
to inherit the handle (threads of the same process can use it directly).
Create a lock for the world, so that ONLY ONE thread can add contacts
at a time!
	// child processes inherit handle
	SECURITY_DESCRIPTOR secdesc;
	InitializeSecurityDescriptor( &secdesc, SECURITY_DESCRIPTOR_REVISION );
	SECURITY_ATTRIBUTES sec;
	sec.nLength = sizeof(SECURITY_ATTRIBUTES);
	sec.lpSecurityDescriptor = &secdesc;
	sec.bInheritHandle = TRUE;	
	HANDLE lock=CreateSemaphore( &sec, inicount, 0x7FFFFFFF, 0 );
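
If you are not on Windows, the same idea can be sketched in portable C++11. This is only an illustration, not my code: Semaphore and add_contacts are made-up names, wait() corresponds roughly to WaitForSingleObject on the handle and post() to ReleaseSemaphore. With an initial count of 1 the semaphore behaves like a lock.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Minimal counting semaphore built on a mutex and a condition variable.
class Semaphore {
public:
    explicit Semaphore(int initial) : count_(initial) {}
    void wait() {  // blocks while the count is zero, then takes one unit
        std::unique_lock<std::mutex> guard(m_);
        cv_.wait(guard, [this] { return count_ > 0; });
        --count_;
    }
    void post() {  // gives one unit back and wakes one waiter
        {
            std::lock_guard<std::mutex> guard(m_);
            ++count_;
        }
        cv_.notify_one();
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_;
};

// Several threads add "contacts" under a world lock with initial count 1.
int add_contacts(int threads, int per_thread) {
    Semaphore world_lock(1);
    int contacts = 0;  // stand-in for the world's contact list
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int i = 0; i < per_thread; ++i) {
                world_lock.wait();
                ++contacts;  // only one thread at a time gets here
                world_lock.post();
            }
        });
    for (auto& th : pool) th.join();
    return contacts;
}
```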

The near callback is just a stub that sends messages to the threads:
static void nifty_ode_nearCallback (void *data, dGeomID o1, dGeomID o2)
{
	send_collide_message(data,o1,o2);
}

... that is called during the main collide function:
dSpaceCollide();
send_all_threads_a_barrier_message();
// wait for all threads to empty the messageq and finish colliding
// BEFORE YOU CONTINUE
wait_on_collision_done_barrier();

Check the internet for sources for "barriers". Those things make
everybody halt until everybody else has arrived at the barrier. I
suggest you use the pthread library for Windows. Then you can recycle
all the resources you can find for "pthread" and "barrier" on the web.
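
In case it helps, a barrier is small enough to sketch by hand. The following is a minimal version in portable C++11, assuming a fixed number of participants; Barrier and run_workers are made-up names and this is not the pthread implementation. The main thread counts as one participant, so it only continues once every worker has arrived.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Reusable barrier: wait() returns only after `parties` threads have called it.
class Barrier {
public:
    explicit Barrier(int parties)
        : parties_(parties), waiting_(0), generation_(0) {}
    void wait() {
        std::unique_lock<std::mutex> guard(m_);
        const int my_generation = generation_;
        if (++waiting_ == parties_) {
            waiting_ = 0;
            ++generation_;  // the last arrival releases this round
            cv_.notify_all();
        } else {
            cv_.wait(guard, [&] { return generation_ != my_generation; });
        }
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    int parties_, waiting_, generation_;
};

// Workers finish their share, then meet the main thread at the barrier.
int run_workers(int workers) {
    Barrier done(workers + 1);  // the workers plus the main thread
    std::mutex m;
    int finished = 0;
    std::vector<std::thread> pool;
    for (int i = 0; i < workers; ++i)
        pool.emplace_back([&] {
            { std::lock_guard<std::mutex> g(m); ++finished; }  // "collide"
            done.wait();
        });
    done.wait();  // main thread blocks here until all workers have arrived
    int seen;
    { std::lock_guard<std::mutex> g(m); seen = finished; }
    for (auto& th : pool) th.join();
    return seen;
}
```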

The message queues are just arrays where I add/remove the last entry:

sendmessage:
	lock_wait( msgq->lck );
	msgq->msgs[msgq->countmsg].m1 = msg;
	msgq->msgs[msgq->countmsg].m2 = msg2;
	msgq->msgs[msgq->countmsg].m3 = msg3;
	msgq->countmsg++;
	lock_release( msgq->lck );
	semaphore_post( msgq->sem_msgavail );

getmsg:
	if (!semaphore_wait( msgq->sem_msgavail, timeout ))
		return false;	// no message avail
	lock_wait( msgq->lck );
	msgq->countmsg--;
	*msg = msgq->msgs[msgq->countmsg].m1;
	*msg2 = msgq->msgs[msgq->countmsg].m2;
	*msg3 = msgq->msgs[msgq->countmsg].m3;
	lock_release( msgq->lck );
	return true;	// got a message

Make the array very large, or use the C++ std::vector<> class.
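
A rough portable equivalent of the two routines, as a sketch only: it uses std::vector so the array grows by itself, and a condition variable in place of the semaphore. Msg and MsgQueue are made-up names. Like the routines above it hands out the newest message first.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <vector>

// Hypothetical message layout, mirroring m1/m2/m3.
struct Msg { int m1, m2, m3; };

class MsgQueue {
public:
    void send(const Msg& msg) {
        {
            std::lock_guard<std::mutex> guard(lck_);
            msgs_.push_back(msg);  // the vector grows as needed
        }
        avail_.notify_one();       // like posting the "message available" semaphore
    }
    // Returns false if no message arrived within `timeout`.
    bool get(Msg* out, std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> guard(lck_);
        if (!avail_.wait_for(guard, timeout, [this] { return !msgs_.empty(); }))
            return false;          // no message available
        *out = msgs_.back();       // remove the last entry (last in, first out)
        msgs_.pop_back();
        return true;
    }
private:
    std::mutex lck_;
    std::condition_variable avail_;
    std::vector<Msg> msgs_;
};
```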

Which objects are you colliding? Some colliders use static variables;
in that case you need to patch the sources.
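
What such a patch could look like, shown on a made-up helper rather than on real ODE code: a scratch buffer that was `static` (shared by all threads) becomes `thread_local`, which needs a C++11 compiler, so every thread gets its own copy. sum_with_scratch and scratch_is_per_thread are invented names.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Not ODE code: a collider-style helper that keeps a scratch buffer between
// calls. Declared `static`, the buffer would be shared and corrupted by
// concurrent callers; `thread_local` gives each thread a private one.
int sum_with_scratch(const std::vector<int>& input) {
    thread_local std::vector<int> scratch;
    scratch.assign(input.begin(), input.end());
    int total = 0;
    for (std::size_t i = 0; i < scratch.size(); ++i) total += scratch[i];
    return total;
}

// Two threads use the helper at the same time with different inputs.
bool scratch_is_per_thread() {
    int r1 = 0, r2 = 0;
    std::vector<int> a(1000, 1), b(1000, 2);
    std::thread t1([&] { for (int i = 0; i < 100; ++i) r1 = sum_with_scratch(a); });
    std::thread t2([&] { for (int i = 0; i < 100; ++i) r2 = sum_with_scratch(b); });
    t1.join();
    t2.join();
    return r1 == 1000 && r2 == 2000;
}
```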

Memory allocation is apparently a delicate topic when one thread
frees data that another thread has allocated. You might have to
redirect "new" and "delete" to malloc and free:
#if MEM_REPLACE_NEW
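void *operator new(_CSTD::size_t size) throw(_STD::bad_alloc)
{
	// malloc signals failure with NULL, operator new has to throw instead
	void *p = malloc( size ? size : 1 );
	if ( !p ) throw _STD::bad_alloc();
	return p;
}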

void *operator new[](_CSTD::size_t size) throw(_STD::bad_alloc)
{
	return operator new( size );
}

void operator delete(void *z) throw()
{
	free( z );
}

void operator delete[](void *z) throw()
{
	operator delete( z );
}
#endif // MEM_REPLACE_NEW
This will give you a speed penalty.