Hi!
I've been working on implementing XPBD inside houdini and so far I've got a prototype up and running which looks promising. I'm using the inlinecpp python module to compile and run the code.
Here is one of my latest tests:
https://youtu.be/xoi51a6bkkw [youtu.be]
I want to optimize this as much as possible but I think I've ran into some memory bandwidth issues( naive guess ). 90% of the collision solving and constraints can be solved in parallel, but when I implement multithreading I see little to no performance gains.
As this is my first try at C++ and the HDK in general I'm probably doing something wrong.
The code itself is quite long but here is a small summary of what might be the issue:
First I fetch all the entities onto a vector< entity > e, this is basically where I store each rigid body object. For each object I store 30+ attributes, position, velocity, mass, inertia tensor etc. I can then easily iterate over the entities later in the code by referring to the entities.
// fech point attributes GA_RWHandleV3D p_h( gdp->findAttribute(GA_ATTRIB_POINT, "p")); GA_RWHandleV3D px_h(gdp->findAttribute(GA_ATTRIB_POINT, "px")); GA_RWHandleV3D v_h( gdp->findAttribute(GA_ATTRIB_POINT, "v")); GA_RWHandleV3D vx_h(gdp->findAttribute(GA_ATTRIB_POINT, "vx")); GA_RWHandleV3D w_h( gdp->findAttribute(GA_ATTRIB_POINT, "w")); GA_RWHandleV3D wx_h(gdp->findAttribute(GA_ATTRIB_POINT, "wx")); // create entities const int entities = gdp->getNumPoints(); std::vector< entity > e; e.resize(entities); for (int i=0; i<entities; i++){ e[i].p = p_h.get(i); e[i].px = px_h.get(i); e[i].v = v_h.get(i); e[i].vx = vx_h.get(i); e[i].w = w_h.get(i); e[i].wx = wx_h.get(i); } // // XPBD SOLVER // // ... at the end of the program I save the updated point attributes for (int i=0; i<entities; i++){ p_h.set( i, e[i].p); px_h.set(i, e[i].px); v_h.set( i, e[i].v); vx_h.set(i, e[i].vx); w_h.set( i, e[i].w); wx_h.set(i, e[i].wx); }
I store collisions pairs into bulks of sections that can be run in parallel. Each pair contains the index of two entities that I want to solve collisions for.
This is a quick summary of the code;
for (int ss = 0; ss<iterations; ++ss) { // update entity position for (int i=0; i<entities; i++){ // apply gravity etc. } // solve collisions int collisionIdx = 0; for (int i=0; i<sections; i++){ for (const auto& pair : sections[i]) { entity &e0 = e[pair.first]; entity &e1 = (pair.second == -1) ? e_ground : e[pair.second]; collisions_solve( e0, e1, collisions, collisionIdx ); collisionIdx += collisionsMax; } } // solve constraints (same setup as collisions but for constraints) // PBD velocity update for (int i=0; i<entities; i++){ // Update the linear velocity } // The velocity solver - we run this additional solver for every collision that we found collisionIdx = 0; for (int i=0; i<sections; i++){ for (const auto& pair : sections[i]) { entity &e0 = e[pair.first]; entity &e1 = (pair.second == -1) ? e_ground : e[pair.second]; velocity_solver( e0, e1, collisions, collisionIdx ); collisionIdx += collisionsMax; } } }
Here is a longer version of the collision solver, for the multithreading I store two list, e0 and e1 which includes the collision pairs that can be computed in parallel:
int collisionIdx = 0; for (int i = 0; i < sections.size(); ++i) { // check if we should use multithreading if( sections[i].size() < 100 ) { // for every collision pair for (const auto& pair : sections[i]) { entity &e0 = e[pair.first]; entity &e1 = (pair.second == -1) ? e_ground : e[pair.second]; collisions_solve( e0, e1, collisions, collisionsMax, collisionIdx, h); collisionIdx += collisionsMax; } } else { vector<int> e0; vector<int> e1; for (const auto& pair : sections[i]) { e0.push_back(pair.first); e1.push_back(pair.second); } threads.collision_solve( e, e0, e1, e_ground, collisions, collisionsMax, collisionIdx, h ); collisionIdx += collisionsMax * sections[i].size(); } }
At last here is the threaded collisions solve function. _e0 and _e1 holds the list of entities that we can solve in parallel:
class threadedObject { public: THREADED_METHOD8( // Construct two parameter threaded method threadedObject, // Name of class true, // Evaluated to see if we should multithread. collision_solve, // Name of function vector<entity> &, e, vector<int>, _e0, vector<int>, _e1, entity, e_ground, vector< Collision_Constraint > &, collisions, int, collisionsMax, int, collisionIdx, double, h ) void collision_solvePartial(vector<entity> &e, vector<int> _e0, vector<int> _e1, entity e_ground, vector< Collision_Constraint > &collisions, int collisionsMax, int collisionIdx, double h, const UT_JobInfo &info) { int i, n; for (info.divideWork( _e0.size() , i, n); i < n; i++) { int collision_idx = collisionIdx + (i*collisionsMax); entity &e0 = e[ _e0[i] ]; entity &e1 = (_e1[i] == -1) ? e_ground : e[ _e1[i] ]; collisions_solve( e0, e1, collisions, collisionsMax, collision_idx, h); } }; };
Should I do this in another way?
Instead of making the entities list I could fetch the values only when needed using them. This will eliminate the whole entities list and possible make it faster to ingest the code into a threaded function, and maybe make it more tread safe as Houdini's does all the attribute handling.
Here I have a custom "struct handles" that contains all the point attribute handles.
Example:
void collisions_solve( int e0, int e1, struct_handles &handles, vector< Collision_constraints > &collisions, int collision_idx ) { // fech attributes UT_Vector3 p0 = handles.p_handle.get(_e0[i]); UT_Vector3 p1 = handles.p_handle.get(_e1[i]); // do calculations // save collision // store attributes p_handle.set(_e0[i], p0); p_handle.set(_e1[i], p1); } for (info.divideWork( _e0.size() , i, n); i < n; i++) { // fech all point attributes int e0 = _e0[i]; int e1 = _e1[i]; // solve collision collisions_solve( e0, e1, handles, collisions, collision_idx ); }
This goes beyond my limited HDK C++ knowledge so any help or insight on the subject is highly appreciative!
Sorry for the lack of consistence use of brackets.
All credits goes to Felipe Kersting for the original codebase. I had to implement some custom constraints my self and of course do a lot of adjustments to make it work in houdini, but 90% of the code is copied from this repository:
https://github.com/felipeek/raw-physics [github.com]