(solved) Memory bloat while simming

(Aaand, worked it out. It was an apparently long-standing memory leak loading VDB Point files from disk, fixed in the latest 18.5 production build. Hooray! I can return to a state of partial sanity...)

--------

I'm starting to pull my hair out over crashes from memory consumption on sims that should have a stable memory footprint - it's an issue that seems to have cropped up for me time and again over the years, and I can never quite work out what's going on under the hood, nor what I could do to control it properly.

I'm running a FLIP sim that handles a handful of frames easily on a machine with 64GB of RAM, including a handful of frames after loading fresh from a checkpoint file at the heaviest point in the sim... yet running the full-length sim on an HQueue client with 128GB causes a memory allocation failure after an hour or so. I can get there eventually by resubmitting to run from the latest checkpoint each time, but that means babysitting the sim through the farm, and I can't run anything overnight.
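
For what it's worth, the "resubmit from the latest checkpoint" step is the part I've been doing by hand, and it amounts to roughly the sketch below: scan the checkpoint directory, find the newest .sim file, and point the DOP network's initial state at it before resubmitting. The directory, file-name pattern, and the "initialstate" parameter name here are assumptions for illustration only - check them against your own setup.

```python
# Sketch only: locate the newest checkpoint .sim file and resume from it.
import glob
import os
import re

import hou

CHECKPOINT_DIR = "$HIP/sim_checkpoints"    # hypothetical location
FRAME_RE = re.compile(r"\.(\d+)\.sim$")    # e.g. flip_sim.0240.sim

def latest_checkpoint(directory):
    """Return (frame, path) of the highest-numbered .sim checkpoint, or None."""
    best = None
    for path in glob.glob(os.path.join(hou.expandString(directory), "*.sim")):
        match = FRAME_RE.search(path)
        if match:
            frame = int(match.group(1))
            if best is None or frame > best[0]:
                best = (frame, path)
    return best

found = latest_checkpoint(CHECKPOINT_DIR)
if found:
    frame, path = found
    # "Initial State" on the DOP Network - verify the parameter name on your node.
    hou.node("/obj/flip_sim/dopnet").parm("initialstate").set(path)
    print("Resuming from checkpoint at frame", frame)
```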

The sim at the point it's crashing is actually getting smaller: fluid is leaving the container, particle counts are going down, and field sizes are shrinking. Loading up from a checkpoint file, I can easily sim another couple of frames at that point on the 64GB machine, so it doesn't have an excessive memory footprint (it doesn't even cross 32GB for the first couple of frames).

As I understand it, the absolute most a FLIP sim should need cached is the current and previous frame. The solve is locked at 1 substep max, and I'm sourcing all colliders directly as volumes anyway (no static solver with collision generation happening), so presumably it shouldn't really need any previous-frame cache at all.

I've tried:
- turning off caching on the FLIP object, and turning it off entirely on the DOP network
- precaching all emitters and collision volumes to disk
- setting the "unload" flag on all SOP inputs, and on the SOP outputs that save the DOP I/O data to disk (while avoiding setting unload on any SOP node that feeds two different downstream nodes, so nothing gets cooked twice)

Literally everything I can think of to stop it caching anything for longer than the time it actually needs the data.
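
(For anyone finding this later: the per-node clicking above is roughly the script sketch below. The node paths are hypothetical, and the parameter/flag names - "cacheenabled", hou.nodeFlag.Unload - are from memory, so verify them against your own nodes and Houdini version.)

```python
# Sketch only: batch-apply the cache/unload settings described above.
import hou

# Hypothetical paths - substitute your own network locations.
dopnet = hou.node("/obj/flip_sim/dopnet")

# Disable the DOP network's in-memory simulation cache.
# "cacheenabled" is assumed to be the Cache Simulation parameter - verify it.
cache = dopnet.parm("cacheenabled")
if cache is not None:
    cache.set(0)

# Set the Unload flag on every SOP feeding the sim so cooked geometry
# is released once downstream nodes have used it.
for sop in hou.node("/obj/fluid_sources").children():
    try:
        sop.setGenericFlag(hou.nodeFlag.Unload, True)
    except hou.OperationFailed:
        pass  # skip node types that don't support the flag
```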

Then I run it and watch the memory allocation... and without fail it creeps up another 2, 3, 4GB per frame, until eventually it fills the available RAM, and throws a segfault.
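
One way to put numbers on that creep is to log the process's resident memory once per frame, e.g. from a Post-Frame Script on the ROP driving the sim. This is a minimal sketch, assuming psutil is available in Houdini's Python (it may need installing separately):

```python
# Sketch: print the Houdini process's resident memory for the current frame.
import os

import hou
import psutil

def log_memory():
    rss_gb = psutil.Process(os.getpid()).memory_info().rss / (1024 ** 3)
    print("frame %g: %.2f GB resident" % (hou.frame(), rss_gb))

log_memory()
```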

I know that, in theory, an OS's memory management may leave memory allocated to a process that isn't actively using it all... but then presumably the allocation wouldn't keep growing frame after frame without end.

This is Windows 10 locally, and I believe Windows 8 on the farm, running Houdini 18.0.416 - in case any of those things could be the cause.

Does this sound like an OS/Houdini bug (one that might be resolved with later Houdini versions, or Linux?), or is there some way for me to explicitly tell Houdini to absolutely throw away anything that's more than two frames old and free up memory consistently?
Dan Wood
Vortex VFX Ltd
Well, found a workaround for now - periodic checkpoint relaunches as a way to clear memory.

I've set it up using the HQueue Render ROP instead of the HQueue Simulation ROP, set the frame batch size to the same number of frames as the DOP network's checkpoint file interval (so each batch ends by saving a sim state file, ready for the next one to load), and then manually assigned a single HQueue client machine to the job, so I don't end up with extra clients trying to jump ahead and run a batch before there's a checkpoint file for it to load.
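
To make the batch arithmetic concrete, this is the idea (plain Python, example numbers only): each batch spans exactly one checkpoint interval, so it ends on a frame where a sim state file gets written, and the next batch can load that file.

```python
# Sketch: how the HQueue batches line up with the checkpoint interval.
def batch_ranges(start, end, checkpoint_interval):
    """Yield (first, last) frame pairs; each batch ends on a checkpoint frame."""
    first = start
    while first <= end:
        last = min(first + checkpoint_interval - 1, end)
        yield (first, last)
        first = last + 1

# e.g. a 240-frame sim with a checkpoint written every 24 frames:
for first, last in batch_ranges(1, 240, 24):
    print("batch %d-%d (sim state saved at frame %d)" % (first, last, last))
```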

Seems to be working. I'd much rather just know how to keep the memory from overflowing though :-)
Dan Wood
Vortex VFX Ltd