To test the Pyro solver, I applied the shelf explosion preset to a default box.
First, I ran the sim with OpenCL disabled, then I stopped it, rewound the animation and enabled OCL. In the background, I kept my eyes on htop to monitor the CPU threads, and nvidia-smi to monitor the GPU load.
The first thing that struck me was the heavy use of the GPU in a non-OpenCL sim. At one point the load reached 61%. The CPU load was around 75%, though that was hard to tell.
Then, with OpenCL enabled, CPU load dropped to somewhere below 50% of its overall processing power, while the GPU peaked at 24%.
(See: pyrofx-test.mp4 attachment)
The simulation was somewhat faster, but not by much. Also, the processing power of CPU and GPU was barely utilized. Why?
In general, there wasn't much of an improvement over my 8-year-old GTX 660 Ti.
After giving Pyro a test, I moved on to Vellum. A colleague of mine showed me this Twitter thread (https://twitter.com/adamyassienali/status/1350875542174789632), where someone tried to prove that his C4D solution for soft bodies is much faster than Houdini's Vellum and FEM. I downloaded the boot geo the author posted, prepared a Vellum SOP sim, and eventually came up with this:
(See: vellum-boot-test.mp4 attachment)
This is based on Vellum struts and proxy geo, BTW. I spent several hours tweaking the parameters of the vellum config nodes as well as the vellumbrush solver settings (which I tried to keep to a minimum). But that's not the point. The point is that CPU load during the sim barely reached 50% per thread (with maybe one thread occasionally hitting 99%), and the GPU was also barely touched, with the load never exceeding 20% (the stats are not visible in the video, but I don't have time to re-record it now). And, well, the performance is barely better than in Matt Estela's scene, in which he used a GTX 1070 card. Again: why?
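Since the stats didn't make it into the video, here is a rough sketch of how the same numbers could be logged to the terminal instead of eyeballed in htop and nvidia-smi. It assumes a Linux box (it reads /proc/stat) and an NVIDIA driver that provides nvidia-smi with its `--query-gpu` option. The function names (`read_cpu_times`, `cpu_percent`, `parse_gpu_line`, `query_gpu`, `log_load`) are made up for this example and have nothing to do with Houdini itself.

```python
import subprocess
import time


def read_cpu_times(stat_text):
    """Return {core_name: (busy, total)} in jiffies from /proc/stat text."""
    times = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Keep per-core lines ("cpu0", "cpu1", ...), skip the aggregate "cpu" line.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # user, nice, system, idle, iowait, irq, softirq, steal
        values = [int(v) for v in fields[1:9]]
        idle = values[3] + (values[4] if len(values) > 4 else 0)
        total = sum(values)
        times[fields[0]] = (total - idle, total)
    return times


def cpu_percent(before, after):
    """Per-core utilization in percent between two read_cpu_times() snapshots."""
    result = {}
    for name, (busy1, total1) in after.items():
        busy0, total0 = before.get(name, (0, 0))
        elapsed = total1 - total0
        result[name] = 100.0 * (busy1 - busy0) / elapsed if elapsed > 0 else 0.0
    return result


def parse_gpu_line(line):
    """Parse one 'utilization, memory_used' CSV line from nvidia-smi."""
    util, mem = (int(part.strip()) for part in line.split(","))
    return util, mem


def query_gpu():
    """Return [(gpu_util_percent, memory_used_mib), ...], or [] if unavailable."""
    cmd = ["nvidia-smi",
           "--query-gpu=utilization.gpu,memory.used",
           "--format=csv,noheader,nounits"]
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return [parse_gpu_line(l) for l in out.strip().splitlines() if l.strip()]


def log_load(seconds=10, interval=1.0):
    """Print per-core CPU load and GPU load once per interval."""
    with open("/proc/stat") as f:
        before = read_cpu_times(f.read())
    for _ in range(int(seconds / interval)):
        time.sleep(interval)
        with open("/proc/stat") as f:
            after = read_cpu_times(f.read())
        cores = cpu_percent(before, after)
        before = after
        cpu_str = " ".join(f"{pct:3.0f}" for pct in cores.values())
        print(f"CPU% per core: {cpu_str} | GPU (util %, MiB): {query_gpu()}")
```

Calling `log_load(seconds=60)` in a second terminal while the sim cooks would give a timestamp-free but copy-pasteable record of per-thread CPU load and GPU load, which is easier to compare between the OpenCL and non-OpenCL runs than a glance at htop.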
Could someone please explain to me the reason behind neither the CPU nor the GPU being fully utilized in both simulations? What do I have to do in order to take full advantage of the processing power of my workstation?