Why are the CPU and GPU not fully utilized in OpenCL simulations?

Member
466 posts
Joined: Aug. 2014
Offline
I installed an RTX 3070 card in my workstation and ran some simulation tests of solvers, which I thought would greatly benefit from the new GPU (via OpenCL): Vellum and Pyro (not sparse).

To test the Pyro solver, I used the default box on which I applied the shelf explosion preset.
First, I ran the sim with OpenCL disabled, then I stopped it, rewound the animation and enabled OCL. In the background, I kept my eyes on htop to monitor the CPU threads, and nvidia-smi to monitor the GPU load.

So the first thing that struck me was the heavy use of the GPU in a non-OpenCL sim. At one point the load reached 61%. The CPU load was around 75%, though it was hard to tell.

Then, with OpenCL enabled, CPU load decreased to something below 50% of its overall processing power, while the GPU peaked at 24% max.
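For anyone who wants to log these readings instead of eyeballing them, here is a minimal Python sketch that polls nvidia-smi through its CSV query mode. The function names are made up for illustration. Keep in mind that `utilization.gpu` reports the share of the sample period during which at least one kernel was running, so it does not say how saturated the card actually was.

```python
import subprocess

# nvidia-smi CSV query: one line per GPU, e.g. "24, 1875"
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_sample(line):
    """Parse one CSV line such as '24, 1875' into (util_percent, mem_used_mib)."""
    util, mem = (field.strip() for field in line.split(","))
    return int(util), int(mem)


def sample_gpus():
    """Run nvidia-smi once and return one (util, mem) tuple per GPU."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return [parse_gpu_sample(line) for line in out.splitlines() if line.strip()]


def peak_utilization(samples):
    """Highest utilization percentage seen in a list of (util, mem) samples."""
    return max(util for util, _ in samples)
```

Calling `sample_gpus()` in a loop (say, once a second with `time.sleep(1)`) while the sim cooks and passing the collected tuples to `peak_utilization()` gives the peak figure without having to watch the terminal.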

(See: pyrofx-test.mp4 attachment)

The simulation was somewhat faster, but not by much. Also, the processing power of CPU and GPU was barely utilized. Why?
In general, there wasn't much of an improvement over my 8-year-old GTX 660 Ti.

After giving Pyro a test, I proceeded to Vellum. A colleague of mine showed me this Twitter thread (https://twitter.com/adamyassienali/status/1350875542174789632), where someone tried to prove that his C4D solution for soft bodies is much faster than Houdini's Vellum and FEM. I downloaded the boot geo the author had posted, prepared a Vellum SOP sim and eventually came up with this:

(See: vellum-boot-test.mp4 attachment)

This is based on Vellum struts and proxy geo, BTW. I spent several hours tweaking parameters of the vellum config nodes as well as the vellumbrush solver settings (which I tried to keep minimal). But that's not the point. The point is that CPU load during the sim barely reached 50% per thread (with maybe one thread occasionally hitting 99%), and the GPU was also barely touched, with the load never exceeding 20% (the stats are not visible in the video, but I don't have time to re-record it now). And well, the performance is barely better than in Matt Estela's scene, in which he used a GTX 1070 card. Again: why?

Could someone please explain to me the reason behind neither the CPU nor the GPU being fully utilized in both simulations? What do I have to do in order to take full advantage of the processing power of my workstation?
Edited by ajz3d - Jan. 29, 2021 21:32:46

Attachments:
pyrofx-test.mp4 (2.5 MB)
vellum-boot-test.mp4 (2.7 MB)

Member
117 posts
Joined: July 2005
Offline
It could be as simple as the task being limited by memory bandwidth on the CPU and/or the GPU. And for GPUs in particular, many tasks have components that are inherently scalar and hence can't take advantage of the parallelism of OpenCL. The more complicated the task, the less likely it is that it can be made to run efficiently on modern systems.
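To put a rough number on the scalar-component point, here is a small Python sketch of Amdahl's law. The fractions and speedups in it are hypothetical examples, not measurements of any Houdini solver.

```python
def amdahl_speedup(parallel_fraction, parallel_speedup):
    """Overall speedup when only part of the work is accelerated.

    parallel_fraction: share of the original runtime that can run in
        parallel (0.0 to 1.0). Hypothetical input, not a measured value.
    parallel_speedup: how much faster that share runs on the accelerator.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / parallel_speedup)


# If half of the solve is serial, even a 100x faster parallel half
# yields less than a 2x overall speedup.
half_serial = amdahl_speedup(0.5, 100)
```

With half of the runtime stuck in serial code, the accelerator sits idle most of the time, which would look like low GPU load combined with only a modest overall speedup.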
Edited by drew - Jan. 29, 2021 22:30:05
Member
466 posts
Joined: Aug. 2014
Offline
Thank you for your reply, Drew.
I think you're right. I tried running multiple independent simulations in parallel and Houdini used even less CPU/GPU resources than with a single Pyro sim. Perhaps it's indeed the memory bandwidth which bottlenecks the dense Pyro sim.

Besides, I believe that I initially chose the wrong Pyro solver to test the card with! Today I ran some tests with Minimal OpenCL Solve enabled, which is supposed to run entirely on the GPU. The difference was night and day, with the GPU being almost constantly under maximum load throughout the whole simulation. Of course, I eventually ran out of VRAM after up-resing the sim a couple of times (in order to see how much I can squeeze out of this baby), so I guess those 8 GB on the card are still not much to work with.
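As a back-of-the-envelope check on why up-resing eats VRAM so quickly, here is a rough Python sketch of dense-grid memory. It assumes 32-bit floats and an illustrative count of 8 scalar fields; the real number of fields and any extra buffers the solver allocates are unknown to me, so treat the totals as order-of-magnitude only.

```python
def dense_grid_bytes(resolution, num_fields=8, bytes_per_value=4):
    """Approximate memory for dense voxel fields.

    resolution: (nx, ny, nz) voxel counts.
    num_fields: number of scalar fields held at once (assumed value).
    bytes_per_value: 4 for 32-bit floats (assumption).
    """
    nx, ny, nz = resolution
    return nx * ny * nz * num_fields * bytes_per_value


def halve_voxel_size(resolution):
    """Halving the voxel size doubles the voxel count along each axis."""
    return tuple(2 * n for n in resolution)


base = (256, 256, 256)
for _ in range(3):
    gib = dense_grid_bytes(base) / 2**30
    print(f"{base[0]}^3 grid: {gib:.1f} GiB")
    base = halve_voxel_size(base)
```

Each halving of the voxel size multiplies the footprint by 8, so under these assumptions two up-res steps are already enough to go from comfortable to well past 8 GB.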

Attachments:
pminsolve.mp4 (7.7 MB)

Member
17 posts
Joined: Oct. 2017
Offline
Yeah, anything based on NanoVDB seems really fast (the new GPU Pyro sim and something like the Axiom solver).

Combined with Redshift rendering, it's pretty potent...