FLIP OpenCL: no CUDA usage?

Member
22 posts
Joined: Sept. 2016
For the past couple of weeks, FLIP sims haven't shown any CUDA usage in Task Manager. I tried lowering the resolution and creating a new scene with default settings, but no change. Latest production build on Windows with a 2060 Super. For a quick test, create a wave tank and activate OpenCL on the solver.

I'm noticing a bit of 3D GPU usage, but I think that's just the viewport updating the blue points.
Edited by dany27227 - Jan. 24, 2024 11:26:48
Member
2537 posts
Joined: June 2008
Try clicking the down chevron next to the word "Copy 1" on one of the Task Manager graphs and choose CUDA from the list. Usage often shows up in that graph.
Using Houdini Indie 20.0
Windows 11 64GB Ryzen 16 core.
nVidia RTX 3050 8GB RAM.
Member
22 posts
Joined: Sept. 2016
That's not it. I'm checking the right graph.
Member
30 posts
Joined: Sept. 2009
I am seeing the same results as the OP. I am on 20.0.590 Py3.10. I am using HWiNFO and not seeing any activity on the GPU at all.
Thanks,
Todd Manus
www.artstation.com/thenotepadshow
www.itodd.net
Member
2537 posts
Joined: June 2008
I see about 8% to 16% CUDA usage. This is after removing all collision and particle display from the DOPNet guides, so I'm looking at a blank viewport while I simulate. If you dive inside the solver, you may discover there are over 20 nodes in the network whose OpenCL checkboxes are off. I manually linked them to the top-level solver checkbox using a relative expression.
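
If you want to script that instead of clicking through 20+ checkboxes, something along these lines should do it. This is a rough sketch only: the DOP network path, the solver name flipsolver1, and the parameter name opencl are assumptions, so check the actual names in your own scene.

    # Rough sketch: point every "Use OpenCL" toggle in the DOP network at the
    # top-level solver's checkbox via a relative channel reference.
    # Node paths and the parameter name "opencl" are assumptions.
    import hou

    dopnet = hou.node("/obj/AutoDopNetwork")   # assumed DOP network path
    solver = dopnet.node("flipsolver1")        # assumed top-level solver name

    for node in dopnet.allSubChildren():
        parm = node.parm("opencl")             # microsolver OpenCL toggle (assumed name)
        if parm is not None and node.path() != solver.path():
            rel = node.relativePathTo(solver)  # relative path back to the solver
            parm.setExpression('ch("%s/opencl")' % rel)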

Overall, OpenCL will run out of VRAM at some point, so it's best not to depend on it during unattended caching. It does offer a small speed boost while developing the simulation.
Edited by Enivob - Jan. 25, 2024 17:15:48
Using Houdini Indie 20.0
Windows 11 64GB Ryzen 16 core.
nVidia RTX 3050 8GB RAM.
Member
9 posts
Joined: Jan. 2017
From past experience optimizing OpenCL, and without going into how it all works (I could write a textbook on the whole mess): you'll rarely get more than ~40-50% of your GPU's theoretical performance unless the stars align and whoever wrote the kernel had deep compiler-engineering knowledge, enough to write C code that maps onto what the GPU actually wants, because the optimizers in GPU backends struggle with a language like C that has few ways of expressing the concepts involved in parallel code. Even then you'll probably only get up to ~70%. You'll never hit 100% without manually specifying your exact GPU's optimal kernel parameters. That situation is slowly improving, but back when I was mainly doing this on a Vega64 Liquid, I could take any OpenCL program lying around the internet (as long as the data set was large enough), change the workgroup count / size / etc., and replace short-circuiting and compound comparisons in conditional control flow with a variable computed right before the branch, and GPU utilization would skyrocket, to the point where transient voltages tripped the power supply I had at the time and shut the computer down if I tried to do anything else while it was running.
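
To make those two tweaks concrete, here's a toy sketch using pyopencl (not what Houdini uses internally; the kernel, names, and sizes are all made up for illustration): the host pins an explicit work-group size instead of letting the runtime pick, and the kernel evaluates its predicates unconditionally and branches once instead of short-circuiting.

    # Toy pyopencl sketch of the two tweaks described above; everything here
    # (kernel, names, sizes) is illustrative, not Houdini's actual code.
    import numpy as np
    import pyopencl as cl

    n = 1 << 20
    x = np.random.randn(n).astype(np.float32)
    mask = (np.random.rand(n) > 0.5).astype(np.int32)
    y = np.empty_like(x)

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)

    src = """
    __kernel void gate(__global const float *x, __global const int *mask,
                       __global float *y, const int n)
    {
        int i = get_global_id(0);
        if (i >= n) return;               // keep the bounds guard separate
        // Instead of: if (x[i] > 0.0f && mask[i]) y[i] = x[i]; else y[i] = 0.0f;
        // evaluate both predicates unconditionally, then branch once:
        int take = (x[i] > 0.0f) & (mask[i] != 0);
        y[i] = take ? x[i] : 0.0f;
    }
    """
    prg = cl.Program(ctx, src).build()

    mf = cl.mem_flags
    x_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=x)
    m_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=mask)
    y_buf = cl.Buffer(ctx, mf.WRITE_ONLY, y.nbytes)

    local = 256                                   # explicit work-group size instead of None
    global_size = ((n + local - 1) // local) * local
    prg.gate(queue, (global_size,), (local,), x_buf, m_buf, y_buf, np.int32(n))
    cl.enqueue_copy(queue, y, y_buf)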

Funnily enough, I don't even have a Compute or CUDA graph for the 4090 in Task Manager, although the 7900 XTX exposes three compute graphs there. MSI Afterburner exposes Compute Engine 0..13 items for it, although I have no idea what they all are.

My suggestion would be nvidia-smi, comparing Pwr:Usage/Cap to the GPU utilization % there. I've seen situations on both cards where the utilization is very high but the power usage is very low, which usually indicates the GPU is stuck on something. On the 4090, that was accidentally inserted FP64 code executing at a rate of 256 instructions per cycle for the full GPU. On the 7900 XTX, it was somebody's Vulkan codegen screwing up so that most of the GPU sat in wait instructions, doing nothing while an insanely large matrix multiplication assigned to only 8 of 48 CUs finished, and the GPU responded by upclocking itself too far ("I know you told me the maximum boost speed should be 2.9GHz, but I'm still at least 300W under my power limit, so I'm gonna run at 3.3GHz and make the entire system stutter instead").
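
For reference, something like this prints those two numbers side by side once a second:

    nvidia-smi --query-gpu=utilization.gpu,power.draw,power.limit --format=csv -l 1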

Anyway, what were the numbers prior to a couple of weeks ago? The newest Studio driver landed just yesterday and the one before that was from early December, I think, but I swear they dropped support for a couple of GPU series in there somewhere. I doubt NVIDIA would drop anything from the 2000 series so soon, but they may have accidentally shipped incompatible OpenCL (or left it out entirely if you're on the gaming driver, because practically nobody cares about it from that side of things). Try running clinfo from a command prompt / shell and see what it spits out. Usually, if something is wrong with the install, it either errors or spits out only the CPU's OpenCL info.
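
For a quick sanity check, clinfo -l just lists the platforms and devices without the wall of properties; the NVIDIA platform with your 2060 Super should show up in that list.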
Member
9 posts
Joined: Jan. 2017
FWIW, I just ran the "Melt Object" solver with OpenCL turned on, on a torus with the face count raised a bit, viscosity enabled, and a collision plane added, on the 4090. Compute activity showed up under 3D at 60-75% through the whole simulation, then dropped to 2% as expected once it was cached and just replaying on a loop. I'm on Windows 10 and the latest production build like the OP, Py 3.10.
Member
8554 posts
Joined: July 2007
Enivob
If you dive inside the solver, you may discover there are over 20 nodes in the network with OpenCL checkboxes that are off.
Just because a generic DOP microsolver can run using OpenCL doesn't mean it will speed up the particular solver that uses that node.
It's mostly beneficial if most of the subsequent operations also run on the OpenCL device so the data can stay in GPU memory. Otherwise, if only every other microsolver runs on the GPU, Houdini needs to constantly copy data between the CPU and GPU, which not only introduces unnecessary GPU memory limitations but can even slow down the whole solve.
So that's why I assume a lot of microsolvers have OpenCL off on purpose.

However, here is what does use OpenCL, per the FLIP docs:
Use OpenCL checkbox tooltip
Solve the linear systems for viscosity and pressure using OpenCL. This setting is mostly beneficial for high resolution fluid simulations with viscosity, when run on a fast GPU.

So if you're simming a heavy viscous FLIP sim and have OpenCL enabled on the FLIP solver, I'd expect to see some GPU usage.
Edited by tamte - Jan. 26, 2024 01:21:36
Tomas Slancik
FX Supervisor
Method Studios, NY
Member
22 posts
Joined: Sept. 2016
I verified with Vellum that the problem isn't with OpenCL, the GPU, or the drivers (updated to the latest).

You're talking about manually linking OpenCL checkboxes inside the solver? You shouldn't have to do that.

I'm saying that, for some reason, simply activating the master OpenCL checkbox on the solver no longer makes the default wave tank use CUDA the way it always has. If it isn't working for you either, then you have the same problem.

It's not a VRAM problem; as I said before, I tried lowering the resolution.
Staff
64 posts
Joined: June 2023
You don't need to unlock the solver to get your GPU going.

As Tomas said, OpenCL is only used for viscosity and pressure projection. One thing that isn't obvious at the moment is that OpenCL is only used for pressure projection if adaptivity is off.

By default, the shelf tools use adaptive pressure projection, which means you have to uncheck "Solve Pressure with Adaptivity"; otherwise OpenCL is only used for viscosity. I will add more info to the tooltip so this is properly documented and hopefully more user-friendly.
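
If you're setting this up from Python, something like the snippet below should flip both toggles. This is a sketch only: the node path and the internal parameter names (useopencl, solvepressurewithadaptivity) are assumptions, so check the actual names on your FLIP Solver.

    # Sketch only; node path and parameter names are assumptions.
    import hou

    solver = hou.node("/obj/AutoDopNetwork/flipsolver1")  # assumed path
    solver.parm("useopencl").set(1)                       # "Use OpenCL" (assumed name)
    solver.parm("solvepressurewithadaptivity").set(0)     # "Solve Pressure with Adaptivity" (assumed name)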

Let us know if this solves your issue.
Member
22 posts
Joined: Sept. 2016
Ahhh! Yup that did it.

Is adaptivity simply an optimization here, like adaptive sampling in rendering? If so, could it be faster with no adaptivity on a fast enough GPU?

Thanks!
Staff
64 posts
Joined: June 2023
It's a different algorithm using an octree to solve the pressure projection instead of a regular grid. You can get more information about this DOP here:

https://www.sidefx.com/docs/houdini/nodes/dop/gasprojectnondivergentadaptive.html

Currently this algorithm is not implemented in OpenCL and, therefore, cannot run on the GPU. Depending on the simulation and hardware, a powerful GPU running on a regular grid can outperform an adaptive pressure solve on the CPU.