OpenCL Settings

Staff
4627 posts
Joined: July 2005
Offline
One of the exciting features in Houdini 12 is OpenCL acceleration of fluids. However, like most things involving the GPU, there are some caveats.

You need to have a relatively recent card (Fermi or later) and good drivers. You need to be on either Linux or Windows. We removed Mac support because we had difficulty working with the drivers found in most installs.

You should start with a simple scene. Attached is a good starting point for experimentation.

The attached scene defaults to a very low resolution, 64^3, to ensure it runs. After you verify it actually runs in OpenCL mode, you can try boosting the resolution. To really take advantage of the GPU you need more than 64^3 voxels - the big differences show up at large resolutions like 256^3.

Fastest GPU speeds are achieved by avoiding any copying to and from the CPU memory. Well, you can't have zero copying, or you have nothing useful, but you want to minimize the transfer. This is why the attached file:
1) Turns off DOPs caching. Caching requires copying from GPU to CPU *all* the fields every frame. Very useful if you want to scrub and inspect random fields, not so useful if you want maximum speed.
2) Only imports density to SOPs. Only one field needs to be pulled from the GPU to CPU each frame.
3) Saves to disk in background. This gives you the best throughput. Displaying in the viewport requires a GPU -> CPU -> GPU round trip. (Yes, ugly, but likely required in general to support simming on a card other than your display card)
4) Uses a plain smokesolver. All the code paths being used after the first frame sourcing are GPU enabled. If you add a microsolver that isn't GPU enabled, there is no error. Instead Houdini just silently does the GPU->CPU transfer for you.
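
To put rough numbers on the transfer cost described in the list above, here is a back-of-envelope sketch. It assumes one 32-bit float per voxel per scalar field, and a hypothetical count of six scalar fields for the "all fields" case; neither figure comes from Houdini itself, so treat the output as an order-of-magnitude estimate only.

```python
BYTES_PER_VOXEL = 4      # assumption: one 32-bit float per voxel per scalar field
ASSUMED_FIELD_COUNT = 6  # hypothetical number of scalar fields copied when caching


def field_bytes(res, num_scalar_fields=1):
    """Bytes needed to copy num_scalar_fields scalar fields of res^3 voxels."""
    return res ** 3 * BYTES_PER_VOXEL * num_scalar_fields


def mib(num_bytes):
    """Convert a byte count to mebibytes."""
    return num_bytes / (1024 ** 2)


for res in (64, 256):
    density_only = field_bytes(res)
    all_fields = field_bytes(res, ASSUMED_FIELD_COUNT)
    print(f"{res}^3: density only {mib(density_only):.0f} MiB/frame, "
          f"all fields {mib(all_fields):.0f} MiB/frame")
```

Under these assumptions, going from 64^3 to 256^3 multiplies every per-frame copy by 64, which is why importing only density and turning off caching matters more as the resolution grows.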

With that in mind, it should be clear why the default Pyro effects don't show an improvement when OpenCL is first toggled:
1) They tend to be very low resolution for fast initial playback. There aren't enough voxels for the GPU acceleration to greatly exceed the overhead.
2) They have a lot of non-GPU shaping tools. While a lot of tools are GPU enabled (such as vortex confinement) quite a few of the Pyro ones are based in VOPs which don't have a GPU version.
3) Caching is enabled by default in DOPs.
4) Resizing is enabled by default. Resizing has to go through the CPU to manage the field changes. It can also fragment the GPU memory resulting in out-of-memory errors.

While we regret that the “Use OpenCL” toggle isn't a turnkey “Make things blazingly fast” toggle, we do stand by the significant improvements you can realize if you optimize your scene around the GPU.

Attachments:
clexplanation.hip (655.5 KB)

Member
404 posts
Joined: July 2005
Offline
what a great post Jeff
this clears a few of my questions

thanks for the scene file will inspect it today

cheers!!
varomix FX TD | Founder | Educator
MIX Training || www.mix-training.com
Member
40 posts
Joined: Sept. 2006
Offline
Great info!
Member
658 posts
Joined: June 2006
Offline
Thank you very much Jeff!!!!!

It's very very fast!!!!

Dual quad-core Xeon 2.4 GHz, 10 GB RAM, first GPU NVIDIA GeForce 8800 GT, device for OpenCL NVIDIA GeForce 550 Ti 2 GB.

Driver 295.20

0.85 sec / frame 256^3
Edited by - March 6, 2012 14:40:17
Feel The Knowledge, Kiss The Goat!!!
http://www.linkedin.com/in/alejandroecheverry [linkedin.com]
http://vimeo.com/lordpazuzu/videos [vimeo.com]
Member
1 posts
Joined: Nov. 2011
Offline
Would it be possible for users who are evaluating this test scene to list the hardware and driver settings they are using? It is hard to evaluate any claims without knowing these things.
Staff
4627 posts
Joined: July 2005
Offline
Times measured by setting resolution to 256^3 and then jumping from frame 1 to frame 10 with viewport open. I read the times off the Frame Time from the display options.

Linux, Ubuntu.
260.19.44 drivers, Tesla C2070 card
3.8 sec/10 frames

Single i7 x980 (hex core, hyperthreaded, 3.3 Ghz)
40.7sec/10 frames

This is where the 10x faster for the GPU vs CPU comes from. Of course, if you pick a weaker video card, or a beefier CPU, you can change this ratio. But at the time we were testing, it seemed a pretty fair comparison. The difference also decreases if you add overhead for saving, displaying, or non-GPU microsolvers - Amdahl's law says enough overhead always brings stuff back to parity.
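
As a rough illustration of the Amdahl's law point, the sketch below computes the ratio from the two timings quoted above and then shows how the overall speedup shrinks as less of the frame time is spent in GPU-accelerated code. Treating the measured ratio as the pure solver speedup, and the listed fractions, are assumptions made only for illustration.

```python
def amdahl_speedup(accelerated_fraction, accel_factor):
    """Overall speedup when only accelerated_fraction of the original run
    time is sped up by accel_factor (Amdahl's law)."""
    return 1.0 / ((1.0 - accelerated_fraction)
                  + accelerated_fraction / accel_factor)


cpu_time = 40.7  # sec per 10 frames, i7 X980, from the post above
gpu_time = 3.8   # sec per 10 frames, Tesla C2070, from the post above
measured = cpu_time / gpu_time
print(f"measured GPU vs CPU ratio: {measured:.1f}x")

# Hypothetical fractions of the frame time spent in GPU-enabled code.
for fraction in (1.0, 0.9, 0.5):
    overall = amdahl_speedup(fraction, measured)
    print(f"accelerated fraction {fraction:.0%}: overall {overall:.1f}x")
```

With half of the frame time left on the CPU, the overall gain drops below 2x no matter how fast the card is.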
Member
14 posts
Joined: Sept. 2011
Offline
jlait
…We removed the Mac support as we had difficulty working with the drivers found in most installs…

i would love to have tested my mac setup with opencl.
disabling mac opencl entirely is disappointing. a toggle, allowing us to try using opencl would be appreciated.
Member
66 posts
Joined: July 2005
Offline
I had some other openGL apps running in background, maybe that's the reason. This is GTX580

*update* deleted screen to not explode forum width.
Edited by - March 9, 2012 01:55:50
wbr, Mudvin
Member
2624 posts
Joined: Aug. 2006
Offline
Platform: windows-x86_64-cl15
OpenGL Vendor: NVIDIA Corporation
OpenGL Renderer: Quadro 4000/PCI/SSE2
OpenGL Version: 4.1.0
OpenGL Shading Language: 4.10 NVIDIA via Cg compiler
Detected: NVidia Professional
2048 MB
276.52.0.0

6.74 sec / 10 frames @ 256^3

Using 2 x Xeon 2.4 6 core e5645 and 48gigs of ram

rob
Gone fishing
Member
30 posts
Joined: Sept. 2011
Offline
On Windows 7 x64 with 32 gigs of ram.

Testing OpenCL with a 12-core Mac Pro with a Quadro 4000 with 2 gigs of VRAM, and I'm seeing absolutely no difference in speed. I opened up your test scene and they both run the same. Maybe it's because there are 24 CPU cores and it starts to even out, but I figured a Quadro 4000 with 2 gigs should be able to rock the fluid sim and have a realtime fluid running at 64 resolution.

OpenCL works great with Turbulence FD. It does great GPU sims; I just don't prefer LightWave or Cinema.

Was looking forward to Houdini 12; I'd love to flex my HD license more with fluids. I hate jumping between multiple 3D packages, but Turbulence's OpenCL performance works easily out of the box, and there's a clear difference in sim time when it's set to GPU calculation.

Gonna have to jump out of Houdini for Fluid sims temporarily, I was blown away by Turbulence FD's speed on CL.
Shortcuts are supposed to be hard, if they were easy then they would just be “The Way”.
Member
10263 posts
Joined: July 2005
Offline
He just said “We removed the Mac support as we had difficulty working with the drivers found in most installs.”
jason iversen, pipeline technology supervisor @ weta digital
also, http://www.odforce.net [www.odforce.net]
Member
30 posts
Joined: Sept. 2011
Offline
Read the entire post. It's running in Win7, and I also have a PC-formatted GTX 560 with 2 gigs of RAM in it and I still see no diff.

No Mac support means no support on OS X. When I'm in Windows I might as well be a PC.


Also ran H12 Apprentice on my Asus Sabertooth i7 build and I'm not seeing a difference there either with a 9800 GTX. Albeit it's a 512 MB card, it should still handle a 64 x 64 res sim in realtime.
Shortcuts are supposed to be hard, if they were easy then they would just be “The Way”.
Staff
684 posts
Joined: July 2006
Offline
There should be a significant speedup from 12-physical CPU cores to a Quadro 4000, but you probably need to increase the resolution higher than 64^3 before it's apparent.
Edited by - March 8, 2012 15:03:37
Staff
4627 posts
Joined: July 2005
Offline
Thank you for giving it a try, Sugleris.

Unfortunately, I am not surprised that you don't see a difference at 64^3. Our focus in Houdini 12 was big sims rather than real time performance. As such, we still have a lot of fixed overhead in both the CPU and OpenCL version regardless of the voxel count. This prevents things from getting any faster than a base level.

To get really fast gpu fluid support you really want to render the fluid using the same texture it is simulated with. For infrastructural reasons we can't do this - everything has to round-trip through the CPU.

I just turned on the Performance Monitor and ran the file at 64^3 for 1.8 seconds (after the initial few frames were done). I got 6.1FPS in the viewport monitor (Note the performance monitor itself caused a big fps drop)

The breakdown:
Computation: 1.087s
- Dops: 0.610s
  - Solve: 0.346s
  - Configure: 0.217s
- Sops: 0.351s
Viewports: 0.755s
The solve time is the only part affected by CPU vs OpenCL. For big sims, this dominates. But for small sims, it doesn't so we have this unfortunate levelling of the playing field. With this particular setup, if the GPU were infinitely fast, I'd only get 7.5 FPS.
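
The 7.5 FPS ceiling can be reproduced from the figures above. This sketch assumes that Computation plus Viewports accounts for the whole measured window and that the 6.1 FPS reading applies to that same window; the variable names are only for illustration.

```python
computation = 1.087   # seconds, from the Performance Monitor breakdown
viewports = 0.755     # seconds
solve = 0.346         # seconds, the only part OpenCL can speed up
measured_fps = 6.1    # viewport reading during the same run

total = computation + viewports   # about 1.8 s, matching the run length
frames = measured_fps * total     # frames produced in that window

# An infinitely fast GPU removes the solve time and nothing else.
fps_with_free_solve = frames / (total - solve)
print(f"upper bound with a free solve: {fps_with_free_solve:.1f} FPS")
```

At 64^3 the solve is under a fifth of the total, so even a free solve gains only about 1.4 FPS.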

With the performance monitor off, looking at the dop sim directly (with display of the object off) I get 10 FPS.

If I could hold your attention for a bit longer, I'd be curious if you see a difference when setting the max divisions to 256.
Member
30 posts
Joined: Sept. 2011
Offline
Ahh, I see, That makes sense, having to pass through the CPU regardless. I was getting confused due to using a GPU fluid solver before that was doing 60fps 64x64 sims, but with what you said it makes sense.

I'm hitting myself in the head here on 256 it went from

7.7 frames per min on 24 cores to

52 frames per minute on OpenCL.
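
For comparison with the sec/frame figures posted earlier in the thread, the two rates above convert as follows. This is plain arithmetic on the quoted numbers, with no other assumptions.

```python
cpu_frames_per_min = 7.7    # 24 cores, from the post above
gpu_frames_per_min = 52.0   # OpenCL, from the post above

cpu_sec_per_frame = 60.0 / cpu_frames_per_min
gpu_sec_per_frame = 60.0 / gpu_frames_per_min
speedup = gpu_frames_per_min / cpu_frames_per_min

print(f"CPU: {cpu_sec_per_frame:.2f} sec/frame")
print(f"OpenCL: {gpu_sec_per_frame:.2f} sec/frame")
print(f"speedup: {speedup:.2f}x")
```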



Do you guys have any plans to introduce realtime playback on just density for R&D purposes?

either way, This is great once the res is higher.
Shortcuts are supposed to be hard, if they were easy then they would just be “The Way”.
Member
12 posts
Joined: Dec. 2007
Offline
Did something change when H12 went gold? All OpenCL pyro/smoke solvers crash after 30 or 32 frames, regardless of platform, complexity, etc. This happens at home on win64 and at Sony on Linux. This was working OK in .501 and before, but something is different. Any thoughts?


Thanks Jeff,


Theo :shock:
Staff
4732 posts
Joined: July 2005
Offline
There was a 3D texture leak which was fixed yesterday for build 566. This could have impacted CL memory use.

The journal/version entry missed the cutoff by 59 seconds, but the fix is there.
Member
123 posts
Joined: Nov. 2008
Offline
the file crashes on me at 200^3 with the following error msg:

OpenCL Exception: clFinish (-36)OpenCL Context error: CL_OUT_OF_RESOURCES error waiting for idle on Quadro 4000 (Device 0).


OpenCL Context error: CL_OUT_OF_RESOURCES error waiting for idle on Quadro 4000 (Device 0).

has anyone else encountered anything similar?

system specs:
Platform: linux-x86_64-gcc4.4
Number of Cores: 24
Physical Memory: 47.26 GB
OpenGL Vendor: NVIDIA Corporation
OpenGL Renderer: Quadro 4000/PCI/SSE2
OpenGL Version: 4.1.0 NVIDIA 260.19.36
OpenGL Shading Language: 4.10 NVIDIA via Cg compiler
Detected: NVidia Professional
2048 MB
260.19.36.0

Is this related to the drivers (can anyone confirm??)?

cheers.
Bhavesh Pandey.

https://bhaveshpandey.github.io [bhaveshpandey.github.io]
Member
23 posts
Joined: Dec. 2011
Offline
Jeff,

Is there any future plan to implement OpenCL for Mac OSX or is it a lost cause? I was really looking forward to seeing increased speeds on my Mac Pro. I'm currently running it with a quadro 6000.
Member
5 posts
Joined: March 2012
Offline
@bhaveshpandey

I had the same error, although on Windows 7 x64 with a GeForce 460. It simply turned out I didn't have the latest drivers. Installing the latest drivers fixed the problem.
Hope this is of any help to you.