One of the exciting features in Houdini 12 is OpenCL acceleration of fluids. However, like most things involving the GPU, there are some caveats.
You need to have a relatively recent card (Fermi and later) and good drivers. You need to be on either Linux or Windows. We removed the Mac support as we had difficulty working with the drivers found in most installs.
You should start with a simple scene. Attached is a good starting point for experimentation.
The attached scene defaults to a very low resolution, 64^3, to ensure it runs. After you verify it actually runs in OpenCL mode, you can try boosting the resolution. To really take advantage of the GPU you need more than 64^3 voxels - the big differences show up at large resolutions like 256^3.
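To put the resolution jump in numbers, here is a quick sketch of the voxel counts involved. The 4 bytes per voxel figure assumes one 32-bit float per voxel per scalar field; that is an assumption for illustration, not something stated above.

```python
# Voxel counts for cubic grids at the two resolutions mentioned above.
# BYTES_PER_VOXEL assumes one 32-bit float per voxel (illustrative assumption).
BYTES_PER_VOXEL = 4


def voxel_count(res):
    """Number of voxels in a res x res x res grid."""
    return res ** 3


def field_mib(res):
    """Size of one scalar field in MiB under the 32-bit float assumption."""
    return voxel_count(res) * BYTES_PER_VOXEL / (1024 ** 2)


for res in (64, 256):
    print(f"{res}^3: {voxel_count(res):,} voxels, "
          f"{field_mib(res):.0f} MiB per scalar field")

ratio = voxel_count(256) // voxel_count(64)
print(f"256^3 has {ratio}x the voxels of 64^3")
```

Going from 64^3 to 256^3 multiplies the work per field by 64, which is why fixed overheads matter so much less at the higher resolution.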
Fastest GPU speeds are achieved by avoiding any copying to and from the CPU memory. Well, you can't have zero copying, or you have nothing useful, but you want to minimize the transfer. This is why the attached file:
1) Turns off DOPs caching. Caching requires copying from GPU to CPU *all* the fields every frame. Very useful if you want to scrub and inspect random fields, not so useful if you want maximum speed.
2) Only imports density to SOPs. Only one field needs to be pulled from the GPU to CPU each frame.
3) Saves to disk in background. This gives you the best throughput. Displaying in the viewport requires a GPU -> CPU -> GPU round trip. (Yes, ugly, but likely required in general to support simming on a card other than your display card)
4) Uses a plain smokesolver. All the code paths being used after the first frame sourcing are GPU enabled. If you add a microsolver that isn't GPU enabled, there is no error. Instead Houdini just silently does the GPU->CPU transfer for you.
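The cost of copying every field versus a single field can be estimated roughly as below. The field list is hypothetical (density, temperature, and a three-component velocity, all 32-bit floats); a real sim may carry more or fewer fields, so treat the totals as an order-of-magnitude sketch only.

```python
# Rough per-frame GPU -> CPU copy size for a 256^3 sim.
# The field list is hypothetical: density, temperature and a 3-component
# velocity, all stored as 32-bit floats. Real sims may differ.
RES = 256
BYTES_PER_FLOAT = 4
FIELD_COMPONENTS = {"density": 1, "temperature": 1, "vel": 3}  # assumed fields


def copy_mib(field_names):
    """MiB copied per frame if the named fields are pulled back to the CPU."""
    components = sum(FIELD_COMPONENTS[name] for name in field_names)
    return components * RES ** 3 * BYTES_PER_FLOAT / (1024 ** 2)


all_fields = copy_mib(FIELD_COMPONENTS)  # caching on: every field, every frame
density_only = copy_mib(["density"])     # only density imported to SOPs

print(f"all fields:   {all_fields:.0f} MiB per frame")
print(f"density only: {density_only:.0f} MiB per frame")
```

Under these assumptions, importing only density cuts the per-frame transfer to a fifth of what full caching would copy.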
With that in mind, it should be clear why the default Pyro effects don't show an improvement when OpenCL is first toggled:
1) They tend to be very low resolution for fast initial playback. There aren't enough voxels for the GPU acceleration to greatly exceed the overhead.
2) They have a lot of non-GPU shaping tools. While a lot of tools are GPU enabled (such as vortex confinement) quite a few of the Pyro ones are based in VOPs which don't have a GPU version.
3) Caching is enabled by default in DOPs.
4) Resizing is enabled by default. Resizing has to go through the CPU to manage the field changes. It can also fragment the GPU memory resulting in out-of-memory errors.
While we regret that the “Use OpenCL” toggle isn't a turnkey “Make things blazingly fast” toggle, we do stand by the significant improvements you can realize if you optimize your scene around the GPU.
Open CL Settings
- jlait
- Staff
- 6378 posts
- Joined: July 2005
- Offline
- varomix
- Member
- 460 posts
- Joined: July 2005
- Offline
- gwader
- Member
- 40 posts
- Joined: September 2006
- Offline
- Alejandro Echeverry
- Member
- 691 posts
- Joined: June 2006
- Offline
Thank you very much Jeff!!!!!
It's very very fast!!!!
dual Xeon quad 2.4 GHz, 10 GB RAM, first GPU NVIDIA GeForce 8800 GT, device for OpenCL NVIDIA GeForce 550 Ti 2 GB.
Driver 295.20
0.85 sec / frame 256^3
Edited by - March 6, 2012 14:40:17
Feel The Knowledge, Kiss The Goat!!!
http://www.linkedin.com/in/alejandroecheverry [linkedin.com]
http://vimeo.com/lordpazuzu/videos [vimeo.com]
- mrsunshine001
- Member
- 1 posts
- Joined: November 2011
- Offline
- jlait
- Staff
- 6378 posts
- Joined: July 2005
- Offline
Times measured by setting resolution to 256^3 and then jumping from frame 1 to frame 10 with viewport open. I read the times off the Frame Time from the display options.
Linux, Ubuntu.
260.19.44 drivers, Tesla C2070 card
3.8 sec/10 frames
Single i7 X980 (hex core, hyperthreaded, 3.3 GHz)
40.7 sec/10 frames
This is where the "10x faster" figure for the GPU vs CPU comes from. Of course, if you pick a weaker video card, or a beefier CPU, you can change this ratio. But at the time we were testing it seemed a pretty fair comparison. The difference also decreases if you add overhead for saving, displaying, or non-GPU microsolvers - Amdahl's law says enough overhead always brings stuff back to parity.
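As a rough illustration of both points, the sketch below computes the ratio from the two timings and then applies Amdahl's law. The accelerated fractions in the loop are hypothetical values chosen for illustration, and treating the measured ratio as the pure acceleration factor is a simplification.

```python
# Speedup from the timings above, plus Amdahl's law for added overhead.
gpu_s = 3.8    # Tesla C2070, 10 frames at 256^3
cpu_s = 40.7   # i7 X980, same 10 frames

speedup = cpu_s / gpu_s
print(f"measured speedup: {speedup:.1f}x")


def amdahl(accelerated_fraction, accel):
    """Overall speedup when only accelerated_fraction of the runtime gets faster."""
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / accel)


# Hypothetical fractions of the frame time spent in GPU-enabled solve code.
for fraction in (1.0, 0.9, 0.5):
    print(f"{fraction:.0%} accelerated -> {amdahl(fraction, speedup):.2f}x overall")
```

Once half the frame time goes to saving, display, or CPU-only microsolvers, the overall gain in this model drops below 2x regardless of how fast the card is.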
- moondeer
- Member
- 14 posts
- Joined: September 2011
- Offline
- Mudvin
- Member
- 66 posts
- Joined: July 2005
- Offline
- circusmonkey
- Member
- 2624 posts
- Joined: August 2006
- Offline
Platform: windows-x86_64-cl15
OpenGL Vendor: NVIDIA Corporation
OpenGL Renderer: Quadro 4000/PCI/SSE2
OpenGL Version: 4.1.0
OpenGL Shading Language: 4.10 NVIDIA via Cg compiler
Detected: NVidia Professional
2048 MB
276.52.0.0
6.74 sec / 10 frames @ 256^3
Using 2 x Xeon 2.4 6 core e5645 and 48gigs of ram
rob
Gone fishing
- Sugleris
- Member
- 30 posts
- Joined: September 2011
- Offline
On Windows 7 x64 with 32 gigs of ram.
Testing OpenCL with a 12 core Mac Pro with a Quadro 4000 with 2 gigs of VRAM and I'm seeing absolutely no difference in speed. I opened up your test scene and they both run the same. Maybe it's because there are 24 CPU cores, and it starts to even out, but I figured a Quadro 4000 with 2 gigs should be able to rock the fluid sim and have a realtime fluid running with a 64 resolution fluid.
OpenCL works great with Turbulence FD. It does great GPU sims, I just don't prefer Lightwave or Cinema.
Was looking forward to Houdini 12, I'd love to flex my HD license more with fluids. I hate jumping between multiple 3D packages, but Turbulence's OpenCL performance works easily out of the box and it's a clear difference in sim time when it's set to GPU calculation.
Gonna have to jump out of Houdini for fluid sims temporarily, I was blown away by Turbulence FD's speed on CL.
Shortcuts are supposed to be hard, if they were easy then they would just be “The Way”.
- jason_iversen
- Member
- 12612 posts
- Joined: July 2005
- Offline
He just said “We removed the Mac support as we had difficulty working with the drivers found in most installs.”
Jason Iversen, Technology Supervisor & FX Pipeline/R+D Lead @ Weta FX
also, http://www.odforce.net [www.odforce.net]
- Sugleris
- Member
- 30 posts
- Joined: September 2011
- Offline
Read the entire post. It's running in Win7, and I also have a PC-formatted GTX 560 with 2 gigs of RAM in it and I still see no diff.
No Mac support means no OS X support. When I'm in Windows I might as well be a PC.
Also ran H12 Apprentice on my Asus Sabertooth i7 build and I'm not seeing a difference here either with a 9800 GTX. Albeit it's a 512 MB card, it should still handle a 64 x 64 res sim in realtime.
Shortcuts are supposed to be hard, if they were easy then they would just be “The Way”.
- johner
- Staff
- 822 posts
- Joined: July 2006
- Offline
- jlait
- Staff
- 6378 posts
- Joined: July 2005
- Offline
Thank you for giving it a try, Sugleris.
Unfortunately, I am not surprised that you don't see a difference at 64^3. Our focus in Houdini 12 was big sims rather than real time performance. As such, we still have a lot of fixed overhead in both the CPU and OpenCL version regardless of the voxel count. This prevents things from getting any faster than a base level.
To get really fast gpu fluid support you really want to render the fluid using the same texture it is simulated with. For infrastructural reasons we can't do this - everything has to round-trip through the CPU.
I just turned on the Performance Monitor and ran the file at 64^3 for 1.8 seconds (after the initial few frames were done). I got 6.1 FPS in the viewport monitor. (Note that the performance monitor itself caused a big FPS drop.)
The breakdown:
Computation: 1.087s
- Dops: 0.610s
- Solve: 0.346s
- Configure: 0.217s
- Sops: 0.351s
Viewports: 0.755s
The solve time is the only part affected by CPU vs OpenCL. For big sims, this dominates. But for small sims, it doesn't so we have this unfortunate levelling of the playing field. With this particular setup, if the GPU were infinitely fast, I'd only get 7.5 FPS.
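The 7.5 FPS ceiling can be reproduced from the breakdown with a few lines of arithmetic. This sketch assumes the sampled wall time is Computation plus Viewports, and that nothing else changes when the solve gets faster.

```python
# Upper bound on FPS if the solve step took zero time, from the breakdown above.
# Assumes wall time for the sample is Computation + Viewports, and that
# every other cost stays the same when the solve is removed.
computation_s = 1.087
viewports_s = 0.755
solve_s = 0.346
measured_fps = 6.1

total_s = computation_s + viewports_s
frames = measured_fps * total_s               # frames simulated in the sample
best_case_fps = frames / (total_s - solve_s)  # same frames, solve time removed

print(f"solve share of total: {solve_s / total_s:.0%}")
print(f"best-case FPS: {best_case_fps:.1f}")
```

With the solve at under a fifth of the total time, even removing it entirely only lifts playback from 6.1 to roughly 7.5 FPS.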
With the performance monitor off, looking at the dop sim directly (with display of the object off) I get 10 FPS.
If I could hold your attention for a bit longer, I'd be curious if you see a difference when setting the max divisions to 256.
- Sugleris
- Member
- 30 posts
- Joined: September 2011
- Offline
Ahh, I see. That makes sense, having to pass through the CPU regardless. I was getting confused because I had used a GPU fluid solver before that was doing 60fps 64x64 sims, but with what you said it makes sense.
I'm hitting myself in the head here. On 256 it went from
7.7 frames per min on 24 cores to
52 frames per minute on OpenCL.
Do you guys have any plans to introduce realtime playback on just density for R&D purposes?
Either way, this is great once the res is higher.
Shortcuts are supposed to be hard, if they were easy then they would just be “The Way”.
- mapache69
- Member
- 12 posts
- Joined: December 2007
- Offline
Did something change when H12 went gold? All OpenCL pyro/smoke solvers crash after 30 or 32 frames, regardless of platform, complexity, etc. This happens at home on win64 and at Sony on Linux. This was working OK in .501 and before, but something is different. Any thoughts?
Thanks Jeff,
Theo :shock:
- malexander
- Staff
- 5199 posts
- Joined: July 2005
- Offline
- bhaveshpandey
- Member
- 127 posts
- Joined: November 2008
- Offline
the file crashes on me at 200^3 with the following error msg:
OpenCL Exception: clFinish (-36)OpenCL Context error: CL_OUT_OF_RESOURCES error waiting for idle on Quadro 4000 (Device 0).
OpenCL Context error: CL_OUT_OF_RESOURCES error waiting for idle on Quadro 4000 (Device 0).
has anyone else encountered anything similar?
system specs:
Platform: linux-x86_64-gcc4.4
Number of Cores: 24
Physical Memory: 47.26 GB
OpenGL Vendor: NVIDIA Corporation
OpenGL Renderer: Quadro 4000/PCI/SSE2
OpenGL Version: 4.1.0 NVIDIA 260.19.36
OpenGL Shading Language: 4.10 NVIDIA via Cg compiler
Detected: NVidia Professional
2048 MB
260.19.36.0
Is this related to the drivers (can anyone confirm??)?
cheers.
- kjmitch
- Member
- 23 posts
- Joined: December 2011
- Offline
- sueshijuu
- Member
- 5 posts
- Joined: March 2012
- Offline