Effective OpenCL

Member
135 posts
Joined: March 2018
I am trying to convert some VEX code to OpenCL in order to speed it up significantly. The code I am playing with right now is quite simple and only writes to a 1000x1000x1 float volume (generating an image!). Converting the code was easy enough, but the results are slow: an RTX 2080 Ti running OpenCL is only 3-4 times as fast as a 6-core Intel 5820 running VEX.
Using global memory in OpenCL is not advised, so I would like to move to texture memory instead, but I don't know how to interface with Houdini in that case. Is there any info, or are there examples, of efficient OpenCL code running in an OpenCL SOP?

For reference, running the same (more or less) code as a GLSL shader is about 100 times faster on the same graphics card.
Member
897 posts
Joined: July 2018
There's an overhead cost to OpenCL in copying the data between CPU and GPU, so you typically need code that iterates a large number of times over the same data before you make this up.
B.Henriksson, DICE
Member
135 posts
Joined: March 2018
After some further tests using both points and volumes as containers for data, I would guess I get roughly a 10x performance gain on my computer. It seems like a better investment to get a faster CPU and stay with VEX as it is now. The gains are better than nothing, of course, but nowhere near the potential for simple things like what I am doing here. Again, my reference is the same code (translated) running as a fragment shader.

Maybe there could be some image- and volume-specific bindings for the OpenCL SOP that use texture memory as an option?
Member
897 posts
Joined: July 2018
But that's what you've got already? The bottleneck isn't compute but I/O. I've got crazy speeds with OCL when the problem was isolated to one compute step: http://beautyapproximations.blogspot.com/2018/01/cloth-demo-gif.html?m=1
B.Henriksson, DICE
Member
135 posts
Joined: March 2018
@kahuna031, in this test case I just have a generator that depends on the world position, so my kernel is basically:

...
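kernel void kernelName(
    int P_length,
    global float * P,
    global float * density)
{
    int idx = get_global_id(0);
    // Guard: the global work size may be rounded up past the element count.
    if (idx >= P_length)
        return;
    // P holds packed xyz triples; vload3 reads the idx-th one.
    float3 p = vload3(idx, P);
    density[idx] = massiveArithmetics(p);
}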

I feel like I am missing something here, but maybe I just cannot expect GLSL-level performance?
Member
897 posts
Joined: July 2018
It is GLSL performance, but VEX is already quite fast, and unless massiveArithmetics is very slow in VEX, your bottleneck will be the CPU-GPU-CPU copying.

Try commenting out massiveArithmetics and see what compute time you get; that's basically the overhead. If this number is close to the VEX time, then OpenCL offers nothing.
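
The reasoning here can be put into a tiny model. This is only a sketch in plain C: `effective_speedup` and its parameter names are hypothetical, and it assumes the GPU path costs a fixed copy/launch overhead plus the kernel's compute time, which is a simplification.

```c
/* Hypothetical helper, not part of Houdini or OpenCL.
 * Models the GPU path as (copy/launch overhead + kernel compute time)
 * and returns how many times faster it is than the CPU path. */
double effective_speedup(double cpu_seconds,
                         double gpu_overhead_seconds,
                         double gpu_compute_seconds)
{
    return cpu_seconds / (gpu_overhead_seconds + gpu_compute_seconds);
}
```

With the kernel body commented out, the compute term is close to zero, so the measured time approximates the overhead term. If that alone is close to the VEX time, the ratio stays near 1 no matter how fast the kernel itself is.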
B.Henriksson, DICE
Member
135 posts
Joined: March 2018
EDIT: OK, the biggest error I made in my first test was sloppiness in leaving out the trailing f on the float literals, which I assume resulted in a lot of coercions to double.
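
To illustrate the literal issue, here is a minimal sketch in plain C rather than OpenCL C. The function names are made up for the example. In C, an unsuffixed literal such as `0.5` has type double, so multiplying a float by it is done in double; as far as I know OpenCL C behaves the same way on devices that support double precision, where double arithmetic is typically much slower on consumer GPUs.

```c
#include <stddef.h>

/* Hypothetical helpers: report the size of the result type of each product.
 * sizeof does not evaluate its operand; it only inspects the type. */

/* 0.5 is a double literal, so x is promoted and the product is a double. */
size_t product_size_unsuffixed(float x) { return sizeof(x * 0.5); }

/* 0.5f is a float literal, so the product stays a float. */
size_t product_size_suffixed(float x) { return sizeof(x * 0.5f); }
```

The OpenCL compiler also defines a `-cl-single-precision-constant` build option that treats double-precision constants as single precision; whether it can be passed depends on how the host application builds the kernel.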

There is no doubt that the calc is what takes time. Anyway. I made a test case and you can check this .hip if you want to. The crazy thing is that this behaves more in line with what one would expect…



It is based on this ShaderToy code (https://www.shadertoy.com/view/ldf3DN) that calculates a Mandelbrot with orbit traps.
I converted the code to VEX and OpenCL. In both cases I had to add some functionality that was missing, but all in all it is very similar. I made sure everything used floats, with no conversions from doubles in the CL code. I might still have missed stuff, but I think it is quite fair.
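
For readers without the attachment, the kind of per-pixel loop being timed looks roughly like the sketch below. It is plain C, not the ShaderToy, VEX, or OpenCL code from the test, and it leaves out the orbit traps. `mandel_iters` is a hypothetical name. Every literal carries the f suffix so the arithmetic stays in single precision.

```c
/* Hypothetical escape-time sketch: returns the iteration at which
 * z = z*z + c first leaves the radius-2 disc, or max_iters if it never does. */
int mandel_iters(float cx, float cy, int max_iters)
{
    float zx = 0.0f, zy = 0.0f;
    for (int i = 0; i < max_iters; ++i) {
        /* |z|^2 > 4 means the orbit has escaped. */
        if (zx * zx + zy * zy > 4.0f)
            return i;
        float t = zx * zx - zy * zy + cx;  /* real part of z*z + c */
        zy = 2.0f * zx * zy + cy;          /* imaginary part of z*z + c */
        zx = t;
    }
    return max_iters;
}
```

Points inside the set run all `max_iters` iterations, which is why a high iteration count makes the compute cost dominate any copy overhead.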
Changing the ShaderToy code to not use antialiasing, setting the iterations to 100000, and running the window at 1200x675 gives me in the range of 20-30 fps. Thus rendering 60 frames takes 2-3 s.
With the same settings in Houdini for both VEX and OpenCL, running 60 frames with the Performance Monitor on reports that my RTX 2080 Ti running OpenCL is about 12x faster at doing the calc than my 6-core Intel 5820 CPU.
So:
GLSL on ShaderToy: 2-3 s
VEX: 31 s
OpenCL: 2.7 s

Hmm, maybe it is fine after all.
(Making the calc domain larger/higher-res made the benefit of OpenCL larger, just as expected.)

I still don't get why my earlier example, with a very similar setup/principle, showed such a difference between OpenCL and GLSL.
Edited by filipw - Dec. 20, 2019 14:03:11

Attachments:
opencl_vs_vex_001.hiplc (209.0 KB)
