OpenCL Performance on different GPUs




Mikal: Member; 15 posts; Joined: Sept. 2009

Aug. 17, 2018 6:43 a.m.

Hi SESI/Anyone

A question regarding OpenCL performance on different GPUs. Apologies if this has been answered in another thread.

We've got a shiny new Nvidia GV100 in the office and I was keen to test the increase in performance we might get over our workstation GeForce 1080 cards.

I tested a simple pyro explosion scene: tiny SOP overhead, caching turned off, simulation run in the background, no other background processes running, latest NVIDIA drivers, Houdini 16.5.439. After multiple tests, I was surprised to find hardly any difference at all.

100 frames (peak 30 MV):
GV100 - 1 min 26 seconds
GeForce 1080 - 1 min 30 seconds

CPU benchmark as well, why not:
Dual Intel E5-2678W v4 - 8 min 12 seconds (peak 32 MV)
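For a sense of scale, the timings above can be converted to seconds and compared as ratios. This is plain arithmetic on the posted numbers; the helper name `to_seconds` is just illustrative.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert an 'M min S seconds' timing into total seconds."""
    return minutes * 60 + seconds


gv100 = to_seconds(1, 26)      # 86 s
gtx1080 = to_seconds(1, 30)    # 90 s
dual_xeon = to_seconds(8, 12)  # 492 s (note: this run peaked at 32 MV, not 30 MV)

# A ratio above 1 means the first device took longer than the second.
print(f"1080 vs GV100: {gtx1080 / gv100:.2f}x")
print(f"CPU vs 1080:   {dual_xeon / gtx1080:.2f}x")
print(f"CPU vs GV100:  {dual_xeon / gv100:.2f}x")
```

That works out to about 1.05x between the two GPUs (the GV100 roughly 5% faster), versus about 5.5x to 5.7x between either GPU and the dual-CPU run.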

Admittedly not the longest sim, but with double the number of CUDA cores I would have expected a more significant improvement from the GV100 over the 1080.

I don't know much about the OpenCL implementation, but my reasoning is that a new, expensive, shiny card should process faster than a dusty older one. Is there any internal limit on OpenCL performance? Perhaps it never scales to use all the available cores, and so never takes advantage of the extra ones on the GV100? Or could HOUDINI_OCL_MEMORY_POOL_SIZE or another environment variable be restricting performance? Is there anything else I might be missing?

Thanks for any insight.

Cheers


malexander: Staff; 5158 posts; Joined: July 2005

Aug. 17, 2018 8:33 a.m.

How big were the pyro volumes? To take full advantage of massive parallelism, the job needs to be diced into enough pieces to feed all the compute units. If the volume isn't large enough, some of those units will be starved and the overall speedup lessened. There is also some CPU overhead for the non-CL work, which becomes a greater portion of the total cook time as the CL time decreases. Eventually the cook time will settle at around the non-CL time as the CL compute becomes near-instant.
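The CPU-overhead point is essentially Amdahl's law: only the OpenCL portion of each cook gets faster on a bigger GPU. Below is a minimal sketch that assumes the 1 min 30 s (1080) run splits cleanly into a fixed non-CL part plus an OpenCL part, and further assumes, purely for illustration, that the GV100 runs the OpenCL part twice as fast (roughly its ratio of CUDA cores to the 1080's). Neither the split nor the 2x factor is measured, and the function names are hypothetical.

```python
def cook_time(non_cl_s: float, cl_s: float, cl_speedup: float) -> float:
    """Amdahl's law: only the OpenCL part of the cook is accelerated."""
    return non_cl_s + cl_s / cl_speedup


def implied_cl_time(t_slow: float, t_fast: float, cl_speedup: float) -> float:
    """Solve t_slow = s + c and t_fast = s + c / cl_speedup for c."""
    return (t_slow - t_fast) / (1.0 - 1.0 / cl_speedup)


t_1080, t_gv100 = 90.0, 86.0  # seconds, from the posted benchmark
assumed_speedup = 2.0         # hypothetical: GV100 runs CL work 2x faster

cl = implied_cl_time(t_1080, t_gv100, assumed_speedup)
non_cl = t_1080 - cl
print(f"implied OpenCL time on the 1080: {cl:.1f} s")
print(f"implied non-CL time:             {non_cl:.1f} s")

# As the CL part approaches "near-instant", the total approaches non_cl.
for speedup in (1.0, 2.0, 10.0, 1000.0):
    print(f"CL speedup {speedup:>6.0f}x -> {cook_time(non_cl, cl, speedup):.2f} s")
```

Under those assumptions only about 8 of the 90 seconds would be OpenCL work, and even an arbitrarily fast GPU could not bring the run much below about 82 s. A different assumed speedup changes the split, but the total still flattens toward the non-CL time in the same way.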

Edited by malexander - Aug. 17, 2018 08:34:10


Mikal: Member; 15 posts; Joined: Sept. 2009

Aug. 20, 2018 4:58 a.m.

Hey

Thanks for your reply.

By “how big”, I assume you mean the size in megavoxels. The peak is about 30 MV, although it occurs toward the end of the sim. Based on your reply, then, it sounds like only the later frames of the simulation have enough data to take advantage of the additional cores?

Cheers

Edited by Mikal - Aug. 20, 2018 04:59:22
