Does a GPU with double precision help with simulation?

User Avatar
Member
79 posts
Joined: Feb. 2016
Offline
I noticed most consumer GPUs barely hit the 1 teraflop mark in double precision. In Houdini simulation, there are solvers that use double precision, such as the FLIP solver (FP32 is also an option). Would such a GPU accelerate computations with OpenCL? If a computer system has not only one GPU but also an accelerator such as Intel Phi or Tesla V, etc., would that be utilized when OpenCL is turned on in Houdini?
User Avatar
Member
53 posts
Joined: Sept. 2015
Offline
+1, I'm also interested.
User Avatar
Member
4127 posts
Joined: June 2012
Offline
Interesting question. My take on it is that the acceleration will only affect error reduction in the sim, and the 64-bit option appears to be only in the viscosity section of the FLIP solver. When using 32-bit it still uses a 64-bit buffer to help stop the accumulation of errors. On a quick test there is a 5% speed difference using 64-bit vs 32-bit with high viscosity on a 1080 Ti/Linux. You can run a test today to see if the visual difference is worth that; less noise is all I've noticed. Note that the further the sim is from the world-space origin, the more noise is introduced too.
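To illustrate the two precision points above (noise growing away from the origin, and a 64-bit buffer limiting error build-up), here is a minimal Python sketch. It is not Houdini code: the helper names `to_f32`, `f32_spacing` and `accumulate` are made up for this example, and the step size and iteration count are arbitrary.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def f32_spacing(x: float) -> float:
    """Gap between a positive float32 value and the next larger float32."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    next_value = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return next_value - to_f32(x)


def accumulate(step: float, count: int, single_precision: bool) -> float:
    """Sum `step` `count` times, optionally rounding the total to float32 each time."""
    step32 = to_f32(step)  # the input data is 32-bit in both cases
    total = 0.0
    for _ in range(count):
        total += step32
        if single_precision:
            total = to_f32(total)  # emulate a 32-bit accumulator
    return total


# Representable positions get coarser as you move away from the origin.
near_origin = f32_spacing(1.0)      # 2**-23
far_away = f32_spacing(10000.0)     # 2**-10

# Same 32-bit inputs, different accumulator width.
count = 100_000
exact = to_f32(0.1) * count
error_32bit_buffer = abs(accumulate(0.1, count, True) - exact)
error_64bit_buffer = abs(accumulate(0.1, count, False) - exact)
print(near_origin, far_away, error_32bit_buffer, error_64bit_buffer)
```

The spacing at 10000 units is 8192 times coarser than at 1 unit, and the 64-bit accumulator ends up far closer to the exact sum even though every input value is 32-bit.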

A Quadro card's advantage might be the apparently faster PCIe channel back to the CPU, larger VRAM and 10-bit colour support.

Only one OpenCL device is supported ‘at the moment’, which has been true since 2011/12…

Hope that helps.
User Avatar
Member
79 posts
Joined: Feb. 2016
Offline
Very nice of you to put the FLIP solver to the test, fuos. If only I had a Quadro GV100 or something of that sort, I'd definitely also test out the FEM solver, which utilizes 64-bit precision as well. The “only one OpenCL device is supported” point was also an interesting insight. What if I have a 1080 Ti along with an NVIDIA T4 (accelerator)? How does it know which one to use as the OpenCL device? (I reckon it'll use whichever is occupying the first PCIe lane, because it'll have a device ID of 0.)
Edited by ouroboros1221 - Jan. 17, 2019 18:33:22
User Avatar
Member
4127 posts
Joined: June 2012
Offline
No worries, this area is a hobby of mine

FEM only uses a C++ path at the moment, so the only hope is that it takes advantage of AVX-512. H17 has initial support, though the release notes lack granularity. There is literature that suggests this instruction set needs to be properly implemented to be of any use:
https://lemire.me/blog/2018/04/19/by-how-much-does-avx-512-slow-down-your-cpu-a-first-experiment/ [lemire.me]

OCL device choice is very easy now; it is set via Preferences/Miscellaneous.
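Besides the preferences dialog, the device can also be chosen through environment variables. The sketch below assumes the variable names `HOUDINI_OCL_DEVICETYPE`, `HOUDINI_OCL_VENDOR` and `HOUDINI_OCL_DEVICENUMBER` as I recall them from the Houdini environment variable documentation; check them against the docs for your version. The helper `houdini_ocl_env` and the example values are hypothetical.

```python
import os


def houdini_ocl_env(device_type="GPU", vendor=None, device_number=None):
    """Return a copy of the current environment with OpenCL device hints set.

    Variable names are assumed from the Houdini docs; verify for your version.
    """
    env = dict(os.environ)
    env["HOUDINI_OCL_DEVICETYPE"] = device_type
    if vendor is not None:
        env["HOUDINI_OCL_VENDOR"] = vendor
    if device_number is not None:
        # Index of the device among those matching the type/vendor above.
        env["HOUDINI_OCL_DEVICENUMBER"] = str(device_number)
    return env


# Example: ask for the second GPU device instead of relying on device 0.
env = houdini_ocl_env("GPU", device_number=1)

# To launch with these settings (assumes `houdini` is on the PATH):
# import subprocess
# subprocess.Popen(["houdini"], env=env)
```

Which index maps to which card (for example a 1080 Ti versus a T4) depends on how the driver enumerates devices, so it is worth confirming in Houdini which device actually got picked.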

EDIT: the ChangeLog suggests it's only VEX support for AVX-512, so the FEM solver most likely doesn't use it.
Edited by goat - Jan. 17, 2019 19:04:34
User Avatar
Member
79 posts
Joined: Feb. 2016
Offline
I skimmed the blog post. It seems he was suggesting that a system could actually be slowed down by using the AVX-512 instruction set. What's going on?
User Avatar
Member
4127 posts
Joined: June 2012
Offline
That would require a @jeffLait quality answer.

My layperson's understanding is that you are computing on 8 x 64-bit values in one hit, with a multiply and add, so if your data can't fill those registers you are not using that CPU cycle efficiently. In essence my theory is that it's too large a buffer to fill with non-contiguous data… now over to someone who actually knows what is going on.
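The arithmetic behind that theory can be sketched as a toy model: a 512-bit register holds 512 / 64 = 8 double-precision values, and any lanes you cannot fill are wasted work. The function names below are made up for this example, and the model only counts lanes; it says nothing about clock speeds or memory access.

```python
import math

REGISTER_BITS = 512  # width of one AVX-512 register


def lanes(value_bits: int) -> int:
    """Number of values of the given width that fit in one register."""
    return REGISTER_BITS // value_bits


def utilization(element_count: int, lane_count: int) -> float:
    """Fraction of lanes doing useful work when processing element_count values."""
    vector_ops = math.ceil(element_count / lane_count)
    return element_count / (vector_ops * lane_count)


double_lanes = lanes(64)   # 8 doubles per register
float_lanes = lanes(32)    # 16 floats per register

# Only 3 usable doubles per operation: most of the register sits idle.
sparse = utilization(3, double_lanes)
# 16 contiguous doubles fill two registers completely.
dense = utilization(16, double_lanes)
print(double_lanes, float_lanes, sparse, dense)
```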