Karma VRAM limitations - General Explanation
jtk700cln
Hey all,
A general question about VRAM limitations for Karma, now and for the foreseeable future. I've seen some earlier threads indicating that there is no support for NVLink with Karma XPU, and never will be. NVIDIA also doesn't support NVLink or SLI on the 4090 series to begin with.
To me this means Karma XPU is limited to 24 GB of VRAM on consumer GPUs, and 48 GB on professional cards - per render layer, per frame - and that's if you have a 4090 or an even more expensive Quadro RTX.
I know 24 GB might seem like a lot, but I am currently hitting an 8 GB cap pretty quickly (2x 2070 Super) when instancing SpeedTrees in a scene with 4K textures. In fact, the final render readout on a single layer with these trees is 50 GB of VRAM for a single frame - so even a 4090 wouldn't solve this.
Scene optimization and layer setups can get this number lower, but generally speaking, the inability to pool VRAM feels like a limitation for large-scale VFX scenes. My CPU renderer, for example, has the luxury of falling back on my 256 GB of RAM - the GPU has no such option.
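As one example of the kind of optimization I mean: converting the 4K textures to Houdini's tiled, mipmapped .rat format (via iconvert) should at least keep the texture footprint down, since the renderer can then pull lower-resolution tiles instead of holding full-resolution copies. The filenames below are just placeholders:

iconvert leaf_albedo_4k.exr leaf_albedo_4k.rat
iconvert bark_albedo_4k.exr bark_albedo_4k.rat

That doesn't address the pooling problem, of course, just the per-frame texture memory.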
And if you hit the VRAM cap, the OptiX component immediately disables for the entire scene. It doesn't appear that Karma can, say, put 24 GB in VRAM (4090 hypothetical) and the rest in RAM and continue rendering - not implying this is a simple thing to solve, mind you. I'm just pointing out that once you go over the GPU threshold, you're back to CPU for the entire scene.

Karma rendering is more of a hobbyist thing for me at the moment, so take my thoughts with a grain of salt.

Is this a ubiquitous problem for all XPU and GPU renderers now that NVLink is gone? Or are there render engines configured to leverage all the VRAM available - say I have 3x 4080s with 48 GB of VRAM total - would Redshift or V-Ray also limit themselves to the VRAM on one card?
I'm about to make updates to my system (either a 4090 or 2x 4070 Ti Supers), so any insights on the future of XPU, its inherent VRAM limitations, and ways around this would be appreciated. Thanks!
brians
some random thoughts:

jtk700cln
In fact, the final render readout on a single layer with these trees is 50 GB of VRAM for a single frame
sorry, how did you come up with this 50 GB number? Is that something you got from XPU? Or some other package? XPU only loads the data it needs onto the GPU (demand-loading), so it might use less GPU RAM than you initially think.

XPU will go out-of-core for textures, although the GPU will most likely run very slowly in this state.

On Windows, GPU memory will spill over to CPU memory (called "Shared memory"), and it still seems rather performant in this case. Please note this will not work on Linux, or if you have 2 GPUs.
https://nvidia.custhelp.com/app/answers/detail/a_id/5490/~/system-memory-fallback-for-stable-diffusion
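If you want to see what actually lands on the card (as opposed to a scene-memory readout), watching the GPU while a frame renders is a quick sanity check - for example, polling once a second with nvidia-smi's standard query flags:

nvidia-smi --query-gpu=index,name,memory.used,memory.total --format=csv -l 1

On Windows, anything that has spilled into "Shared memory" shows up under the GPU's shared-memory graph in Task Manager rather than in the dedicated memory.used number above.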
jtk700cln
Helpful, Brian, thanks. The readout on memory was coming from the render gallery.
I set it up using this here:

https://m.youtube.com/watch?v=q7kQh53v6Fs

Let me double-check myself on the exact numbers, though, and get you a screenshot tomorrow. I have to make sure that the scene memory consumption is actually the same reading as the VRAM consumption.
Give me a beat!
jtk700cln
Brian, if you don't mind me asking a second related question: in a two-GPU system with one powerful GPU (4090) and one weaker GPU (4060), is it possible to configure Houdini so that it uses the 4090 exclusively for Karma render tasks, and the 4060 for everything else (viewport navigation, etc.)?
Alternatively, I could force Houdini to only use the 4090 via the environment file, but then I presume some of the 4090's VRAM would still be consumed by non-Karma tasks... just trying to think of ways to be as optimized as possible!
jtk700cln
brians
sorry, how did you come up with this 50 GB number? Is that something you got from XPU?
Hey Brian,
Apologies for the delayed response. I have since updated to Houdini 20. Now the readout in the clone is no longer indicating an OptiX value of zero... it is still utilizing the GPUs (at least one of them), even with the load memory peaking at 39 GB.
I've attached a pic. Thank you for your insights, and for helping me resolve some of my own ignorance.
brians
jtk700cln
Is it possible to configure Houdini so that it uses the 4090 exclusively for Karma render tasks, and the 4060 for everything else (viewport navigation, etc.)?

As you already know, for XPU at least, you can use an environment variable to disable a specific GPU
https://www.sidefx.com/docs/houdini/solaris/karma_xpu.html#disablingdevices

I'm not sure how to direct the (e.g.) OpenGL window to use only the other GPU. Perhaps try disabling one and see how it behaves? And if that doesn't work, try disabling the other instead. It might be that the OpenGL window only uses 1 GPU anyway.
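If it helps as a starting point (just a sketch, and it assumes the 4090 enumerates as CUDA device 0 and the 4060 as device 1 - check nvidia-smi to confirm the indices), you could hide the 4060 from CUDA/OptiX entirely in houdini.env, so XPU only ever sees the 4090:

# houdini.env - hide the second GPU (the 4060 in this assumed ordering) from CUDA/OptiX
CUDA_VISIBLE_DEVICES = 0

The OpenGL viewport isn't governed by that variable, so it will keep using whichever GPU drives your display. The XPU-specific disable variables on the docs page above are the more targeted option if you want both cards to stay visible to CUDA.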

jtk700cln
Apologies for the delayed response. I have since updated to Houdini 20. Now the readout in the clone is no longer indicating an OptiX value of zero... it is still utilizing the GPUs (at least one of them), even with the load memory peaking at 39 GB.

Perhaps this is the "Sysmem fallback" feature behaving strangely.
What happens if you disable the feature in the NVidia control panel?

Also, what happens if you render (with the sysmem fallback feature enabled) but with just 1 GPU enabled in XPU? Do you get better out-of-core behavior then? The reason I ask is that we know this new NVIDIA/Windows memory feature doesn't work very well with multiple GPUs.