Karma XPU - dual 4090 RTX setup - performance issues

Staff
641 posts
Joined: August 2019
Offline
You're comparing two different CPUs: one is 5 years old and the other is 8 years old, they have different core counts, and they come from completely different generations of the Zen microarchitecture. The difference may come from the CPUs rather than the GPUs.

Even when you turn off rendering on the CPU, the CPU still needs to read in data, upload it to the GPU, download the results, then write the results to disk. If the CPU can't keep the GPU fed, especially when there are two of them, your performance will tank.

If you can, compare one vs. two GPUs on the same processor and see what the difference is. Also, try setting the environment variable that Brian mentioned above.
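One way to put a number on that comparison is to compute the speedup and the scaling efficiency from the two render times. This is only a minimal sketch: the function name is made up for illustration, and the timings in the usage lines are placeholders, not measurements from this thread.

```python
def gpu_scaling(time_one_gpu: float, time_n_gpus: float, n_gpus: int = 2):
    """Return (speedup, efficiency) for a render timed on 1 GPU and on n GPUs.

    speedup    = single-GPU time / multi-GPU time
    efficiency = speedup / number of GPUs (1.0 means perfect scaling)
    """
    if time_one_gpu <= 0 or time_n_gpus <= 0 or n_gpus < 1:
        raise ValueError("times must be positive and n_gpus must be >= 1")
    speedup = time_one_gpu / time_n_gpus
    efficiency = speedup / n_gpus
    return speedup, efficiency


# Placeholder timings in seconds; substitute your own measurements.
speedup, efficiency = gpu_scaling(60.0, 55.0, n_gpus=2)
print(f"speedup: {speedup:.2f}x, scaling efficiency: {efficiency:.0%}")
```

An efficiency close to 1/n (50% for two GPUs) would mean the second GPU adds almost nothing, which points at a bottleneck outside the GPUs.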

From the docs:
In Karma XPU, the process of writing AOV data (e.g. deep image data, cryptomatte, etc...) can now be multi-threaded to improve performance, for when very fast (or multiple) GPUs are matched with a slow CPU. Enable via the `KARMA_XPU_NUM_PER_DEVICE_BLENDING_THREADS` environment variable.
Edited by johnmather - August 13, 2025 16:06:34
Member
260 posts
Joined: August 2015
Offline
The first time I noticed the difference was when I removed one GPU from the 5950X machine and got almost identical render times.
So even on the same machine, the difference between 1 GPU and 2 GPUs is like in this example.
This test was run with that environment variable set.
I'll test again on this machine with 1 and 2 GPUs, but as mentioned, I first noticed this after removing one GPU from my machine and getting nearly identical render times.
I'll be back with a test from the same machine.
Member
260 posts
Joined: August 2015
Offline
Here is the test again, on the same machine. So, to note again, the CPU really does not matter in this case. All machines show the same behavior. Granted, this scene renders really fast: it's a cartoon forest with a bunch of scattered trees, grass, and characters, and it still renders very quickly with motion blur, DOF, and 4K. More complex scenes may benefit more from a dual setup.
It just feels like the extra GPUs in these systems are wasted. Being able to render each frame on a single GPU, with 2 frames rendering at once (Deadline concurrent tasks, for example), would speed things up drastically and use 100% of the resources. But GPU affinity selection from Deadline workers does not work with Houdini.




Attachments:
A.PNG (81.9 KB)
B.PNG (79.0 KB)

Staff
582 posts
Joined: May 2019
Offline
I'm still very interested to see what happens if you set this envvar:
KARMA_XPU_NUM_PER_DEVICE_BLENDING_THREADS=4

If that still doesn't give the expected results, try testing outside of Deadline (with the envvar still set).
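For testing outside of Deadline, the variable can be set on a copy of the environment and passed to the render process. The following is a rough sketch, not an official workflow: the helper names are made up, and the command in the `__main__` section is a stand-in that only echoes the variable, to be replaced with your actual render command.

```python
import os
import subprocess
import sys
import time

ENV_VAR = "KARMA_XPU_NUM_PER_DEVICE_BLENDING_THREADS"


def env_with_blending_threads(threads, base_env=None):
    """Return a copy of the environment with the blending-thread variable set."""
    env = dict(os.environ if base_env is None else base_env)
    env[ENV_VAR] = str(int(threads))
    return env


def timed_run(command, env):
    """Run a command with the given environment and return elapsed seconds."""
    start = time.perf_counter()
    subprocess.run(command, env=env, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in command: prints the variable as seen by the child process.
    # Replace it with the real render command line you normally use.
    command = [
        sys.executable,
        "-c",
        f"import os; print(os.environ.get('{ENV_VAR}'))",
    ]
    elapsed = timed_run(command, env_with_blending_threads(4))
    print(f"finished in {elapsed:.2f} s")
```

Running the same render once with and once without the variable gives a direct comparison that does not involve the farm at all.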

P.S. This user got a performance boost from that envvar:
https://www.sidefx.com/forum/topic/101585/
Edited by brians - August 14, 2025 21:17:44
Member
260 posts
Joined: January 2015
Offline
I could help you get GPU affinity working with Karma XPU. It's only a few lines of code added to the Deadline plugin you are using.
Member
260 posts
Joined: August 2015
Offline
This was tested inside Houdini Solaris view, not on Deadline and with the KARMA_XPU_NUM_PER_DEVICE_BLENDING_THREADS=3 environment variable.

I'll try from the command line as well, but I don't expect much of a difference. There is more difference with more complex, slower-to-render scenes. If I understood correctly, in H21 there will be an option to set which device to use, not just a disabling option like at the moment, and if that is so, then rendering concurrent tasks will be possible and would solve this with fast-to-render scenes.
Member
260 posts
Joined: August 2015
Offline
Heileif
I could help you get GPU affinity working with Karma XPU. It's only a few lines of code added to the Deadline plugin you are using.
That would be great!
Member
260 posts
Joined: January 2015
Offline
What Deadline plugin are you using to render?

The code only reads the worker's GPU affinity from Deadline and sets the KARMA_XPU_DEVICES environment variable for the render job. You will need to create a worker for each GPU in Deadline. It works great; we have been using the same workflow for Redshift.
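The mapping described above could look roughly like the sketch below. This is not the actual plugin code: the function names are hypothetical, the GPU list is assumed to come from the Deadline worker's GPU affinity setting, and the comma-separated index format for KARMA_XPU_DEVICES is an assumption to check against the Karma XPU documentation for your Houdini version.

```python
def karma_devices_for_worker(gpu_ids):
    """Format a worker's GPU indices as a comma-separated string.

    ASSUMPTION: KARMA_XPU_DEVICES accepts a comma-separated list of
    device indices such as "0" or "0,1". Verify before relying on it.
    """
    unique = sorted({int(g) for g in gpu_ids})
    if not unique:
        raise ValueError("worker has no GPU affinity set")
    return ",".join(str(g) for g in unique)


def render_env_for_worker(gpu_ids, base_env):
    """Return a copy of base_env with KARMA_XPU_DEVICES set for this worker."""
    env = dict(base_env)
    env["KARMA_XPU_DEVICES"] = karma_devices_for_worker(gpu_ids)
    return env


# Hypothetical setup: two workers on one machine, one GPU each.
# Inside a real Deadline plugin, the GPU list would be read from the
# worker's affinity and the variable set on the render process.
for worker_name, gpus in {"node01-gpu0": [0], "node01-gpu1": [1]}.items():
    env = render_env_for_worker(gpus, {})
    print(worker_name, "->", env["KARMA_XPU_DEVICES"])
```

With one worker per GPU, each worker renders its own frame on its own device, which is the concurrent-frames setup discussed earlier in the thread.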
Member
260 posts
Joined: August 2015
Offline
I'm on Deadline 10.4.1.6 at the moment, just submitting ROPs directly from the stage with Submit to Deadline.
The per-GPU worker approach would work perfectly fine for me.
I tried using Deadline affinity alone, but that did not work.
Member
260 posts
Joined: January 2015
Offline
Are you rendering with the Houdini plugin? So there's no USD generation step that happens first and then gets rendered by a separate plugin, like the new Karma plugin they launched this year?

If you are rendering with the Houdini plugin, I can probably only get GPU affinity to work for you. I only know how to set CPU affinity using Husk.