Karma critical error: unable to create Cuda context

Member
62 posts
Joined: July 2007
Online
Hi guys,

I'm using Karma for some creature rendering involving fur. It's incredibly fast, but at some point it starts kicking up the following errors and the render speed drops dramatically.
I have the latest drivers.
A restart does fix it, but the problem returns.
Any suggestions?

Attachments:
Screenshot 2024-05-08 083812.png (75.1 KB)

Member
62 posts
Joined: July 2007
Online
Apologies. I did a clean driver reinstall through GeForce Experience and it seems to have resolved the issue. Thanks!
Member
62 posts
Joined: July 2007
Online
No I'm lying. The issue persists.
Member
62 posts
Joined: July 2007
Online
So dropping the body fur density attribute from 1 000 000 to 800 000 seems to get the render going. Is this a gpu memory limit of some sort? Is there a way to monitor or manage gpu mem usage in karma? Other than keeping an eye on the task manager, which doesn’t seem to be hitting its limit, is there a way to get some other error info?
Btw it seems to be with this one particular fur object. There are multiple others (mane (which also has 1 mil density count), face, whiskers, chin etc) and they seem to render fine.
Member
15 posts
Joined: Jan. 2017
Offline
I have the exact same experience. I'm on a live job where I took the leap to try Karma, and right now it's playing up!

Really need to know the fix for this. It shows up in Houdini 20.0.590 and 20.0.724, and I've updated my NVIDIA drivers to 555.99. Still issues. It takes about 5 minutes of viewport rendering before it errors.

Not even a complicated scene. Love XPU, just need it to work now!
Staff
583 posts
Joined: May 2019
Offline
Unable to even create a CUDA context to start with indicates a driver issue of some sort :/

Lenscowboy
Is there a way to monitor or manage gpu mem usage in karma? Other than keeping an eye on the task manager, which doesn’t seem to be hitting its limit, is there a way to get some other error info?

In the viewport you can look at the on-screen stats. Take note of the "peak" memory readouts.
https://www.sidefx.com/docs/houdini/solaris/karma_xpu.html#howto

If you do an offline Karma render, then do it at verbosity level 5 to see memory readouts from XPU in the log.

Lenscowboy
Is this a gpu memory limit of some sort?

Are you on Windows or Linux?
It seems this issue has crept into the latest 550 or 555 drivers. What happens if you roll back a little bit, e.g. to an early 550 or even a 545 branch?

Also, do keep an eye on your memory usage in task manager (or manually via nvidia-smi on Linux). If you're at the capacity of the GPU then that might explain the driver instability.
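For anyone who wants numbers rather than eyeballing Task Manager, here is a minimal Python sketch that polls `nvidia-smi` and reports per-GPU memory use. It assumes `nvidia-smi` is on the PATH; the function names (`parse_gpu_memory`, `query_gpu_memory`) are made up for this example and are not part of Houdini or Karma. It reports total GPU memory in use by all processes, not just XPU.

```python
import subprocess

# nvidia-smi query returning one CSV row per GPU: index, used MiB, total MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse the CSV text produced by QUERY into a list of dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (int(field.strip()) for field in line.split(","))
        rows.append(
            {
                "index": index,
                "used_mib": used,
                "total_mib": total,
                "used_fraction": used / total,
            }
        )
    return rows


def query_gpu_memory():
    """Run nvidia-smi once and return the parsed per-GPU memory usage."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)


# Example use while an IPR session is running (needs an NVIDIA GPU and driver):
#     for gpu in query_gpu_memory():
#         print(gpu["index"], gpu["used_mib"], "/", gpu["total_mib"], "MiB")
```

Calling `query_gpu_memory()` every few seconds during an IPR session and logging the result would show whether usage climbs toward the card's capacity before the error appears.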

Lenscowboy
So dropping the body fur density attribute from 1 000 000 to 800 000 seems to get the render going.

Try setting the envar KARMA_XPU_OPTIX_CURVE_OPT_LEVEL to 1.
This will halve the amount of GPU memory used by the curves.
https://www.sidefx.com/docs/houdini/solaris/karma_xpu.html#features
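One way to make sure the envar is in place before Houdini starts is to launch it from a small wrapper. This is only a sketch: the helper names (`xpu_env`, `launch`) are hypothetical, and it assumes the Houdini launcher is reachable as `houdini` on your PATH, which may not match your install.

```python
import os
import subprocess


def xpu_env(curve_opt_level=1, base_env=None):
    """Return a copy of the environment with the XPU curve envar set."""
    env = dict(os.environ if base_env is None else base_env)
    # Environment variable values must be strings.
    env["KARMA_XPU_OPTIX_CURVE_OPT_LEVEL"] = str(curve_opt_level)
    return env


def launch(command, curve_opt_level=1):
    """Start `command` (a list of arguments) with the envar applied."""
    return subprocess.Popen(command, env=xpu_env(curve_opt_level))


# Example (assumes "houdini" is on PATH; adjust for your installation):
#     launch(["houdini"])
```

Setting the variable in the shell or in `houdini.env` before launch should have the same effect; the wrapper just makes it explicit for one session.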

jamison1
It shows up in Houdini 20.0.590 and 20.0.724, and I've updated my NVIDIA drivers to 555.99. Still issues.

What did you update from?
Are you on Windows or Linux?
What happens if you roll back to a 550 or even 545 driver?
Member
15 posts
Joined: Jan. 2017
Offline
Hi,

Yeah, I'm on Windows 11. Not sure what driver was on prior. I'm trying NVIDIA 546.65 now to see what happens.
Member
15 posts
Joined: Jan. 2017
Offline
Driver 546.65 still doesn't work. I don't get errors when offline rendering, I think because the render only takes a minute per frame.

It's the IPR sessions. It takes a good 5 minutes to error.

Wish Karma could just restart and start fresh without closing Houdini! Wouldn't be SO annoying.
Member
15 posts
Joined: Jan. 2017
Offline
Is there a way to use the Intel denoiser with IPR instead? Might help?
Member
15 posts
Joined: Jan. 2017
Offline
These are my IPR session render stats, showing the CUDA illegal address error after a couple of minutes.

Is there anything of use in this? Using 2x 3090, and it errors in Houdini 20.0.590 and 20.0.724 using the 546.65 driver.
Edited by jamison1 - June 13, 2024 07:32:14

Attachments:
karma errors jpg.jpg (125.1 KB)

Staff
583 posts
Joined: May 2019
Offline
jamison1
Wish Karma could just restart and start fresh without closing Houdini! Wouldn't be SO annoying.

Houdini is closing? Like... crashing?
Or do you mean you need to restart Houdini yourself in order to get XPU to work again?

jamison1
Is there a way to use the Intel denoiser with IPR instead? Might help?

Yes, you can activate this in the viewport.
There is a little sphere-like icon on the right of the viewport.
https://www.sidefx.com/docs/houdini/render/optixdenoiser.html

jamison1
Is there anything of use in this? Using 2x 3090, and it errors in Houdini 20.0.590 and 20.0.724 using the 546.65 driver.

So it looks like you're not running out of GPU memory, but XPU isn't the only thing that could be using up GPU RAM. What does Task Manager say about GPU memory usage before/after the error happens?

Also, are you running the Vulkan viewport? Or is it still the default OpenGL viewport?

I think we need to get a repro scene from you (plus some clear steps on how to repro the issue).
Member
15 posts
Joined: Jan. 2017
Offline
- When Redshift runs out of memory, for example, and throws errors, I close it and re-render, and it sorts itself out again. I'd like Karma to be able to reload itself, get over the errors, and start fresh without closing and reopening Houdini. Such a pain.

- I'd like to use the Intel denoiser instead of the NVIDIA one for IPR too. Is that not possible yet? If I'm using a Karma node for display and render settings, and that's set to use the Intel denoiser instead of NVIDIA, does the IPR viewport interactive render then use the Intel denoiser when that sphere icon is pressed?

- I've been closing all the other applications, as yes, that could eat some RAM up. Fresh machine restart too. And it still errors.

- I've attached Task Manager shots of both GPUs before and after the errors.

- The errors happen using Karma XPU in the viewport, with whatever the default is regarding Vulkan/OpenGL. It's the IPR running on XPU that errors after a minute or two.

- I can't supply this actual scene as it's NDA stuff, but maybe I can try recreating it using generic geo. I'll see if it still errors.

Thanks for replying. Appreciate it.

TASK MANAGER screenshots:
before the errors




after the errors



Edited by jamison1 - June 14, 2024 04:42:18

Attachments:
karma errors_POST_ERROR_taskManager_GPU2.png (34.5 KB)
karma errors_PRE_ERROR_taskManager_GPU1.png (34.1 KB)
karma errors_PRE_ERROR_taskManager_GPU2.png (34.3 KB)
karma errors_POST_ERROR_taskManager_GPU1.png (33.7 KB)

Member
15 posts
Joined: Jan. 2017
Offline
If I do get a scene I can share, how do I share it with you privately?
Member
15 posts
Joined: Jan. 2017
Offline
So I've been trying to make a scene I can share with you. In doing that I deleted some of the more sensitive objects in the scene, and NOW it doesn't error!

So now I'm trying to find the cause. Thinking maybe some NaN geo or something PolyDoctor can fix. Fingers crossed.

It's either something to do with my meshes, OR by reducing the number of objects I've made the scene simpler and it can now handle it. Don't know which yet.

But it's not an earth-shatteringly complex scene anyway. I'll let you know.
Member
62 posts
Joined: July 2007
Online
Thanks for all the info BrianS.
I did get some info back from support, and there seems to be a known issue with my CPU that may or may not be related to this issue. It involves setting power profiles and whatnot according to Intel's recommendations, but I haven't figured it all out yet. I still get the errors when I'm rendering in the viewport, but I'll keep tinkering.
Member
62 posts
Joined: July 2007
Online
My stats. It's a 4090, so at 11 GB I don't think I'm blowing it out.
Edited by Lenscowboy - July 9, 2024 10:35:48

Attachments:
Screenshot 2024-07-09 163333.png (21.5 KB)
