I believe this is shader compilation. Try leaving it for a very long time, e.g. 15-30 mins (and with a very high sample count, to make sure EmbreeCPU doesn't finish the frame in the meantime).
If it eventually starts running, then yes, it is shader compilation, and we've either already addressed it, or plan to address it, in the next major release of Houdini.
Found 241 posts.
Solaris and Karma » Karma XPU stuck on initialization and geo refuse to load
- brians
- 469 posts
- Offline
Solaris and Karma » Artifacts appears in Houdini XPU - UDIM Issues
- brians
- 469 posts
- Offline
I have not seen this bug before.
I suggest filing a bug (with repro steps + example scene) and we'll look into it soon
thanks
Solaris and Karma » XPU stopped working after Houdini upgrade (605 to 653).
- brians
- 469 posts
- Offline
Hi guys
I've made this change to 20.0.685
When you get a chance, can you please test and let me know either way.
thanks!
Solaris and Karma » XPU stopped working after Houdini upgrade (605 to 653).
- brians
- 469 posts
- Offline
ajz3d
Brians said that they're loading a new driver binary now. I assume he had this libnvidia-ml.so in mind. So maybe they're loading from the wrong path?
The thing that has changed is that we are loading the libnvidia-ml.so file. But it should live beside libcuda.so, meaning that if we can load one, then we should be able to load the other from the same path. I'm not sure if it's the distro or NVidia at fault, but I think we'll just fix Houdini to go looking for the libnvidia-ml.so file in the location of the actual libcuda.so binary (not the symlink). Hopefully that should address the issue.
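To illustrate the symlink point, here is a sketch of how one might check this by hand on Linux (the paths are examples and vary by distro):
# Resolve the symlink to find the real driver binary, then check
# whether libnvidia-ml.so lives in that same directory.
readlink -f /usr/lib64/libcuda.so
ls "$(dirname "$(readlink -f /usr/lib64/libcuda.so)")" | grep nvidia-ml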
Edited by brians - April 18, 2024 05:58:31
Solaris and Karma » XPU stopped working after Houdini upgrade (605 to 653).
- brians
- 469 posts
- Offline
Maybe a path issue.
We load two files dynamically at runtime: libcuda.so and libnvidia-ml.so. Do they live beside each other in the same directory? Or is the libcuda.so file in a different location? Or maybe it's sym-linked in another location, and that's what we're picking up?
I might put a log message about where exactly we're picking these files up from, and that might give us more clues about where these files are being found at runtime.
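One way to answer those questions yourself (a sketch, assuming a typical Linux setup):
# List every location the dynamic loader knows about for the two files.
ldconfig -p | grep -E 'libcuda\.so|libnvidia-ml\.so'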
Edited by brians - April 16, 2024 19:41:07
Solaris and Karma » what has your experience with solaris/karma been?
- brians
- 469 posts
- Offline
evanrudefx
In the task manager it said not responding.
The Optix compiler does not allow interruptions :/
We're working with NVidia to try to get that fixed, so that even if it is compiling jobs, Houdini shouldn't hang.
Thanks for your patience.
Solaris and Karma » XPU stopped working after Houdini upgrade (605 to 653).
- brians
- 469 posts
- Offline
We are indeed loading a new driver binary.
Try rendering via the offline "karma" command, at verbosity level 5, then post the log here.
I'm pretty sure we'll see an error message "Failed to load CUDA DSO ..."
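Something like this, perhaps (a sketch: scene.usd is a placeholder, and the husk-style -V verbosity flag is an assumption):
# Render one frame offline at verbosity level 5, capturing all output to a log.
karma -V 5 scene.usd > karma_log.txt 2>&1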
Solaris and Karma » what has your experience with solaris/karma been?
- brians
- 469 posts
- Offline
XPU (and Optix) needs to recompile its internal code based on what combination of geometry and rendering features are active in the scene. The fact that it takes 3 minutes is something we're working with NVidia to improve.
The good news is that once that combination has been compiled, it gets cached, so when you load that scene again it should be instant.
For the next major release of Houdini we plan to build a "pre-compile" step where XPU will go away and pre-compile the code for all the combinations of features/geometry we ever expect someone to use. It'll probably take a few hours, so hopefully people can do it either on their lunch break or overnight. But once that is in place, these long stalls should go away.
And when you say "crashes", are you rendering in the IPR viewport? Or offline via karma/husk? Because if in the viewport, you should see a readout in the top-right corner saying "compile #" (with the # being the number of compilation jobs).
Technical Discussion » Karma Vram limitations - General Explanation
- brians
- 469 posts
- Offline
jtk700cln
Is it possible to configure Houdini so that it is using the 4090 exclusively for karma render tasks, but the 4060 for everything else (viewport navigation, etc)?
As you already know, for XPU at least, you can use an environment variable to disable a specific GPU
https://www.sidefx.com/docs/houdini/solaris/karma_xpu.html#disablingdevices [www.sidefx.com]
I'm not sure how to direct the (e.g.) OpenGL window to use only the other GPU. Perhaps try disabling one and see? And if that doesn't work, then try disabling the other instead? It might be that the OpenGL window only uses 1 GPU anyway.
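For example (a sketch; Linux shell syntax, and the device index is an assumption — see the docs link above for numbering):
# Disable one GPU for XPU (here assuming the 4060 enumerates as device 1),
# leaving the other as the only render GPU, then launch Houdini.
export KARMA_XPU_DISABLE_DEVICE_1=1
houdini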
jtk700cln
Apologies for the delayed response. I have since updated to houdini 20. Now the readout in the clone is not indicating an optix of zero..... It is still utilizing the gpus (at least one of them), even with the load memory peaking at 39 GBs.
Perhaps this is the "Sysmem fallback" feature behaving strangely.
What happens if you disable the feature in the NVidia control panel?
Also, what happens if you render (with the sysmem fallback feature enabled) but with just 1 GPU enabled in XPU? Do you get better out-of-core behavior then? The reason I ask is that we know this new NVidia/Windows memory feature doesn't work very well with multiple GPUs.
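A sketch of that single-GPU test (the device index is an assumption, and husk with a placeholder scene.usd stands in for your usual render command):
# Keep sysmem fallback enabled in the NVidia control panel, but restrict
# XPU to one GPU by disabling the second device for this render only.
KARMA_XPU_DISABLE_DEVICE_1=1 husk scene.usd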
Solaris and Karma » what has your experience with solaris/karma been?
- brians
- 469 posts
- Offline
evanrudefx
One of the great things about GPU rendering is being able to iterate quickly. However, karma can be painfully slow to just start rendering. I believe the docs mention this. Changing a material setting or enabling a hidden object/adding a new object (even light-weight objects) can result in having to wait long periods before rendering even starts. These waiting times grow as the stage becomes more complex. Sometimes closing the project and re-opening it helps with performance.
This is almost certainly shader compilation.
We have some big improvements coming in this area for the next major release of Houdini.
evanrudefx
Hiding/showing primitives in the stage triggers crashes, switching on karma xpu triggers crashes, turning off xpu and going back to opengl triggers crashes, etc.
Please get these into bug reports (e.g. with an example scene with clear repro steps).
We often get crash bugs fixed within a matter of days once they're submitted.
Do you at least get crash logs? They're also very useful.
Solaris and Karma » [SOLVED] Karma spherical uv projection
- brians
- 469 posts
- Offline
Solaris and Karma » Karma XPU - Machines with GPU present take longer to render
- brians
- 469 posts
- Offline
Thanks for doing the tests
am_wilkins
2) Interesting, good to know. With that setting enabled, the GPU appears to fail to load much/keep VRAM loaded and then appears to just render purely on the CPU.
What render time did you get when using that NVidia driver setting?
Did it go back to being as fast as CPU-only again?
I'm wondering if it's the NVidia driver memory swapping stuff that is slowing down your scene with GPU+CPU vs CPU-only
am_wilkins
As a point of failure, only really noted that about 25% into the frame rendering the CUDA cores stopped reporting utilization, however the GPU would still list itself at 100% sometimes.
Yea... we've found the Windows "GPU-utilization" metrics UI to be very inaccurate for NVidia/Optix :/
For example, when rendering with two GPUs, the 2nd GPU often registers as not having any load, even though it's working full-steam.
That last image is curious to me, xpu_render_stats_after_01.png
Does that show the stats after the GPU has failed, but while the CPU is still rendering the frame? Or has the whole frame finished at this stage (e.g. husk/karma has finished and closed)? The thing I'm trying to determine is whether the GPU releases all its memory at the point of failure, or if that only happens at the end of the frame.
Thanks
Edited by brians - April 9, 2024 22:09:27
Technical Discussion » Karma Vram limitations - General Explanation
- brians
- 469 posts
- Offline
brians
sorry, how did you come up with this 50gb number? Is that something you got from XPU? Or some other package? XPU only loads the data it needs onto the GPU (demand-loading), so it might use less GPU ram than you initially think.
In fact the final render readout on a single layer with these trees is 50 gb of vram for a single frame
Some random thoughts:
XPU will go out-of-core for textures, although the GPU will most likely run very slowly in this state.
On Windows, GPU memory will spill over to CPU memory (called "Shared memory") and seems to be still rather performant in this case. Please note this will not work on Linux, or if you have 2 GPUs.
https://nvidia.custhelp.com/app/answers/detail/a_id/5490/~/system-memory-fallback-for-stable-diffusion [nvidia.custhelp.com]
Edited by brians - April 9, 2024 21:58:58
Solaris and Karma » Karma XPU - Machines with GPU present take longer to render
- brians
- 469 posts
- Offline
brians
Perhaps they're not being released when the GPU goes into an error-state. In the meantime I'll have a look at the code to see if this might be the case
I've checked this, and although we could be better at re-allocating resources after a device has failed (there are perhaps 1-2 threads we could handle better), this wouldn't explain the slowdown you're getting on your threadripper.
It definitely seems to be something to do with your GPU memory maxing out. So let's try a few more things.
1) Can you try stripping your scene down somewhat (so that it all fits in GPU memory), and then do a render? And report back what kinds of times you get with/without the GPU (i.e. by using that envvar).
2) There is a new Windows/NVidia feature where Optix/CUDA GPU memory can "spill" over to CPU ram when the GPU gets full.
https://nvidia.custhelp.com/app/answers/detail/a_id/5490/~/system-memory-fallback-for-stable-diffusion [nvidia.custhelp.com]
This is still a very new feature and we're having trouble getting information about it. But I can see your machine is doing this, because the "Shared GPU memory" is at 18gig. What happens if you set "CUDA system fallback policy" to "Prefer No Sysmem fallback" in the NVidia control panel (which will disable this feature)?
3) Are you able to watch the task-manager stats during a render where the GPU fails, and perhaps take a screenshot? I want to see what happens at the point of failure. We should be releasing GPU (and also the "Shared GPU memory") resources once it fails, but perhaps we are not.
4) We output stats for the devices into the EXR header (as long as the new driver is used, not legacy). They can be viewed using the command
iinfo -v myfile.exr
It's very dense/unreadable, sorry, but could you perhaps post the result of that here, for a GPU-failing render? We can see a little more about what is happening to each device. We really should be outputting more information regarding device utilization/failure into the regular stats too. I'll get that prioritized.
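For instance (a sketch; the grep filter is just a guess at what to look for in the header):
# Dump the EXR header and pull out anything device/XPU related.
iinfo -v myfile.exr | grep -i -E 'xpu|device'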
cheers
Edited by brians - April 8, 2024 23:32:25
Houdini Lounge » Karma Xpu out-of-core in windows vs linux (strange behavior)
- brians
- 469 posts
- Offline
On Windows, Microsoft/NVidia have implemented a way for Optix GPU memory to "spill over" to main CPU memory.
https://nvidia.custhelp.com/app/answers/detail/a_id/5490/~/system-memory-fallback-for-stable-diffusion [nvidia.custhelp.com]
There is not a lot of documentation or info about this feature, but we have formally asked NVidia to give us more and they're working on it. One thing we know is that it doesn't work if there are two GPUs in the machine.
You can disable the feature by setting the "CUDA system fallback policy" to "Prefer No Sysmem fallback" in the NVidia control panel. Once you have done that, Windows and Linux should behave the same.
Solaris and Karma » Karma - Unexplained increases in memory usage
- brians
- 469 posts
- Offline
We are aware that KarmaXPU is using a lot more CPU ram than KarmaCPU.
We are working to reduce the CPU ram usage of KarmaXPU for the next major release of Houdini, so keep an eye out for that.
Regarding Arnold vs KarmaCPU, I'll be sure to pass this on to the KarmaCPU team to have a look/think about.
thanks
Solaris and Karma » Karma XPU - Machines with GPU present take longer to render
- brians
- 469 posts
- Offline
I think the first thing to try is to render with only the EmbreeCPU device active on that machine, to make sure we get the same performance as the machine without the GPU.
So, render with this envvar set:
KARMA_XPU_DISABLE_OPTIX_DEVICE=1
Please report back if you get the same performance, thanks.
We reserve some threads for GPU-shader-compilation, and release them for the EmbreeCPU device once compilation is done. Perhaps they're not being released when the GPU goes into an error-state. In the meantime I'll have a look at the code to see if this might be the case
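For example (a sketch; husk with a placeholder scene.usd stands in for however you normally launch the render):
# Disable the Optix (GPU) device so only EmbreeCPU renders (Linux shell syntax).
KARMA_XPU_DISABLE_OPTIX_DEVICE=1 husk scene.usd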
Solaris and Karma » XPU and Ryzen's integrated GPU
- brians
- 469 posts
- Offline
ajz3d
Will the raytracing capability of the integrated GPU chip be utilized by Karma XPU at its current development state?
No, because it is AMD.
XPU does not currently support AMD GPUs (integrated or discrete)
Solaris and Karma » Karma - Unexplained increases in memory usage
- brians
- 469 posts
- Offline
am_wilkins
The logs only eventually say:
KarmaXPU: device Type:Optix ID:0 has registered a critical error , so will now stop functioning. Future error messages will be suppressed
Can you please post a screenshot of the error?
Sadly the forum markup can strip chunks of the error message out
am_wilkins
There is a lot of instanced grass geo,
What is the geo? Curves?
There is a memory/speed optimization for curve GPU memory that might help.
KARMA_XPU_OPTIX_CURVE_OPT_LEVEL=1
https://www.sidefx.com/docs/houdini/solaris/karma_xpu.html#features [www.sidefx.com]
Are you able to post some memory stats?
There are some in the EXR header (iinfo -v myfile.exr), or you can see them in the viewport too (https://www.sidefx.com/docs/houdini/solaris/karma_xpu.html#howto)
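For example (a sketch; Linux shell syntax, with myfile.exr being a rendered frame and the grep filter just a guess):
# Enable the curve memory optimization for subsequent renders...
export KARMA_XPU_OPTIX_CURVE_OPT_LEVEL=1
# ...then inspect the memory stats baked into a rendered frame's header.
iinfo -v myfile.exr | grep -i mem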
Technical Discussion » Houdini not using one GPU
- brians
- 469 posts
- Offline
Another thing you can do is some render tests.
So... perform renders with the following environment variables set.
This will test with only the first GPU enabled
KARMA_XPU_DISABLE_DEVICE_0=0 KARMA_XPU_DISABLE_DEVICE_1=1 KARMA_XPU_DISABLE_DEVICE_2=1
This will test with only the second GPU enabled
KARMA_XPU_DISABLE_DEVICE_0=1 KARMA_XPU_DISABLE_DEVICE_1=0 KARMA_XPU_DISABLE_DEVICE_2=1
This will test with both of the GPUs enabled.
KARMA_XPU_DISABLE_DEVICE_0=0 KARMA_XPU_DISABLE_DEVICE_1=0 KARMA_XPU_DISABLE_DEVICE_2=1
more instructions here
https://www.sidefx.com/docs/houdini/solaris/karma_xpu.html#disablingdevices [www.sidefx.com]
I'm guessing you'll find the first two take approximately the same time, and the last will be approximately twice as fast. Please note that it's best to run the tests twice, to ensure all GPU shaders are compiled and cached.
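A sketch of how those three tests might be scripted and timed (render.sh is a hypothetical stand-in for your usual karma/husk render command):
# First GPU only
time env KARMA_XPU_DISABLE_DEVICE_0=0 KARMA_XPU_DISABLE_DEVICE_1=1 KARMA_XPU_DISABLE_DEVICE_2=1 ./render.sh
# Second GPU only
time env KARMA_XPU_DISABLE_DEVICE_0=1 KARMA_XPU_DISABLE_DEVICE_1=0 KARMA_XPU_DISABLE_DEVICE_2=1 ./render.sh
# Both GPUs
time env KARMA_XPU_DISABLE_DEVICE_0=0 KARMA_XPU_DISABLE_DEVICE_1=0 KARMA_XPU_DISABLE_DEVICE_2=1 ./render.sh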