brians
Perhaps they're not being released when the GPU goes into an error state. In the meantime I'll have a look at the code to see if this might be the case.
I've checked this, and although we could be better at re-allocating resources after a device has failed (there are perhaps 1-2 threads we could be doing better with), this wouldn't explain the slowdown you're getting on your Threadripper.
It definitely seems to be something to do with your GPU memory maxing out, so let's try a few more things.
1)
Can you try stripping your scene down somewhat (so that it all fits in GPU memory) and then do a render? And report back what kinds of times you get with/without the GPU (i.e. by using that envvar).
2)
There is a new Windows/NVIDIA feature where OptiX/CUDA GPU memory can "spill" over to CPU RAM when the GPU gets full:
https://nvidia.custhelp.com/app/answers/detail/a_id/5490/~/system-memory-fallback-for-stable-diffusion
This is still a very new feature and we're having trouble getting information about it. But I can see your machine is doing this because the "Shared GPU memory" is at 18 GB. What happens if you set "CUDA - Sysmem Fallback Policy" to "Prefer No Sysmem Fallback" in the NVIDIA Control Panel? (This will disable the feature.)
3)
Are you able to watch the Task Manager stats during a render where the GPU fails, and perhaps take a screenshot? I want to see what happens at the point of failure. We should be releasing GPU (and also "Shared GPU memory") resources once it fails, but perhaps we are not.
4)
We output stats for the devices into the EXR header (as long as the new driver is used, not legacy). They can be viewed with the command

iinfo -v myfile.exr

It's very dense/unreadable, sorry, but could you perhaps post the result of that here for a GPU-failing render? It will let us see a little more about what is happening to each device.
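If you don't have iinfo (OpenImageIO's info tool) to hand, the header attributes can also be listed with a short stdlib-only Python sketch. This is just my own helper, not part of our tooling: it handles single-part EXR files only, and I'm assuming here that the device stats are written as string attributes.

```python
import struct

def exr_header_attributes(path):
    """Yield (name, type, raw bytes) for each attribute in a single-part
    EXR header.  Multi-part files (several headers back to back) are not
    handled by this sketch."""
    with open(path, "rb") as f:
        magic, _version = struct.unpack("<ii", f.read(8))
        if magic != 20000630:  # the OpenEXR magic number
            raise ValueError("not an EXR file")

        def read_cstring():
            # Attribute names and types are null-terminated strings.
            chars = bytearray()
            while (b := f.read(1)) not in (b"", b"\x00"):
                chars.extend(b)
            return chars.decode("latin-1")

        while True:
            name = read_cstring()
            if not name:        # an empty attribute name ends the header
                break
            attr_type = read_cstring()
            (size,) = struct.unpack("<i", f.read(4))
            yield name, attr_type, f.read(size)

# e.g. dump the string attributes (assuming the stats are stored as strings):
# for name, typ, raw in exr_header_attributes("myfile.exr"):
#     if typ == "string":
#         print(name, "=", raw.decode("latin-1"))
```

The parser just walks the name/type/size/value records that make up an EXR header, so it shows everything iinfo -v would, minus the pretty-printing.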
We really should be outputting more information regarding device utilization/failure into the regular stats too. I'll get that prioritized.
cheers