Karma XPU failure on RTX 3090 Ti

Member
207 posts
Joined: November 2015
jsmack
You should not get black interiors; MaterialX in Karma enables fake caustics by default. You only need to enable caustics when you want cool patterns, not just to let light into an interior. Even with how fast XPU is, true caustics would need 1,000 to 1,000,000 times as many rays to resolve, and you would also have to raise the color limit for indirect light so it can be as bright as your light sources.

Understood, and thank you (both) for this; I will revisit this part of my setup to optimize it. (I've been working on this for over a year, so it has lots of warts, as both Karma and my own understanding of USD have grown.)
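As a rough illustration of why the indirect color limit matters for true caustics, here is a generic clamp. It assumes a limit that scales a sample down so its brightest channel stays at or below the limit; whether Karma's color limit works exactly this way (for example, whether it preserves hue) is an assumption, and `clamp_sample` is a hypothetical name. A caustic path carries roughly light-source-level energy, so a low limit dims it heavily, while ordinary diffuse bounces pass through untouched:

```python
def clamp_sample(rgb, limit):
    """Hypothetical indirect-sample clamp (a generic sketch, not Karma's code).

    Scales the whole RGB sample so its brightest channel is at most
    `limit`, keeping the channel ratios (hue) unchanged.
    """
    brightest = max(rgb)
    if brightest <= limit:
        return tuple(rgb)  # already under the limit: unchanged
    scale = limit / brightest
    return tuple(c * scale for c in rgb)


# A caustic path carrying light-source energy (20) vs. a limit of 10:
print(clamp_sample((20.0, 10.0, 5.0), 10.0))  # (10.0, 5.0, 2.5)
# A dim diffuse bounce is not affected:
print(clamp_sample((1.0, 2.0, 3.0), 10.0))  # (1.0, 2.0, 3.0)
```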

jsmack
Disabling caustics is all it took for it to work for me. I think there is a bug there somewhere.

So do I understand you correctly that you can open the scene file in the Dropbox folder above, and it renders fine for you as-is? In that file, I have already disabled caustics on the renderSettings node, and it consistently fails for me...which indeed seems like a bug.
Member
207 posts
Joined: November 2015
To keep adding clues: in the scene setup I've provided (linked again here):

https://www.dropbox.com/sh/xt6ozjuvpsi1tqa/AAA6n0SubrgdlgpYmiD5HJ_ba?dl=0

If I either disable the animation sublayer OR set "motion blur" to 0 on the tree hierarchy using a Render Geometry Settings LOP, the Optix failure goes away. These clues suggest to me that the issue may be related to motion-blurring something that has an opacity map.

Having said that, I set up a simpler test file (a_simpler_test.hiplc) in the same folder; it uses the same shader on a spinning grid, and it renders fine for me. I remain confused.
Edited by dhemberg - October 26, 2022 14:54:02
Member
7755 posts
Joined: September 2011
dhemberg
So do I understand you correctly that you can open the scene file in the Dropbox folder above, and it renders fine for you as-is? In that file, I have already disabled caustics on the renderSettings node, and it consistently fails for me...which indeed seems like a bug.

No, it's not possible to render as-is since it's missing dependent files.
Member
207 posts
Joined: November 2015
Hi @jsmack:

I mentioned earlier that I updated the files in that folder a few hours ago, and tested them on a totally separate computer to verify that they aren't missing any dependencies.

In this updated file, caustics are entirely disabled, and the scene still causes Optix to fail for me.

Folder with all files and necessary dependencies (as far as I can tell):
https://www.dropbox.com/sh/xt6ozjuvpsi1tqa/AAA6n0SubrgdlgpYmiD5HJ_ba?dl=0

Edited by dhemberg - October 26, 2022 16:49:40

Attachments:
dbox.png (21.7 KB)

Staff
468 posts
Joined: May 2019
jsmack
We are using opacity here, for leaf cutout, but I think Karma uses ray continuation for opacity, no? It no longer does a sum-all-hits like Mantra used to. That's refraction in my book.

Sorry, I should have been clearer about what I'm referring to.

For Path-Ray Opacity...
KarmaCPU uses ray continuation
KarmaXPU uses stochastic techniques

For Shadow Opacity...
Both renderers do the same thing (i.e. tint the shadow result)

In any case, opacity is a more efficient form of ray continuation than refraction, so the two should still be talked about as separate things.
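To make the ray-continuation versus stochastic distinction concrete, here is a generic Python sketch; it is not Karma's implementation, and the function names are hypothetical. Ray continuation walks through every cutout hit and multiplies the throughput by (1 - alpha) each time (a tinted shadow ray plausibly accumulates a similar per-channel product, though that is an assumption here). The stochastic version instead stops or passes the ray at each hit with probability alpha, so any single sample is all-or-nothing and only the average matches, which is why it relies on many samples to converge:

```python
import random


def continuation_transmittance(alphas):
    """Ray continuation: pass through every hit, attenuating by (1 - alpha).

    (Assumption) a tinted shadow ray accumulates a similar product per channel.
    """
    throughput = 1.0
    for alpha in alphas:
        throughput *= 1.0 - alpha
    return throughput


def stochastic_transmittance(alphas, rng):
    """Stochastic opacity: at each hit, stop with probability alpha,
    otherwise continue unattenuated. Returns 0.0 or 1.0 per sample."""
    for alpha in alphas:
        if rng.random() < alpha:
            return 0.0
    return 1.0


def stochastic_estimate(alphas, samples, seed=0):
    """Average many stochastic samples; the expected value equals the
    continuation result, but each individual sample is noisy."""
    rng = random.Random(seed)
    return sum(stochastic_transmittance(alphas, rng) for _ in range(samples)) / samples


leaf_layers = [0.5, 0.25]  # opacity of two overlapping leaf cutouts
print(continuation_transmittance(leaf_layers))  # 0.375
print(stochastic_estimate(leaf_layers, 100_000))  # close to 0.375, with noise
```

The stochastic form does less work per ray (it never carries a fractional throughput), which fits a GPU renderer, at the cost of noise that has to be averaged away.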

dhemberg
I'm not sure we're barking up the right tree here

I think you're right, sorry about that!

dhemberg
I assume that @jsmack implied that the underlying code in MaterialX uses transmission code for both opacity and transmission, but that could be me misunderstanding what he's saying.

Ah, I see the confusion.
Yes, for shadows with caustics disabled, transmission and opacity do the same thing.

dhemberg
In any case, I've taken another try at isolating this issue; again, here is a (hopefully more self-contained) scene:

Thanks, I'll take a look at it, and hopefully can reproduce it on my end.
Edited by brians - October 26, 2022 22:16:38
Member
207 posts
Joined: November 2015
brians
Thanks, I'll take a look at it, and hopefully can reproduce it on my end.

Hi @brians, could you confirm if you were able to reproduce this? I remain stuck on it.
Member
7755 posts
Joined: September 2011
dhemberg
Hi @brians, could you confirm if you were able to reproduce this? I remain stuck on it.

The scene you posted above doesn't crash.

Oddly, I was able to reproduce the crash while fiddling with the materials during the render.

Adding a third texture to the material makes it crash.
Edited by jsmack - October 28, 2022 17:56:41

Attachments:
tree_failure2.hiplc (4.7 MB)

Member
207 posts
Joined: November 2015
Hm, ok.

I'm not really sure how to proceed here. I'm fairly confident that the crashing I'm experiencing is a real thing; it's consistent, but I'm having trouble pinning down exactly what's going wrong. The limited output from XPU means I can't offer much in terms of debugging info, and the scene I provided crashes for me every time I open it. I'm very happy to be the canary in the coal mine and contrive other tests, though from your comments, @jsmack, it's not clear that this would be useful.

So, again, I remain stuck.
Member
7755 posts
Joined: September 2011
This makes no sense. Removing the tree animation prevents the crash, but the tree animation doesn't apply to the shapes with materials.

Pruning the animated portion allows the leaves to render without a crash, either by visibility or activation. Pruning the leaves allows the tree to render. Rendering both together with the extra texture and animation crashes.
Member
7755 posts
Joined: September 2011
dhemberg
So, again, I remain stuck.

The geometry you're rendering is probably funky. I would try rendering something else for now.

You could also use the workaround of not connecting three textures, since reusing the same texture for diffuse and subsurface doesn't crash.
Edited by jsmack - October 28, 2022 18:44:01
Member
207 posts
Joined: November 2015
jsmack
This makes no sense. Removing the tree animation prevents the crash, but the tree animation doesn't apply to the shapes with materials.

Pruning the animated portion allows the leaves to render without a crash, either by visibility or activation. Pruning the leaves allows the tree to render. Rendering both together with the extra texture and animation crashes.

Lol, OK, yup, thanks; this is exactly my experience as well. I agree it makes no sense; I feel like I'm losing my mind trying to debug it.
Staff
468 posts
Joined: May 2019
dhemberg
I'm fairly confident that the crashing I'm experiencing is a real thing;

It is a real thing and has been happening to other people as well, in various shapes and forms. It started to occur about a month ago and is a very frustrating and stubborn bug that has been exhibiting heisenbug behavior. It happens to some people like clockwork, and others (like me) cannot reproduce it at all.

I can say that motion blur seems to be the main trigger, but other things can trigger it as well (e.g. switching back to KarmaCPU).

I'm saddened by it because up till now XPU had been very stable, even though the code was very new.

But we are working on it and will get there eventually.
Apologies, and thanks very much for your help and patience.

Brian
Member
7755 posts
Joined: September 2011
brians
I'm saddened by it because up till now XPU had been very stable, even though the code was very new.

Yes, very strange. XPU was very stable during the 19.5 alpha and beta cycle. This crash must be new or possibly linked to a change in the driver.
Member
207 posts
Joined: November 2015
brians
But we are working on it and will get there eventually.
Apologies, and thanks very much for your help and patience.

Of course, I'm happy to help further if I can; thanks so very much, Brian!
Member
207 posts
Joined: November 2015
I'm still wrestling against this issue, it has severely impeded my project unfortunately. One thing I wanted to note: @jsmack suggested at one point that I could force Houdini to crash when Optix fails by disabling Embree in my houdini.env file, which would make this bug more evident than simply noticing unexpectedly long render times. I thought this was a good idea, so I tried it.

Unfortunately, if I disable Embree, Karma simply outputs an empty image in the event of an Optix failure, which I find even more confusing than the long render times of an Embree fallback.

Properly catching this issue seems like it would rely on some sort of post-render script that looks at the error log and parses it for the Optix failure line.
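A minimal sketch of such a post-render check, assuming the render's console output has already been captured to a text file. The thread never quotes the actual failure message, so `OPTIX_FAILURE` below is a hypothetical placeholder pattern that would need to be adjusted to whatever Karma really prints; the script name in the usage comment is also made up:

```python
import re
import sys
from pathlib import Path

# HYPOTHETICAL pattern: the real wording of Karma's Optix failure line is
# not given in this thread; adjust it to match your own logs.
OPTIX_FAILURE = re.compile(r"optix.*(fail|error)", re.IGNORECASE)


def find_optix_failures(log_text):
    """Return (line_number, line) pairs for every line matching the pattern."""
    return [
        (number, line)
        for number, line in enumerate(log_text.splitlines(), start=1)
        if OPTIX_FAILURE.search(line)
    ]


def check_log(log_path):
    """Print matching lines; return 1 if any were found, else 0."""
    text = Path(log_path).read_text(errors="replace")
    failures = find_optix_failures(text)
    for number, line in failures:
        print(f"{log_path}:{number}: {line}")
    return 1 if failures else 0


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file names): python check_optix_log.py render.log
    sys.exit(check_log(sys.argv[1]))
```

Returning a nonzero exit code lets a wrapper script or render farm flag the frame as failed instead of silently accepting a slow Embree fallback or an empty image.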
Staff
468 posts
Joined: May 2019
dhemberg
I'm still wrestling against this issue, it has severely impeded my project unfortunately.

Sorry for the slow progress on this. We strongly suspect a driver bug.
Is it possible you could...
- try the latest NVidia driver 526.86
- and if that does not fix the issue, try rolling back to 496.13

dhemberg
if I disable Embree, Karma simply outputs an empty image in the event of an Optix failure.

Ah! We've had this reported elsewhere and have an open ticket for it.
We'll look into this soon, thanks for raising it!

Let us know how you get on with the driver.
thanks very much
Brian
Edited by brians - November 14, 2022 18:02:34
Member
207 posts
Joined: November 2015
Hey Brian!

I'm hesitant to wave the victory flag too wildly, but upgrading to the latest Game driver as you suggested (526.86) yields:
  • The setup I shared on Dropbox - which previously consistently crashed for me - now seems to work.
  • The larger scene file that touched off this thread also seems to work without crashing.

This bug has been mysterious enough for me that I'm afraid to imagine a magic bullet fix like this is truly real, so I'm proceeding with some caution and double-checking other elements of my scene to make sure I've not just inadvertently enabled something that might sidestep the bug. But...I am cautiously optimistic!



I've been told strictly by support@ that I should always use the Studio driver, so it wouldn't have occurred to me to try out the Game driver; I would have just presumed I'd be exacerbating the issue. So, thank you for the pointer!

Attachments:
trees.gif (2.5 MB)

Member
7755 posts
Joined: September 2011
dhemberg
I've been told strictly by support@ that I should always use the Studio driver, so it wouldn't have occurred to me to try out the Game driver; I would have just presumed I'd be exacerbating the issue. So, thank you for the pointer!

You never know which bug fixes go into which driver, or which newly added driver features introduce a bug. It could be either branch.
Staff
468 posts
Joined: May 2019
dhemberg
This bug has been mysterious enough for me that I'm afraid to imagine a magic bullet fix like this is truly real, so I'm proceeding with some caution and double-checking other elements of my scene to make sure I've not just inadvertently enabled something that might sidestep the bug. But...I am cautiously optimistic!

This is great news

We were becoming more certain it was a driver issue, introduced sometime after 495.89, and were attempting to narrow it down to get some information back to NVidia (which proved a little difficult given its heisenbug behavior). But the most recent driver (526.86) is working fine for us, so hearing it's working fine for you too gives me confidence this bug may have been fixed (or at least crawled back under its rock).
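Narrowing a regression like this across driver releases is essentially a bisection. Below is a generic sketch, not a description of SideFX's actual process. It assumes an ordered list of driver versions whose first entry is known-good (for example the last release before the bug appeared) and a hypothetical `crashes(version)` hook that installs that driver and renders the scene. Because the bug behaves like a heisenbug, each version is tested several times and counted as bad if any run crashes, which lowers (but does not remove) the chance of a false "good":

```python
from typing import Callable, Optional, Sequence


def first_bad_version(
    versions: Sequence[str],
    crashes: Callable[[str], bool],
    trials: int = 5,
) -> Optional[str]:
    """Return the first version in `versions` that reproduces the crash.

    Assumes versions are in release order, versions[0] is known-good,
    and that once the bug appears it is present in every later version
    of the list. `crashes` is a hypothetical test hook.
    """

    def is_bad(version: str) -> bool:
        # Intermittent bug: repeat the test; any crash marks the version bad.
        return any(crashes(version) for _ in range(trials))

    lo, hi = 0, len(versions) - 1
    if hi <= 0 or not is_bad(versions[hi]):
        return None  # nothing to search, or the last version is fine
    # Invariant: versions[lo] is good and versions[hi] is bad.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi]


versions = ["v0", "v1", "v2", "v3", "v4"]  # placeholder names, not real drivers
print(first_bad_version(versions, lambda v: v >= "v2"))  # v2
```

Since the crash went away again in 526.86, the list would have to end at a version known to crash rather than at the newest release, or the search would wrongly report nothing.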

To be sure, I might check with NVidia to see what they've fixed in this last release.

In any case, let's keep our fingers crossed that the issue is gone, and if it does come back, please let us know.

dhemberg
I've been told strictly by support@ that I should always use the Studio driver, so it wouldn't have occurred to me to try out the Game driver

Game drivers have the most recent tech, updates, and fixes, whereas Studio drivers have older code but more testing, and are considered more stable. So I assume this fix will eventually make its way into the Studio driver, after which you should probably move back to it.

Thanks very much for the testing/help!
Brian
Edited by brians - November 16, 2022 05:21:52
Staff
468 posts
Joined: May 2019
I'm guessing you might already know, but NVidia released a new Studio driver last week (526.98).