Karma XPU just stops rendering after some frames?

Member
11 posts
Joined: Feb 2020
Offline
Has anyone else had this infuriating issue where Karma XPU renders just decide to stop after 5 or 10 frames? Neither GPU nor CPU memory is even close to capped, and there are no errors or warnings in the log viewer. I did get this non-stop spam once after it quit, never otherwise:

Event Queue dropping 1 events.
Event Queue Full. Events being dropped:
UI_EVENT_REFRESH to (null)
Event Queue dropping 1 events.
Event Queue Full. Events being dropped:
................

I just found another post now that I finally have some sort of error to search for. Apparently this happens when nodes misbehave and create too many queue items? Who knows. The workaround is to set the environment variable HOUDINI_UI_QUEUE_SIZE to 99999. At 20k my render still quit after about 20 frames, but that is farther than I've gotten before. I'm just going to keep increasing it, I guess.
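For anyone trying the same workaround, here is a minimal Python sketch of setting the variable for a single launch rather than system-wide. The variable name comes from the post above; the helper name `with_ui_queue_size` is made up for illustration, and the commented launch line assumes a `houdini` executable on the PATH.

```python
import os


def with_ui_queue_size(queue_size, base_env=None):
    """Return a copy of the environment with HOUDINI_UI_QUEUE_SIZE set.

    The original mapping is left untouched, so the change only affects
    processes started with the returned dictionary.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["HOUDINI_UI_QUEUE_SIZE"] = str(int(queue_size))
    return env


# Example launch (assumes "houdini" is on the PATH; not run here):
#   import subprocess
#   subprocess.Popen(["houdini"], env=with_ui_queue_size(99999))
```

Setting it per launch makes it easy to compare a run with the larger queue against a run without it.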

Sadly this is not helping. I've also tried updating to 20.0.620. It might have just been a coincidence that it got farther than ever before.
Edited by physixtential - February 17, 2024 21:52:34

Attachments:
houdiniFail.png (72.2 KB)

Staff
479 posts
Joined: May 2019
Offline
What happens if you render using Karma CPU instead?
Also, what happens if you render XPU with the GPU disabled (i.e. envvar KARMA_XPU_DISABLE_OPTIX_DEVICE=1)?
What driver are you on?

cheers
Member
11 posts
Joined: Feb 2020
Offline
brians
What happens if you render using Karma CPU instead?
Also, what happens if you render XPU with the GPU disabled (i.e. envvar KARMA_XPU_DISABLE_OPTIX_DEVICE=1)?
What driver are you on?

cheers
Sorry for the delay. Thanks for spotting this post. CPU is so much slower I'd rather sit here and keep restarting XPU renders every 5 frames. I don't view CPU rendering as a reasonable option even if I have to manually render every frame on GPU. If I were willing to render with CPU I also wouldn't have invested months of effort in being MaterialX compatible xD. However, I will run with KARMA_XPU_DISABLE_OPTIX_DEVICE=1 to help troubleshoot this. That is running now, and I verified it is only using EmbreeCPU.
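A test run like this can be started from a small Python wrapper so the variable applies only to that one process. This is a sketch under assumptions: the variable name is the one given in the question above, `run_with_optix_disabled` is a hypothetical helper, and the command you pass in (your Houdini launcher or command-line render) is up to you.

```python
import os
import subprocess
import sys


def run_with_optix_disabled(command):
    """Run `command` with KARMA_XPU_DISABLE_OPTIX_DEVICE=1 in its environment.

    Returns the CompletedProcess so the exit code and captured output
    can be inspected afterwards.
    """
    env = dict(os.environ)
    env["KARMA_XPU_DISABLE_OPTIX_DEVICE"] = "1"
    return subprocess.run(command, env=env, capture_output=True, text=True)


if __name__ == "__main__":
    # Stand-in command: a child Python process that echoes the variable,
    # which confirms the setting reaches the launched program.
    check = [sys.executable, "-c",
             "import os; print(os.environ['KARMA_XPU_DISABLE_OPTIX_DEVICE'])"]
    print(run_with_optix_disabled(check).stdout.strip())
```

Replace the stand-in command with whatever you normally use to start the render; the rest of the environment is passed through unchanged.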

As for drivers, it happens on the RTX 4090 with both the latest Studio driver (currently 551.23) and the Game Ready driver. It doesn't seem to be a GPU or memory failure. It seems more like the scheduling of the frames to render just... ends. I notice that in the render scheduler it progresses from 0% to 100% for a single frame, then sort of replaces that same render with a new one, as opposed to showing progress for the whole range or creating a new render item. It feels like there's possibly some timeout when scheduling the next task or something, but again, nothing useful in the log.

Edit: Update:
I can't reasonably test if this fails with CPU only. It just takes too long even with low samples to get to where a failure would occur.

I'm wondering if I could set up a PDG task to cook one frame at a time as work items. The Karma render ROP always cooks at least one frame when I click the button. If I just had something to click that button every time it stops rendering, I'd be OK.
Edited by physixtential - February 21, 2024 10:57:47
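Short of a full PDG setup, the "click the button once per frame" idea could be sketched as a plain Python loop that starts a separate single-frame render for each frame and checks that the image actually landed on disk. Everything named here is an assumption for illustration: `render_frame` and `output_path_for` are callables you supply, and the node path and file pattern in the comments are hypothetical.

```python
import os


def render_range_one_frame_at_a_time(render_frame, output_path_for,
                                     start, end, max_attempts=3):
    """Render each frame as its own job and return frames with no output.

    Frames whose output file already exists are skipped, so the loop can
    be re-run to resume after an interrupted render.
    """
    missing = []
    for frame in range(start, end + 1):
        path = output_path_for(frame)
        for _ in range(max_attempts):
            if os.path.exists(path):
                break  # already rendered, or the last attempt succeeded
            render_frame(frame)
        if not os.path.exists(path):
            missing.append(frame)
    return missing


# Inside a Houdini Python session this might be wired up as follows
# (node path and output pattern are hypothetical):
#   rop = hou.node("/out/usdrender1")
#   missing = render_range_one_frame_at_a_time(
#       lambda f: rop.render(frame_range=(f, f)),
#       lambda f: "/tmp/render/shot.%04d.exr" % f,
#       1, 100)
```

Because the failure described in this thread is silent, the loop relies on the output file existing rather than on an exception being raised.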
Member
1 posts
Joined: Jan 2016
Online
I had a similar issue with the 551.23 Studio driver - rolling back to the 546.33 Studio driver fixed it in my case
Member
11 posts
Joined: Feb 2020
Offline
Update...
Without changing anything but the camera position and increasing primary samples from 16 to 128, I've managed to render an entire frame range on the exact same setup. I wonder if it was rendering frames so fast that the underlying task scheduler wasn't ready with the next frame? They previously took maybe 30 seconds round trip per frame.
Edited by physixtential - February 21, 2024 12:07:16
Staff
479 posts
Joined: May 2019
Offline
physixtential
CPU is so much slower I'd rather sit here and keep restarting XPU renders every 5 frames. I don't view CPU rendering as a reasonable option even if I have to manually render every frame on GPU. If I were willing to render with CPU I also wouldn't have invested months of effort in being MaterialX compatible xD.

I understand it's not possible/practical for you to render your project using KarmaCPU, I'm not suggesting it as a workaround.

I'm asking for you to test with KarmaCPU so we can try and rule out any XPU-specific issues.

physixtential
I can't reasonably test if this fails with CPU only. It just takes too long even with low samples to get to where a failure would occur.

Same as with KarmaCPU. I'm trying to rule out any GPU/NVidia/OptiX-specific issues from the equation. What happens if you render with only EmbreeCPU at low sample count and low resolution? Does that render fast enough to get an answer?


Without a repro scene, and without you being able to run tests on your end, it will be difficult for us to debug this issue. I'll ask around among some other SideFX guys; maybe someone has seen this kind of thing before.
Edited by brians - February 25, 2024 21:21:49