traileverse XPU at 11 mins? could that be a linux thing? Is it slower on windows? and CPU is doing half the work. Your GPU only is still better than my 3090. Why driver 535.129.03? is that the latest on linux?
Yes, the render engine on the XPU rendered the picture in 11 minutes. I can't say about peculiarities of XPU work on Windows or Linux, I haven't made tests on my configuration on Windows yet. The driver used is the latest stable from the repository.
yeah, something sounds way off. It takes 30 minutes to render with a 3090 doing 90% of the work on Windows. A 3090 is probably 2x as fast as a 3070Ti in rendering too. Also 51% work done by a 5950? I don't buy it. A 3070Ti gpu is at least a few times faster than that CPU. Maybe you rendered it to viewport size and not to MPlay at 1920x1080?
If that 11-minute render time is correct, then something is way off and someone from SideFX needs to jump in here and check whether something is wrong on Windows. Those are the render times I'd expect, so on a 3090 (2x faster) that would be about 5 to 7 minutes per frame for that bubble-wrap scene.
Oh yeah, that's my mistake. I should have specified a render resolution of 1280 x 720 in the post. I ran this test on H20 Apprentice, and that is the maximum render resolution for that version. I didn't notice that the render node settings specified Full HD. Otherwise, the load between CPU and GPU was distributed as described in the previous post.
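To get a rough feel for how much the resolution mix-up matters, here is a back-of-the-envelope sketch in Python. It assumes render time scales linearly with pixel count, which is only an approximation (fixed startup cost and scene complexity are ignored), so treat the result as a ballpark figure and not a measurement.

```python
# Rough estimate: scale a render time by the ratio of pixel counts.
# Assumption (not from the thread): time is proportional to pixel count.

def pixel_ratio(res_a, res_b):
    """Return how many times more pixels res_a has than res_b."""
    return (res_a[0] * res_a[1]) / (res_b[0] * res_b[1])

full_hd = (1920, 1080)      # resolution set on the render node
apprentice = (1280, 720)    # resolution the test was actually rendered at

ratio = pixel_ratio(full_hd, apprentice)
minutes_at_720p = 11        # render time reported in the thread
estimated_minutes_at_1080p = minutes_at_720p * ratio

print(f"Full HD has {ratio:.2f}x the pixels of 1280 x 720")
print(f"Estimated Full HD time: about {estimated_minutes_at_1080p:.1f} minutes")
```

Under that assumption, the 11-minute result lands much closer to the times reported for Full HD renders.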
ok, I gave the bubble wrapper a spin using the original settings from the file.
Rendering on a Threadripper Pro 64-core, 2x 4500 RTX (they are not superfast, but together they should beat a 3090), Windows 10.
Render distribution was 21%/25%/52% (optix/optix/embree).
Render time was 14:38.
There is still visible noise in the DOF areas. To be honest, I could have stopped at about 50% and cleaned the rest up with denoise. But I guess that's not the point. I also find it intriguing that the two OptiX devices don't have the same render contribution, and that the CPU is faster than the two combined (with other scenes this is also sometimes the case, but usually the GPUs are much faster than the CPU).
Not sure how fast this would be on a 4000 card, as I remember reading a post saying they might be faster at rendering refractions because of the shader sorting functionality.
ronald_a renderdistribution was 21%/25%/52% (optix/optix/embree)
Anything with lots of nested refraction is fairly problematic for GPU (it creates lots of divergence). So I'm not surprised this bubble-wrap scene has speed issues with XPU. But it would be good to make sure there is not something else at play. What happens if you do a full render with the CPU device disabled? You can do that by setting the environment variable KARMA_XPU_DISABLE_EMBREE_DEVICE=1, see https://www.sidefx.com/docs/houdini/solaris/karma_xpu.html#disablingdevices
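If you launch renders from a script, one way to apply that variable is to pass a modified environment to the child process. This is a minimal sketch: the helper name `xpu_gpu_only_env` is made up for illustration, and the launcher command is deliberately left out because it depends on your setup. Only the variable name and value come from the post above.

```python
import os

def xpu_gpu_only_env(base_env=None):
    """Return a copy of the environment with the Embree (CPU) device disabled.

    Hypothetical helper; only the variable name and value are from the thread.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["KARMA_XPU_DISABLE_EMBREE_DEVICE"] = "1"
    return env

env = xpu_gpu_only_env()
print("KARMA_XPU_DISABLE_EMBREE_DEVICE =", env["KARMA_XPU_DISABLE_EMBREE_DEVICE"])

# To use it, pass `env=env` to subprocess.run(...) with whatever command
# you normally use to start Houdini or a command-line render.
```

The variable has to be set before the Houdini process starts; this copy of the environment does not change a session that is already running.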
ronald_a Not sure how fast this would be on a 4000 card as I remember reading a post that they might be faster rendering refractions because of the shader sorting functionality.
Rendered the scene on 2x 4090, Ryzen 5950X, Linux. I used the one from the start of the thread; not sure if someone posted different versions. It took 1min7sec. Increasing samples to 1024 took 1min58sec, still noisy, but just to give some idea. By the way, do we have any denoising options available in Karma XPU?
Disabling Embree clocks in at 29:06 which I guess is expected.
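As a sanity check on "expected", here is a small Python sketch that predicts the GPU-only time from the earlier numbers (14:38 with all devices, 21% + 25% of the work done by the two OptiX devices). It assumes each device's share of the work reflects its throughput and stays constant when Embree is turned off, which may not hold exactly. Note that the reported shares add up to 98%, so there is some rounding or overhead in there too.

```python
# Predict the GPU-only render time from the reported work distribution.
# Assumption: a device's share of the work is proportional to its speed.

def to_seconds(mmss):
    """Convert a 'minutes:seconds' string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

def to_mmss(seconds):
    """Format a number of seconds as 'minutes:seconds', rounded."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}:{secs:02d}"

time_all_devices = to_seconds("14:38")   # reported with Embree enabled
gpu_share = 0.21 + 0.25                  # the two OptiX devices combined
predicted_gpu_only = time_all_devices / gpu_share
observed_gpu_only = to_seconds("29:06")  # reported with Embree disabled

print("predicted GPU-only time:", to_mmss(predicted_gpu_only))
print("observed GPU-only time: ", to_mmss(observed_gpu_only))
```

The prediction comes out a little slower than the measured 29:06, so the two results are roughly consistent with each other.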
traileverse Could you share that optimized version? Also, are you on a 3090 as well?
Here's my version of a refraction-less transparent material.
and the result: 15:15
true refraction version: 35:08
These materials and render settings are not identical to the one from the content library so times can't be compared. I have increased refraction bounces to 16 to fill in the darkened areas of the true thin transmissive version. Color limit has been increased to 1000 to prevent clamping the blue reflections to white. This introduces additional fireflies, but with the color clamping a lot of energy is lost and doesn't tone map correctly.
Thanks for this. I will let you know what my times are. Overall though, atm there isn’t any economic reason to go with XPU. It’s just much slower than I would’ve expected, and for what I do it simply would be foolish not to use Redshift. It’s miles faster! I was truly hoping to make the switch, but not right now! I’m also going to try Octane, which from my understanding is also unbiased, and I’m seeing tests online where it’s beating Redshift in a few scenes. Quality is mightily important, but for a modern renderer, and a GPU one at that, speed with a slight hit in quality is more important for motion design! I’m not gonna be spending 2k on a 4090 for these render times! No way, that’s putting me out of business. Karma, in my humble opinion, has a ways to go in that department.
Mirko Jankovic Rendered scene on 2x4090, ryzen 5950x, linux. One from the start of the thread not sure if someone sent some different versions. 1min7sec Increasing samples to 1024, took 1min58sec, still noisy but just to give some idea. Btw do we have any denoising options available in Karma XPU?
Hey, could you share your motherboard, PSU, and what PC case you’re using with the dual 4090s?
You may try the modified hip file from this post [www.sidefx.com] where the portal geometry is added to improve the dome light sampling.
The denoiser can be enabled via:
- Karma Render Settings (node) > Image Output > Filters > Denoiser, or
- Display Options > Enable Denoising
There are two denoisers available: Optix (interactive) and Intel OIDN (only applies after the render is completed).
The modified version of the scene does perform much better but still a ways behind RS. I haven’t tried it with denoise (it’s still a bit noisy too) so I will do that as well.
traileverse Thanks for this. I will let you know what my times are. Overall though, atm there isn’t any economic reason to go with XPU. It’s just much slower than I would’ve expected, and for what I do it simply would be foolish not to use Redshift. It’s miles faster!
The choice is between Karma CPU and XPU and Mantra, 3rd party renderers never enter into my consideration as they have an added cost and lack the quality and flexibility compared to Mantra. And XPU is much faster than Mantra most of the time. The only question is, is it flexible enough.
Man I wish I could say that, cause I’m not a fan of 3rd party anything either. Coming from after effects and C4D I don’t even like the words plug-in side by side. It’s one of the main reasons houdini became home because I could escape the need for a ton of overhead crap especially ones to do simple things the tools should do natively.
Except when it came to rendering! clients want things fast. And who am I kidding, I want my renders fast too! Whatever is fastest on the hardware. If XPU is a bit slower that’s fine. But 5-6-7 times slower, naaah! I’ll continue getting better with it because the more you know about the renderer the better times you can get.
Then also hope to see generous speed improvements in the future or maybe only improvements in the hardware will allow, whichever one.
If speed is the most important thing, rendering in real time is probably worth consideration.
Yeh I started using unreal like 2 months ago, lots of fun, getting better with it day by day and I feel houdini together with unreal is as good as it gets. The balance between quality and speed is what’s most important though, not just speed.
The topic of costs will lead to some interesting discussion, once Karma goes full commercial and costs per seat (that mantra deal was always really great in that regard, especially for shops strapped for cash but still with some racks full of nodes, like ours).
I went and tested the bubble wrap scene with Cycles, as I remembered it having similar raw performance when I tested it a few years ago. Although it doesn't have a thin transmissive material, it can simulate one with the solidify modifier by making the surface double-walled. After testing, I think there's something broken with XPU, performance-wise. The raw sampling speed with Cycles seems to be 30x higher than with XPU. I couldn't get the result to look exactly like the XPU one though, so maybe there's something incorrect that Cycles is doing that makes it faster. 4096 samples takes on the order of 3-5 minutes for the bubble wrap scene, depending on how many bells and whistles you enable on the material. XPU was on the order of 30 minutes for 1024 samples with the absolute most basic shader.
this is the result with Cycles. It seems to be losing a lot of energy when it stacks up in depth, even though I have indirect unclamped and 24 bounces. is it just fast because it's cheating and wrong? I would think even 4096 wrong samples would take longer than 1024 good ones.
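For reference, the "30x" figure can be reproduced from the numbers above with a quick Python calculation. This is only arithmetic on the reported sample counts and times; it treats a Cycles sample and an XPU sample as comparable units of work, which is an assumption and exactly the thing in question here.

```python
# Compare raw sampling throughput (samples per minute) from the reported times.
# Assumption: one Cycles sample and one XPU sample are comparable.

def samples_per_minute(samples, minutes):
    """Return sampling throughput in samples per minute."""
    return samples / minutes

xpu_rate = samples_per_minute(1024, 30)   # XPU: 1024 samples in about 30 minutes

speedups = {}
for cycles_minutes in (3, 5):             # Cycles: 4096 samples in 3 to 5 minutes
    cycles_rate = samples_per_minute(4096, cycles_minutes)
    speedups[cycles_minutes] = cycles_rate / xpu_rate
    print(f"Cycles at {cycles_minutes} min: about {speedups[cycles_minutes]:.0f}x the XPU rate")
```

That gives a range of roughly 24x to 40x, which brackets the 30x estimate.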
Isn't the translucent shader in Cycles the equivalent of "thinwalled"? Also, Cycles sets a hard limit on transparency rays. I see a big difference on the right of the screen, so maybe you should allow more transparent rays in Cycles to get closer results?
Edited by sniegockiszymon - Nov. 14, 2023 15:29:43