Karma XPU Nvidia 5000 series support?

Forums Solaris and Karma Karma XPU Nvidia 5000 series support?

3353 16 1


Heileif: Member; 260 posts; Joined: Jan. 2015; Offline

March 24, 2025 7:58 p.m.

Hello,

Have anyone tried any of the new Nvidia 5000 series card with Karma XPU, and got it working?`

Redshift had to add support for the new cards to work. Wondering if Karma XPU needs to do the same? I did not find anything in the changelog about added support, but maybe I'm blind.

Edited by Heileif - March 24, 2025 23:14:51


brians: Staff; 582 posts; Joined: May 2019; Offline

March 25, 2025 5:05 a.m.

Nvidia 5000 series means an ADA card I think.
The main feature that came with ADA was the "shader execution reordering" (ie SER) which KarmaXPU has support for in 20.5
SER gave a 1.5-2x speedup


Heileif: Member; 260 posts; Joined: Jan. 2015; Offline

March 25, 2025 6:51 a.m.

Karma XPU optix will probably work then? and not fail to start? like Redshift?

Trying to get new cards, but kinda sad if Karma XPU optix fails when we get them.


jsmack: Member; 8173 posts; Joined: Sept. 2011; Online

March 25, 2025 1:58 p.m.

Heileif
Hello,

Have anyone tried any of the new Nvidia 5000 series card with Karma XPU, and got it working?`

Redshift had to add support for the new cards to work. Wondering if Karma XPU needs to do the same? I did not find anything in the changelog about added support, but maybe I'm blind.

do you mean geforce 5000 series, as in Blackwell? The RTX 5000 Ada has been out for a while now, but the RTX Pro 5000 Blackwell was just announced and isn't out yet as far as I know.


brians: Staff; 582 posts; Joined: May 2019; Offline

March 25, 2025 4:09 p.m.

Apparently the term "5000 series" is a little loaded :/
It means either...
- RTX 5000 Ada
or
- Blackwell range (5090 etc...)

Either way, Karma XPU does not have the same issue as RedShift. It should work on Blackwell fine without any updates, etc.


Heileif: Member; 260 posts; Joined: Jan. 2015; Offline

March 25, 2025 5:34 p.m.

Thanks for the replys.

RTX 5090 is the cards we are getting. Good to hear they will work


GnomeToys: Member; 24 posts; Joined: Jan. 2017; Offline

April 28, 2025 8:50 a.m.

They likely won't have support for the new RT core features of Blackwell for a while though if any are relevant... I think SER for Ada was added around a year and a half after the cards came out, and for me on a 4090 anyway it made renders roughly 2-3x faster as SideFX said it would so it was worth the wait... part of the issue is that NVidia loves releasing cards that technically have the features then taking their time adding them to drivers and libraries, although I'd argue that this is better than AMD's strategy of having hardware features that never get enabled (the geometry shader cores on Vega, for example) or disabling features that worked when the card came out (1/2 speed fp64 on Vega7 being lowered to something like 1/32 speed, for example).
Ada still has no library support for TransformerEngine on windows for example, although that only affects quantized inference of ML models which isn't as big a deal for most of the image related ones... the people training and running LLMs, or as things are turning out in reality, "new-school spammers" as I like to call them (if I'm not saying what I really think which is unprintable in most states), are the main group affected by that on ada and blackwell and they'll just run linux. It may affect Houdini down the line if some very useful generative 3D models that 8 and 4-bit float quantization can be used with show up but I'm not holding my breath.


Heileif: Member; 260 posts; Joined: Jan. 2015; Offline

April 29, 2025 11:23 a.m.

GnomeToys
They likely won't have support for the new RT core features of Blackwell for a while though if any are relevant... I think SER for Ada was added around a year and a half after the cards came out, and for me on a 4090 anyway it made renders roughly 2-3x faster as SideFX said it would so it was worth the wait... part of the issue is that NVidia loves releasing cards that technically have the features then taking their time adding them to drivers and libraries, although I'd argue that this is better than AMD's strategy of having hardware features that never get enabled (the geometry shader cores on Vega, for example) or disabling features that worked when the card came out (1/2 speed fp64 on Vega7 being lowered to something like 1/32 speed, for example).
Ada still has no library support for TransformerEngine on windows for example, although that only affects quantized inference of ML models which isn't as big a deal for most of the image related ones... the people training and running LLMs, or as things are turning out in reality, "new-school spammers" as I like to call them (if I'm not saying what I really think which is unprintable in most states), are the main group affected by that on ada and blackwell and they'll just run linux. It may affect Houdini down the line if some very useful generative 3D models that 8 and 4-bit float quantization can be used with show up but I'm not holding my breath.

The Shader Execution Reordering(SER) update was great for the 4090 cards. We got around x4 speed increase on renders with fur compared to 3090 cards.

If someone wonders, the 5090 cards have been stable so far for us.

Edited by Heileif - April 29, 2025 20:24:57


AgusGuri: Member; 2 posts; Joined: Jan. 2023; Offline

May 3, 2025 4:24 a.m.

Heileif
GnomeToys
They likely won't have support for the new RT core features of Blackwell for a while though if any are relevant... I think SER for Ada was added around a year and a half after the cards came out, and for me on a 4090 anyway it made renders roughly 2-3x faster as SideFX said it would so it was worth the wait... part of the issue is that NVidia loves releasing cards that technically have the features then taking their time adding them to drivers and libraries, although I'd argue that this is better than AMD's strategy of having hardware features that never get enabled (the geometry shader cores on Vega, for example) or disabling features that worked when the card came out (1/2 speed fp64 on Vega7 being lowered to something like 1/32 speed, for example).
Ada still has no library support for TransformerEngine on windows for example, although that only affects quantized inference of ML models which isn't as big a deal for most of the image related ones... the people training and running LLMs, or as things are turning out in reality, "new-school spammers" as I like to call them (if I'm not saying what I really think which is unprintable in most states), are the main group affected by that on ada and blackwell and they'll just run linux. It may affect Houdini down the line if some very useful generative 3D models that 8 and 4-bit float quantization can be used with show up but I'm not holding my breath.

The Shader Execution Reordering(SER) update was great for the 4090 cards. We got around x4 speed increase on renders with fur compared to 3090 cards.

If someone wonders, the 5090 cards have been stable so far for us.

Hi Heileif,

May i kindly ask, which drivers version are you using? also studio or gameready?
im having some cuda error with latest studio drivers on my new rtx5070ti

cheers


Heileif: Member; 260 posts; Joined: Jan. 2015; Offline

May 3, 2025 10:10 a.m.

AgusGuri
Heileif
GnomeToys
They likely won't have support for the new RT core features of Blackwell for a while though if any are relevant... I think SER for Ada was added around a year and a half after the cards came out, and for me on a 4090 anyway it made renders roughly 2-3x faster as SideFX said it would so it was worth the wait... part of the issue is that NVidia loves releasing cards that technically have the features then taking their time adding them to drivers and libraries, although I'd argue that this is better than AMD's strategy of having hardware features that never get enabled (the geometry shader cores on Vega, for example) or disabling features that worked when the card came out (1/2 speed fp64 on Vega7 being lowered to something like 1/32 speed, for example).
Ada still has no library support for TransformerEngine on windows for example, although that only affects quantized inference of ML models which isn't as big a deal for most of the image related ones... the people training and running LLMs, or as things are turning out in reality, "new-school spammers" as I like to call them (if I'm not saying what I really think which is unprintable in most states), are the main group affected by that on ada and blackwell and they'll just run linux. It may affect Houdini down the line if some very useful generative 3D models that 8 and 4-bit float quantization can be used with show up but I'm not holding my breath.

The Shader Execution Reordering(SER) update was great for the 4090 cards. We got around x4 speed increase on renders with fur compared to 3090 cards.

If someone wonders, the 5090 cards have been stable so far for us.

Hi Heileif,

May i kindly ask, which drivers version are you using? also studio or gameready?
im having some cuda error with latest studio drivers on my new rtx5070ti

cheers

Driver: Game Ready Driver 572.83
Houdini: 20.5.445

Had issues with the Studio Driver before, that's many years ago. But ended up staying away after the issue we had. Should probably start using them again, in theory they should be the safest ones to use.

Edited by Heileif - May 3, 2025 10:13:40


GnomeToys: Member; 24 posts; Joined: Jan. 2017; Offline

June 1, 2025 3:57 a.m.

Heileif
GnomeToys
They likely won't have support for the new RT core features of Blackwell for a while though if any are relevant... I think SER for Ada was added around a year and a half after the cards came out, and for me on a 4090 anyway it made renders roughly 2-3x faster as SideFX said it would so it was worth the wait... part of the issue is that NVidia loves releasing cards that technically have the features then taking their time adding them to drivers and libraries, although I'd argue that this is better than AMD's strategy of having hardware features that never get enabled (the geometry shader cores on Vega, for example) or disabling features that worked when the card came out (1/2 speed fp64 on Vega7 being lowered to something like 1/32 speed, for example).
Ada still has no library support for TransformerEngine on windows for example, although that only affects quantized inference of ML models which isn't as big a deal for most of the image related ones... the people training and running LLMs, or as things are turning out in reality, "new-school spammers" as I like to call them (if I'm not saying what I really think which is unprintable in most states), are the main group affected by that on ada and blackwell and they'll just run linux. It may affect Houdini down the line if some very useful generative 3D models that 8 and 4-bit float quantization can be used with show up but I'm not holding my breath.

The Shader Execution Reordering(SER) update was great for the 4090 cards. We got around x4 speed increase on renders with fur compared to 3090 cards.

If someone wonders, the 5090 cards have been stable so far for us.

Good to know. I plan on getting one eventually, but since I refuse to pay more than MSRP for video hardware it'll probably be another year. What kind of speed improvement did you see over the 4090?


coccarolla: Member; 132 posts; Joined: Aug. 2013; Offline

July 4, 2025 5:37 a.m.

GnomeToys
Heileif
GnomeToys
They likely won't have support for the new RT core features of Blackwell for a while though if any are relevant... I think SER for Ada was added around a year and a half after the cards came out, and for me on a 4090 anyway it made renders roughly 2-3x faster as SideFX said it would so it was worth the wait... part of the issue is that NVidia loves releasing cards that technically have the features then taking their time adding them to drivers and libraries, although I'd argue that this is better than AMD's strategy of having hardware features that never get enabled (the geometry shader cores on Vega, for example) or disabling features that worked when the card came out (1/2 speed fp64 on Vega7 being lowered to something like 1/32 speed, for example).
Ada still has no library support for TransformerEngine on windows for example, although that only affects quantized inference of ML models which isn't as big a deal for most of the image related ones... the people training and running LLMs, or as things are turning out in reality, "new-school spammers" as I like to call them (if I'm not saying what I really think which is unprintable in most states), are the main group affected by that on ada and blackwell and they'll just run linux. It may affect Houdini down the line if some very useful generative 3D models that 8 and 4-bit float quantization can be used with show up but I'm not holding my breath.

The Shader Execution Reordering(SER) update was great for the 4090 cards. We got around x4 speed increase on renders with fur compared to 3090 cards.

If someone wonders, the 5090 cards have been stable so far for us.

Good to know. I plan on getting one eventually, but since I refuse to pay more than MSRP for video hardware it'll probably be another year. What kind of speed improvement did you see over the 4090?

Also interested in this, if anyone has an update.


Heileif: Member; 260 posts; Joined: Jan. 2015; Offline

July 4, 2025 8:45 p.m.

Regarding performance. Looking at Deadline Monitor on renders running at the moment(mostly creatures with fur), Its around 30-40% increase. But many of the machines also have different CPUs.

The best part about the cards is the increase in VRAM. 4090 and 5090 is made with the same 5nm process so we did not get a huge increase in transistors in the same amount of die space, only a bigger die.

Next generation cards when manufacturing process will most likely be reduced, we hopefully get a big jump in transistors again like 3090(8nm) to 4090(5nm).

A little overview over transistors amount.
3090: 28 billion
4090: 76.3 billion
5090: 92.2 billion

Edited by Heileif - July 4, 2025 20:48:30


GnomeToys: Member; 24 posts; Joined: Jan. 2017; Offline

Aug. 21, 2025 4:11 p.m.

For your "tons of fur" use case I'd suppose the improvements to Blackwell that help the most with that particular feature aren't in Houdini / Karma yet either. Knowing NVidia they might not be in Optix or the drivers yet.

From the blackwell whitepaper:

Blackwell’s RT Core introduces hardware-based ray intersection testing support for a new
primitive called Linear Swept Spheres (LSS). A linear swept sphere is similar to a tessellated
curve, but is constructed by sweeping spheres across space in linear segments. The radii of the
spheres may differ between start and end point of each segment, allowing flexible approximation
of various strand types. As a special case of LSS, the Blackwell hardware primitive also supports
spheres directly (without a swept linear segment), which is useful for applications like particle
systems.

Common use cases, like the rendering of hair on humans, are about 2x faster with LSS compared
to DOTS, while also requiring about 5x less VRAM to store the geometry

As a new primitive I'd only assume Karma would have to explicitly use it.

They claim anything using SER will automatically take advantage of the 2.0 version of it, but then also say that applications can provide more information to assist in better reordering, which is somewhat confusing. They don't really give any numbers for speed increases there since it's likely very application specific.

The other big RT core increase seems to be related to optimizations for nanite-like geometry features that existed in Ada (BVH / triangle clusters) that supposedly double triangle throughput but I'm not sure if Karma uses this feature. It seems targeted heavily at Unreal Engine even though it's theoretically usable by anything.


Heileif: Member; 260 posts; Joined: Jan. 2015; Offline

Aug. 21, 2025 5:51 p.m.

GnomeToys
For your "tons of fur" use case I'd suppose the improvements to Blackwell that help the most with that particular feature aren't in Houdini / Karma yet either. Knowing NVidia they might not be in Optix or the drivers yet.

From the blackwell whitepaper:
Blackwell’s RT Core introduces hardware-based ray intersection testing support for a new
primitive called Linear Swept Spheres (LSS). A linear swept sphere is similar to a tessellated
curve, but is constructed by sweeping spheres across space in linear segments. The radii of the
spheres may differ between start and end point of each segment, allowing flexible approximation
of various strand types. As a special case of LSS, the Blackwell hardware primitive also supports
spheres directly (without a swept linear segment), which is useful for applications like particle
systems.

Common use cases, like the rendering of hair on humans, are about 2x faster with LSS compared
to DOTS, while also requiring about 5x less VRAM to store the geometry

As a new primitive I'd only assume Karma would have to explicitly use it.

They claim anything using SER will automatically take advantage of the 2.0 version of it, but then also say that applications can provide more information to assist in better reordering, which is somewhat confusing. They don't really give any numbers for speed increases there since it's likely very application specific.

The other big RT core increase seems to be related to optimizations for nanite-like geometry features that existed in Ada (BVH / triangle clusters) that supposedly double triangle throughput but I'm not sure if Karma uses this feature. It seems targeted heavily at Unreal Engine even though it's theoretically usable by anything.

This sounds like the technique that Redshift uses. Have been some complaining about how it looks close up when the hair/strands are thick.

https://redshift.maxon.net/topic/53084/interpolotion-along-curve-length-s-intinsic-t-issues?_=1755810011795 [redshift.maxon.net]


brians: Staff; 582 posts; Joined: May 2019; Offline

Aug. 21, 2025 9:53 p.m.

> Blackwell’s RT Core introduces hardware-based ray intersection testing support for a new
primitive called Linear Swept Spheres (LSS).

These are just regular linear curves (ie cones with little spheres at the intersection points), and get used automatically by Optix already. For some reason they've described it by the internal algorithm being used, rather than the primitive the user would experience.

> Blackwell hardware primitive also supports spheres directly (without a swept linear segment), which is useful for applications like particle systems.

We currently have our own sphere intersector, given Optix does not do backface hits (which is important for things like glass and uniform volumes). We could make use of the Blackwell intersector explicitly if someone had an opaque material, and also enabled backface culling on the points. But our current intersector seems very fast as it is, so we've not prioritized that piece of work yet.

> The other big RT core increase seems to be related to optimizations for nanite-like geometry features

This is their mega-geo thing.
https://developer.nvidia.com/blog/fast-ray-tracing-of-dynamic-scenes-using-nvidia-optix-9-and-nvidia-rtx-mega-geometry/ [developer.nvidia.com]

The dynamic-update stuff is not very useful for us ATM, given we're mostly limited by the speed of Houdini + USD + Hydra. But the subdivision-surface-LOD stuff looks interesting.


GnomeToys: Member; 24 posts; Joined: Jan. 2017; Offline

Aug. 22, 2025 6:20 a.m.

Thanks for explaining that more. :-)

It seems bizarre. With their talk about hair rendering and the volume (which I mentally downgraded to a surface at least) implied by a sphere I didn't even consider a lack of backface support, it seems like it would be needed for lots of uses of spheres if not hair. In games alone I could see wanting thin-film bubbles that didn't have any significant refractive effects to rise off of something, or maybe bubbles popping out of original DOOM style glowing green radioactive waste. Simple environmental stuff that could be made cheap with this and probably add as much ambience as a complicated path traced metal shader. One of their own examples is a torus that can be set to transmissive, even. It takes a nearly ideal piece / pieces of glass to not be able to see backface effects in real life. Most cheaper camera lenses don't even manage it and you might catch a reflection of your own eye in just the right light even if it'll never affect the captured image.

Reading a more extensive description on their blog it looks like they described it that way because that default curve type (LSS) is done in software on all prior models and hardware on blackwell, so it's a new hardware primitive but isn't actually new like you said.
DOTS had to be used to get hardware intersections on other cards and it seems to be kinda error-prone.

For someone already using the CUDA-based ray tracing API framework NVIDIA OptiX, LSS is already available as the default linear curve type, and works on all GPUs that OptiX supports. The OptiX version of LSS automatically uses a software fallback on GPUs prior to NVIDIA Blackwell GPUs, and the new hardware-accelerated primitive on GeForce RTX 50 Series GPUs, without needing any code changes.

I'm guessing most of the cost for accurate rendering is the hair shader itself, now that I think about it more. NVidia has a BCSDF Far-field hair shader as an example but I can't see fitting that into a game anywhere which is what they're talking about much of the time... The math is extremely complex for one effect but more importantly it's loaded with sin / cos / asin / etc and looks like it would tie up the special function units on an entire card if you weren't relying on their generative path-tracing neural nets to fill in most of the scene, in which case why use that hair shader in the first place?

I'd have been more impressed if they added smooth swept bezier curves (non segmented) and bezier patches / nurbs patches in hardware. Every once in a while I look at what game engines are doing and get surprised that poly models are still used as heavily as they are. I know they're nowhere near as easy to render as triangle geometry but it seems like GPUs should be powerful enough to handle it by now.

Adaptive subdivision of displaced stuff would be pretty cool. True displacement can get pretty expensive if there's enough high frequency detail once you turn dicing up enough to deal with it, but if the GPU could cheaply dice the area it's currently intersecting and only needed the map and base geometry it would only be a slight speed hit vs. potentially going over vram limits.

Quick Links

                    
                        Search links
                        Show recent posts
                        Show unanswered posts