GeomSubset vs. Mesh: Huge difference in Render Times

Member
47 posts
Joined: Feb. 2017
So far I've always gone with separate Meshes instead of GeomSubsets for material splits - I prefer the flexibility in most cases. However, today I exported an old Max production scene via the 3dsmax USD exporter, which creates GeomSubsets by default for all materials. When I converted those to meshes via the Split Primitive node, the Karma CPU render time suddenly tripled: the scene went from 2 minutes to 6 minutes, with no other changes.
I'm aware that meshes are slower to parse and thus slower to work with in Solaris, but I did not expect such a dramatic difference in render times. Is this expected behavior, or should this go into a bug report? (Images below for reference, showing the render times and scene graph.)

More interesting data points:
- Karma CPU rendering at 1080p: 8m38s vs 27m09s (so it scales pretty linearly)
- Karma XPU rendering at 1080p: 3m08s vs 4m48s. The interesting thing here is that the CPU/GPU usage ratio in the GeomSubset version is pretty much 50:50, whereas in the mesh version the GPU share went up to 78% - meaning the CPU lost a lot of performance, which explains the smaller difference. I'm leaning towards "this is a bug".
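
In case anyone wants to double-check what their exporter produced, a minimal sketch along these lines (pxr Python API; the file name is just a placeholder) lists each mesh and how many GeomSubsets it carries:

```python
# Rough sketch: list every Mesh prim and its GeomSubset count.
# "exported_from_max.usd" is a placeholder file name.
from pxr import Usd, UsdGeom

stage = Usd.Stage.Open("exported_from_max.usd")
for prim in stage.Traverse():
    if prim.IsA(UsdGeom.Mesh):
        subsets = UsdGeom.Subset.GetAllGeomSubsets(UsdGeom.Mesh(prim))
        print(prim.GetPath(), "-", len(subsets), "GeomSubsets")
```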
Edited by racoonart - July 17, 2025 14:21:25

Attachments:
02_Mesh.png (942.7 KB)
01_GeomSubset_fixed.png (933.4 KB)

Member
258 posts
Joined: Jan. 2015
Interesting. Looking forward to an answer.

If this is not a bug, we need to make some changes to our pipeline.

I have only seen a difference this big when rendering heavy grooms with Karma XPU, where Bezier curves take about double the time to render compared to subd curves.
Edited by Heileif - July 18, 2025 20:34:21
Member
47 posts
Joined: Feb. 2017
For better or worse, I currently can't quite reproduce the problem with a synthetic data set. The render stats are a bit odd for meshes vs GeomSubsets, but the render times are a lot closer: a 15s difference on a 2min render. There's certainly a pattern, just not as dramatic. Interestingly, the shader call count is 25% higher in the GeomSubset version than in the mesh version - yet its render time is lower.

Unfortunately, the original scene is a bit difficult to debug since it's so large and is just one big USD file, but I'll poke around further and see if I can find something. It's definitely worth investigating, since XPU and CPU differ so much and the same tendency shows up in the synthetic set as well.
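
For reference, a pair of synthetic variants could be authored with something like the sketch below (pxr Python API): a single mesh carrying face GeomSubsets vs the same faces split into separate Mesh prims. Counts, paths and file names are just placeholders, not the actual test scene, and material assignments are omitted for brevity.

```python
# Sketch: build two stages with the same geometry, once as one mesh with
# face GeomSubsets, once as separate Mesh prims. All names are placeholders.
from pxr import Usd, UsdGeom, Vt, Gf

NUM_QUADS = 1000   # placeholder size
SUBSETS = 10       # pretend "materials"

def quad(i):
    # one unit quad, offset along X so the faces don't overlap
    x = float(i) * 1.5
    return [Gf.Vec3f(x, 0, 0), Gf.Vec3f(x + 1, 0, 0),
            Gf.Vec3f(x + 1, 1, 0), Gf.Vec3f(x, 1, 0)]

def fill_mesh(mesh, quad_ids):
    # author points / faceVertexCounts / faceVertexIndices for a set of quads
    pts, counts, indices = [], [], []
    for i in quad_ids:
        base = len(pts)
        pts += quad(i)
        counts.append(4)
        indices += [base, base + 1, base + 2, base + 3]
    mesh.CreatePointsAttr(Vt.Vec3fArray(pts))
    mesh.CreateFaceVertexCountsAttr(Vt.IntArray(counts))
    mesh.CreateFaceVertexIndicesAttr(Vt.IntArray(indices))

# Variant A: one mesh, "materials" expressed as face GeomSubsets
stage_a = Usd.Stage.CreateNew("subsets.usda")
mesh_a = UsdGeom.Mesh.Define(stage_a, "/world/geo/combined")
fill_mesh(mesh_a, range(NUM_QUADS))
for s in range(SUBSETS):
    faces = Vt.IntArray(list(range(s, NUM_QUADS, SUBSETS)))
    UsdGeom.Subset.CreateGeomSubset(
        mesh_a, "mat_%d" % s, UsdGeom.Tokens.face, faces, "materialBind")
stage_a.Save()

# Variant B: the same faces split into separate Mesh prims
stage_b = Usd.Stage.CreateNew("meshes.usda")
for s in range(SUBSETS):
    mesh_b = UsdGeom.Mesh.Define(stage_b, "/world/geo/part_%d" % s)
    fill_mesh(mesh_b, range(s, NUM_QUADS, SUBSETS))
stage_b.Save()
```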
Edited by racoonart - July 19, 2025 06:02:03
Member
146 posts
Joined: June 2019
I'm not familiar with Hydra and path tracers, but for realtime renderers, for example, GeomSubsets are definitely preferred.
They don't carry transforms or primvars; a subset is just a proxy with indices pointing into the parent mesh's data. Hierarchy depth is usually a performance concern too, though not a big one.
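
Roughly, in pxr Python terms (the mesh path is just a placeholder), all a subset stores is something like this:

```python
# Sketch: what a GeomSubset actually carries - an element type, an index
# array and a family name; no transform, no point data of its own.
from pxr import Usd, UsdGeom

stage = Usd.Stage.Open("scene.usd")  # placeholder file
mesh = UsdGeom.Mesh(stage.GetPrimAtPath("/world/geo/someMesh"))  # placeholder path
for subset in UsdGeom.Subset.GetAllGeomSubsets(mesh):
    print(subset.GetPath())
    print("  elementType:", subset.GetElementTypeAttr().Get())   # usually "face"
    print("  familyName: ", subset.GetFamilyNameAttr().Get())    # e.g. "materialBind"
    print("  indices:    ", len(subset.GetIndicesAttr().Get()))  # indices into the parent mesh
```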

It's a bit hard to tell without the original scene, but maybe the original one takes advantage of instancing (if Max supports it) and the conversion breaks it. I've never used Split Primitive, but it's SOP-based; if it flattens layers I can see it losing instancing and just expanding the instances (LOP Import traverses proxies, so it can handle references to instanceable prims).
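
A quick way to check whether instancing got lost in the conversion could be something like this (pxr Python; file names are placeholders):

```python
# Sketch: compare instance counts between the two variants of the scene.
from pxr import Usd

def instance_report(path):
    stage = Usd.Stage.Open(path)
    prims = list(stage.Traverse())
    instances = [p for p in prims if p.IsInstance()]
    print(path, "- prims:", len(prims),
          "instances:", len(instances),
          "prototypes:", len(stage.GetPrototypes()))

instance_report("original_geomsubsets.usd")  # placeholder file names
instance_report("converted_meshes.usd")
```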
Member
47 posts
Joined: Feb. 2017
I'd generally agree. Instancing probably helps, and the original scene has some (not overly much though), but I don't think it's enough (or reasonable) to make a 3:1 difference. Also, I would absolutely understand editing the USD being more of an issue with meshes than with GeomSubsets, but at render time I don't see why one should be significantly slower than the other. At this point all the mesh data is in some acceleration structure and the USD "form" should not matter as much? But I'm just guessing there.
There is definitely a difference with my synthetic set though, which has no instancing in either variant. Also, in the original case Karma CPU and XPU differ massively. If it were a USD issue I would expect each renderer to perform equally well or badly.
What I haven't tested properly is hierarchy depth, thanks for the pointer!

edit: nope, makes no difference

edit2: XPU with gpu-only yields exactly the same render time for both cases in the original scene.
Edited by racoonart - July 19, 2025 10:33:28
Member
146 posts
Joined: June 2019
racoonart
At this point all the mesh data is in some acceleration structure and the USD "form" should not matter as much? But I'm just guessing there.

Yeah, I don't know.
I'm just coming at this from a realtime perspective and from how I translate this prim myself; I'm not sure how Hydra and Karma communicate with USD.

Generally, if you have a mesh at /parent/another/Mesh with subsets, you only need to transform the mesh data that requires a transform (points, normals) through the parent->another->Mesh transform stack once, and the subsets just get their part of that data. A GeomSubset doesn't have a transform; it's literally just indices into the parent data. In realtime it also helps to minimize uploads to the GPU - not the bandwidth, which is the same, but the number of buffers I need to upload.

If you have /parent/another/Mesh/SubMesh1, where Mesh is an Xform and each SubMeshX is a Mesh, you need to transform through parent->another->Mesh->SubMeshX for every sub-mesh, even if the sub-meshes have identity transforms.

How this would affect performance I don't know - probably not a 3:1 difference, but who knows. The transforms are resolved at render time; USD obviously doesn't cache world transforms.
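
To illustrate with the pxr Python API (paths are hypothetical): in the subset layout only the parent mesh shows up in a loop like the one below, while in the split layout every SubMeshN resolves its own local-to-world stack, even with an identity local transform. UsdGeom.XformCache at least memoizes the shared ancestors, so the real cost depends on how the renderer resolves transforms.

```python
# Sketch: resolve world transforms for every Mesh prim on a stage.
# GeomSubset prims are not Meshes, so they never enter this loop.
from pxr import Usd, UsdGeom

stage = Usd.Stage.Open("scene.usd")  # placeholder file
cache = UsdGeom.XformCache(Usd.TimeCode.Default())
for prim in stage.Traverse():
    if prim.IsA(UsdGeom.Mesh):
        world = cache.GetLocalToWorldTransform(prim)
        print(prim.GetPath(), world.ExtractTranslation())
```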