Karma - Unexplained increases in memory usage

User Avatar
Member
27 posts
Joined: March 2023
Offline
Hi,

I've been doing some testing with Karma XPU. Most of the time the RAM usage is manageable, but sometimes memory usage spikes very high in both system RAM and VRAM. I suspect this is what causes OptiX to fail and the GPU not to render at all.

Feels like some kind of memory leak or something in the scene misbehaving with Karma specifically.
Rendering the same scene in either Arnold or Redshift yields lower memory usage.

Karma XPU (see attached karma_xpu_memory.png):


The logs only eventually say:
KarmaXPU: device Type:Optix ID:0 has registered a critical error , so will now stop functioning. Future error messages will be suppressed

The render does start and complete on the CPU.

Example of Arnold GPU memory usage on the same scene, with shaders, mesh properties, render settings, number of AOVs, etc. matched as closely as possible (see attached arnold_gpu_memory.png).



So we're getting a "74 GB" difference in system memory usage, which is quite significant.

I've also tested the difference between Karma CPU and Arnold CPU.
Karma CPU = 83 GB
Arnold CPU = 26 GB


There is a lot of instanced grass geo, which in Arnold, for example, has 1 subdivision iteration.
For Karma, I gave it the catmullClark subdiv scheme and lowered the dicing quality to 0.01 to hopefully optimize it.

Karma isn't outputting any logs into the Output Console or anything relevant (like a scene/render report) in the Log Viewer.
Not sure how to provide any additional information needed.


all the best,
amwilkins
-
Houdini Core 20.0.653 - Py3.10
Edited by am_wilkins - April 3, 2024 11:36:11

Attachments:
karma_xpu_memory.png (356.6 KB)
arnold_gpu_memory.png (316.6 KB)

User Avatar
Staff
492 posts
Joined: May 2019
Offline
am_wilkins
The logs only eventually say:
KarmaXPU: device Type:Optix ID:0 has registered a critical error , so will now stop functioning. Future error messages will be suppressed

Can you please post a screenshot of the error?
Sadly, the forum markup can strip chunks out of the error message.

am_wilkins
There is a lot of instanced grass geo,

What is the geo? Curves?
There is a memory/speed optimization for curves on the GPU that might help:
KARMA_XPU_OPTIX_CURVE_OPT_LEVEL=1
https://www.sidefx.com/docs/houdini/solaris/karma_xpu.html#features
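
One way to try the variable for a single render, without changing a global environment, is to set it only for the child process that runs the render. This is a minimal Python sketch: the helper names are made up for illustration, and the render command is whatever you already use to launch the job.

```python
import os
import subprocess


def karma_curve_opt_env(level=1, base_env=None):
    """Return a copy of the environment with the curve optimization variable set."""
    env = dict(os.environ if base_env is None else base_env)
    # Environment variable values are always strings.
    env["KARMA_XPU_OPTIX_CURVE_OPT_LEVEL"] = str(level)
    return env


def run_with_curve_opt(command, level=1):
    """Run a render command (a list of arguments) with the variable set."""
    return subprocess.run(command, env=karma_curve_opt_env(level), check=False)
```

Calling `run_with_curve_opt(your_render_command)` leaves the parent shell untouched, which makes it easy to compare memory usage with and without the setting.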

Are you able to post some memory stats?
There are some in the EXR header (iinfo -v myfile.exr), or you can see them in the viewport too (https://www.sidefx.com/docs/houdini/solaris/karma_xpu.html#howto)
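
The `iinfo -v` output can be long, so a small filter helps when pulling out just the memory entries. This is a rough Python sketch that assumes `iinfo` is on the PATH and that the relevant header keys contain the text "mem"; the exact key names are not given in this thread, so adjust the keyword to match what your files contain.

```python
import subprocess


def filter_lines(text, keyword="mem"):
    """Keep lines that contain the keyword, ignoring case."""
    needle = keyword.lower()
    return [line for line in text.splitlines() if needle in line.lower()]


def exr_memory_stats(path, keyword="mem"):
    """Run `iinfo -v` on an EXR and return header lines that mention the keyword."""
    result = subprocess.run(
        ["iinfo", "-v", path], capture_output=True, text=True, check=True
    )
    return filter_lines(result.stdout, keyword)
```

For example, `exr_memory_stats("myfile.exr")` returns a list of matching header lines that can be pasted into a post.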
User Avatar
Member
147 posts
Joined: January 2015
Offline
I just tried a quick test with "KARMA_XPU_OPTIX_CURVE_OPT_LEVEL=1". Memory consumption almost cut in half in my hairy sphere test :O
User Avatar
Member
27 posts
Joined: March 2023
Offline
Hi Brians,

That was the only error in relation to VRAM.
R 87| StdErr: [11:15:41] KarmaXPU: device Type:Optix ID:0 has registered a critical error [cudaErrorMemoryAllocation], so will now stop functioning.  Future error messages will be suppressed

I found some logs on a farm render submission:
R161| [11:17:36] Ray Counts: 
R162| [11:17:36] Camera Rays: 265,420,800
R163| [11:17:36] Indirect: 307,690,756
R164| [11:17:36] Light Geometry: 0
R165| [11:17:36] Occlusion: 1,070,167,011
R166| [11:17:36] Probe: 43,006,794
R167| [11:17:36] Total: 1,686,285,361
R168| [11:17:36] Shader Calls:
R169| [11:17:36] Displacement: 201,445,243
R170| [11:17:36] Emission: 0
R171| [11:17:36] Light: 0
R172| [11:17:36] Opacity: 0
R173| [11:17:36] Surface: 0
R174| [11:17:36] Volume: 0
R175| [11:17:36] Total Wall Clock Time: 0:06:43.87
R176| [11:17:36] Total CPU Time: 0:00:07.87
R177| [11:17:36] System CPU Time Only: 0:00:04.17
R178| [11:17:36] Current Memory Usage: 125.77 GiB
R179| [11:17:36] Peak Memory Usage: 125.77 GiB

R 94| [11:10:35] RAT Disk Cache: 56 hits
R 95| [11:10:35] accept_unmipped : 1
R 96| [11:10:35] accept_untiled : 1
R 97| [11:10:35] automip : 0
R 98| [11:10:35] autoscanline : 1
R 99| [11:10:35] autotile : 512
R100| [11:10:35] deduplicate : 1
R101| [11:10:35] failure_retries : 0
R102| [11:10:35] forcefloat : 0
R103| [11:10:35] max_errors_per_file : 100
R104| [11:10:35] max_memory_MB : 63.94 GiB
R105| [11:10:35] max_mip_res : 1,073,741,824
R106| [11:10:35] max_open_files : 1024
R107| [11:10:35] searchpath : ''
R108| [11:10:35] trust_file_extensions : 0
R109| [11:10:35] unassociatedalpha : 0

R126| OpenImageIO ImageCache statistics (shared) ver 2.3.14
R127| Options: max_memory_MB=65473.0 max_open_files=1024 autotile=512
R128| autoscanline=1 automip=0 forcefloat=0 accept_untiled=1
R129| accept_unmipped=1 deduplicate=1 unassociatedalpha=0
R130| failure_retries=0
R131| Images : 63 unique
R132| ImageInputs : 62 created, 6 current, 14 peak
R133| Total pixel data size of all images referenced : 2.6 GB
R134| Total actual file size of all images referenced : 979.9 MB
R135| Pixel data read : 239.7 MB
R136| File I/O time : 12.0s (0.1s average per thread, for 81 threads)
R137| File open time only : 0.2s
R138| Tiles: 472 created, 472 current, 472 peak
R139| total tile requests : 170218036
R140| micro-cache misses : 65841 (0.0386804%)
R141| main cache misses : 472 (0.000277291%)
R142| redundant reads: 0 tiles, 0 B
R143| Peak cache memory : 239.7 MB
R144| 60 not tiled, 62 not MIP-mapped
R145| 1 was constant-valued in all pixels


Memory usage is super high...
I ran a new render test on a machine with 128 GB of system RAM and a 4090 GPU (24 GB VRAM) and saw the same issues.
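
When comparing several farm logs like the one above, the peak-memory line can be extracted with a short script instead of by eye. This is a sketch in Python that assumes the line format shown in these logs ("Peak Memory Usage: <number> <unit>"); other renderers or versions may print it differently.

```python
import re

# Matches e.g. "Peak Memory Usage: 125.77 GiB" anywhere in a log line.
PEAK_RE = re.compile(r"Peak Memory Usage:\s*([\d.]+)\s*(GiB|GB|MiB|MB)")


def peak_memory(log_text):
    """Return (value, unit) from the last peak-memory line, or None if absent."""
    matches = PEAK_RE.findall(log_text)
    if not matches:
        return None
    value, unit = matches[-1]
    return float(value), unit


sample = "R179| [11:17:36] Peak Memory Usage: 125.77 GiB"
print(peak_memory(sample))  # (125.77, 'GiB')
```

Keeping the unit alongside the value matters here, because these logs report GiB while other renderers may report GB.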


The scene contains a ground, some trees, bushes, rocks, etc., all with simple shaders.
Lighting is just one Karma Physical Sky light.

The grass is made of geometry-based "tufts", which are instanced to points via an "instancer". They are quite low poly.






We opted to go with geometry because it's lighter in the viewport and gives a little more control.



thanks,
amwilkins

Attachments:
points.png (1.1 MB)
points_grass.png (1.1 MB)
instancer_grass_02.png (262.0 KB)
instancer_grass.png (127.7 KB)

User Avatar
Member
41 posts
Joined: February 2020
Offline
Hey,

We are working on something similar but using RenderMan. We use grass with a polygon count similar to your grass above, but only as the proxy purpose. Then we have a subdivided version which is set as the render purpose. We do not turn on Catmull-Clark subdivision at render time for instanced geometry like grass. It helps a lot with memory.

Have you tried without subdivision in both Arnold/Karma and compared the RAM usage?
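
As a rough guide to why subdivision level matters so much for memory: under uniform Catmull-Clark subdivision of an all-quad mesh, every level splits each quad into four. The Python sketch below illustrates that growth with a hypothetical 200-quad tuft. It is only an upper-level intuition, since renderers may dice adaptively rather than uniformly, and how subdivided prototypes are shared between instances is renderer-specific.

```python
def quads_after_subdivision(quad_count, levels):
    """Quad count after uniform Catmull-Clark subdivision of an all-quad mesh.

    Each level splits every quad into four.
    """
    if levels < 0:
        raise ValueError("levels must be non-negative")
    return quad_count * 4 ** levels


# Hypothetical 200-quad grass tuft (not a figure from this thread).
for level in range(4):
    print(level, quads_after_subdivision(200, level))
# 0 200
# 1 800
# 2 3200
# 3 12800
```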
User Avatar
Member
17 posts
Joined: March 2019
Offline
Hi!

Note that Arnold with husk has its own issues like this one here:
https://github.com/Autodesk/arnold-usd/issues/1079
Basically it duplicates prototypes in memory many times when they are used on multiple instancers. That can easily multiply memory needs.
For example, Sick loads them only once. On some scenes with instancers we saw a 3-5x memory difference: 170 GB with husk, 40 GB with Sick.

Cheers!
User Avatar
Member
27 posts
Joined: March 2023
Offline
@kskovbo hi,

Yeah, we're also using a proxy purpose, but mainly just for viewport performance. It's super low poly.

kskovbo
Have you tried without subdivision in both Arnold/Karma and compared the RAM usage?
It's a good idea. It definitely looks like subdivision on the grass isn't behaving in Karma the way I'm used to in other engines.

Ok, here's the test with subdivision disabled on the "Grass" for both Karma and Arnold.

Karma CPU:
R187| [11:42:00]  Current Memory Usage: 62.88 GiB 
R188| [11:42:00] Peak Memory Usage: 62.88 GiB

Arnold CPU:
R 89| [11:49:16]  Current Memory Usage: 17.69 GB
R 90| [11:49:16] Peak Memory Usage: 17.69 GB

Karma is still using 255.46% more RAM than Arnold (about 3.55x).
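
For reference, the percentage comes from (Karma - Arnold) / Arnold, with the two figures compared as printed. The Python sketch below reproduces it for the two CPU tests so far. It treats the units as equal, which is an assumption: the Karma lines are labelled GiB and the Arnold lines GB, and if Arnold's GB are decimal, the real gap is somewhat larger.

```python
def percent_more(a, b):
    """How much larger a is than b, as a percentage of b."""
    return (a - b) / b * 100.0


# Figures from the thread, compared as printed (units treated as equal).
tests = {
    "original test": (83.0, 26.0),
    "grass subdivision off": (62.88, 17.69),
}
for name, (karma, arnold) in tests.items():
    print(f"{name}: {percent_more(karma, arnold):.2f}% more, {karma / arnold:.2f}x")
# original test: 219.23% more, 3.19x
# grass subdivision off: 255.46% more, 3.55x
```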
However there are some other assets like trees, bushes, rocks etc. that also have a catmullClark subdiv scheme.
I can try disabling subdivision on all assets throughout the scene. Will edit or repost once I do that.

Edit:
Here is with all subdiv disabled on the whole scene:
Karma CPU:
R187| [12:09:58]  Current Memory Usage: 52.63 GiB 
R188| [12:09:58] Peak Memory Usage: 52.63 GiB

Arnold CPU:
R 91| [12:15:27]  Current Memory Usage: 9.34 GB
R 92| [12:15:27] Peak Memory Usage: 9.34 GB

It's super interesting that Karma is still using so much memory.


@daveborck hello,
Okay, that's good to know. Yeah, Arnold really needs to make rendering with Husk more stable and better tested. I've been running into lots of issues with EXR channel names, AOVs, you name it.

I haven't yet seen massive memory usage, however, even though my test scene uses 3 different instancers: one for background assets, another for grass, and the last for pebble, twig, and leaf scatters, etc.


thanks,
amwilkins
Edited by am_wilkins - April 8, 2024 06:24:55
User Avatar
Member
27 posts
Joined: March 2023
Offline
Hi again,

I managed to increase the verbosity of Husk. Here are some more render stats:
R769| [12:49:31] Object Counts: 
R770| [12:49:31] Cameras: 1
R771| [12:49:31] Coordinate Spaces: 0
R772| [12:49:31] Curve Meshes: 0
R773| [12:49:31] Light Tree: 0
R774| [12:49:31] Lights: 2
R775| [12:49:31] Point Meshes: 0
R776| [12:49:31] Polygon Meshes: 285,891 total 1,627 unique
R777| [12:49:31] Volumes: 0
R778| [12:49:31] Geometry Counts:
R779| [12:49:31] Curves: 0
R780| [12:49:31] Points: 0
R781| [12:49:31] Polygons: 60,271,321 total 14,230,441 unique
R782| [12:49:31] Polygons (Diced): 498,665
R783| [12:49:31] Light Types:
R784| [12:49:31] Cylinder: 0
R785| [12:49:31] Disk: 0
R786| [12:49:31] Distant: 1
R787| [12:49:31] Dome: 1
R788| [12:49:31] Geometry: 0
R789| [12:49:31] Line: 0
R790| [12:49:31] Point: 0
R791| [12:49:31] Rectangle: 0
R792| [12:49:31] Sphere: 0
R793| [12:49:31] Shader Nodes:
R794| [12:49:31] CPU Shaders: 107 total 15 unique
R795| [12:49:31] Function Errors: 0
R796| [12:49:31] Functions Loaded: 234
R797| [12:49:31] Largest Shader: 64
R798| [12:49:31] Shader Nodes: 2,072
R799| [12:49:31] Shaders: 268
R800| [12:49:31] USD Preview Shaders: 2
R801| [12:49:31] Ray Counts:
R802| [12:49:31] Camera Rays: 17,738,472
R803| [12:49:31] Indirect: 255,462,493
R804| [12:49:31] Light Geometry: 693,811,217
R805| [12:49:31] Occlusion: 560,367,545
R806| [12:49:31] Probe: 7,176,314
R807| [12:49:31] Total: 1,534,556,041
R808| [12:49:31] Shader Calls:
R809| [12:49:31] Displacement: 188,534,307
R810| [12:49:31] Emission: 0
R811| [12:49:31] Light: 1,048,666,934
R812| [12:49:31] Opacity: 416,409,419
R813| [12:49:31] Surface: 219,069,896
R814| [12:49:31] Volume: 0
R815| [12:49:31] Primvar Cache: 35,388 hits, 12,416 misses
R816| [12:49:31] Primvar Memory Usage Actual Uncompressed
R817| [12:49:31] real32[3] <dicedmesh> 9.42 GiB 9.43 GiB
R818| [12:49:31] int32 <topology> 1.99 GiB 2.25 GiB
R819| [12:49:31] real32[3] N 63.30 MiB 598.36 MiB
R820| [12:49:31] real32[3] Pref 18.25 MiB 166.57 MiB
R821| [12:49:31] real32[3] P 18.25 MiB 166.44 MiB
R822| [12:49:31] int32 QuadVerts 8.23 MiB 115.07 MiB
R823| [12:49:31] int32 TriVerts 824.30 KiB 11.25 MiB
R824| [12:49:31] real32[2] st 594.35 KiB 3.34 MiB

R866| [12:49:31] Bucket Time Breakdown:
R867| [12:49:31] Category Time Percentage
R868| [12:49:31] Shadows 3:02:04.02 47.68
R869| [12:49:31] Indirect rays 1:18:26.05 20.54
R870| [12:49:31] Shading 1:12:19.54 18.94
R871| [12:49:31] Lighting 0:31:22.34 8.22
R872| [12:49:31] Unaccounted 0:13:07.75 3.44
R873| [12:49:31] Primary rays 0:02:39.34 0.70
R874| [12:49:31] SSS samples 0:01:11.89 0.31
R875| [12:49:31] Filtering 0:00:27.20 0.12
R876| [12:49:31] Dicing 0:00:14.86 0.06
R877| [12:49:31] Total Wall Clock Time: 0:03:30.50
R878| [12:49:31] Total CPU Time: 0:00:13.02
R879| [12:49:31] System CPU Time Only: 0:00:09.27
R880| [12:49:31] Current Memory Usage: 52.59 GiB
R881| [12:49:31] Peak Memory Usage: 52.59 GiB

Unfortunately, it's not the greatest breakdown of memory usage. I'm not seeing anything here that could be the cause.
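
One thing the stats do allow is a quick sanity check: adding up the "Actual" column of the Primvar Memory Usage table and comparing it with the reported peak. The Python sketch below does that with the values copied from the log above. It only shows how much of the peak this one table accounts for and says nothing about where the remainder goes.

```python
UNIT_TO_GIB = {"KiB": 1 / 1024**2, "MiB": 1 / 1024, "GiB": 1.0}

# "Actual" column of the Primvar Memory Usage table in the log above.
primvars = [
    ("<dicedmesh>", 9.42, "GiB"),
    ("<topology>", 1.99, "GiB"),
    ("N", 63.30, "MiB"),
    ("Pref", 18.25, "MiB"),
    ("P", 18.25, "MiB"),
    ("QuadVerts", 8.23, "MiB"),
    ("TriVerts", 824.30, "KiB"),
    ("st", 594.35, "KiB"),
]

peak_gib = 52.59  # "Peak Memory Usage" from the same log

total_gib = sum(value * UNIT_TO_GIB[unit] for _, value, unit in primvars)
print(f"primvars: {total_gib:.2f} GiB ({total_gib / peak_gib:.0%} of peak)")
# primvars: 11.52 GiB (22% of peak)
```

So the listed primvars cover roughly a fifth of the peak, and most of that is the diced mesh.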


amwilkins
User Avatar
Staff
492 posts
Joined: May 2019
Offline
We are aware that KarmaXPU is using a lot more CPU RAM than KarmaCPU.
We are working to reduce the CPU RAM usage of KarmaXPU for the next major release of Houdini, so keep an eye out for that.

Regarding Arnold vs KarmaCPU, I'll be sure to pass this on to the KarmaCPU team to have a look at and think about.

thanks
User Avatar
Member
27 posts
Joined: March 2023
Offline
Hey brians,

Thanks very much for the info, appreciated.
We'll continue to do testing and I'll update here if anything else of interest is noted.

Otherwise, will stand by for any new Karma updates.


Best of luck,
amwilkins
User Avatar
Member
27 posts
Joined: March 2023
Offline
Hi,

Just an update for those who were following this or might find it in the future.

I believe I might have solved this on my other thread:
https://www.sidefx.com/forum/topic/95336/?page=1#post-419237

TLDR:
A specific tree asset was causing huge memory spikes as the camera moved closer.
It appeared to be related to "Dicing Quality", which of course affects Karma and not other engines.
Lowering this value reduced the memory usage.



all the best,
amwilkins
User Avatar
Member
31 posts
Joined: June 2010
Offline
I tested a scene that we rendered in Houdini 19. It is only a terrain with around 20 million polygons. H19 would render it with subdivision on, on a machine with 64 GB of RAM.

In H20 the same scene fills up 128 GB of RAM and crashes Houdini.

Both tests used Karma CPU.
User Avatar
Member
31 posts
Joined: June 2010
Offline
Here's a comparison of the same scene in H19 and H20 (see attachments). One terrain mesh has subdivision turned on.



H20 eventually crashed.

Attachments:
H19_memory usage.PNG (1.6 MB)
H20_memory usage.PNG (36.3 KB)

User Avatar
Member
31 posts
Joined: June 2010
Offline
Here's Houdini 19, without and with subdivision. Just to check it is working:

Attachments:
H19_no_subdiv.PNG (3.5 MB)
H19_subdiv.PNG (3.7 MB)
