Rendering 3 million polygons
The following are suggestions on how to improve rendering of a large number of polygons.
Spatial subdivision efficiency
Spatially subdividing your geometry works very well if all of the objects have non-overlapping bounding boxes, as it allows each spatial voxel to work on a small subset of objects. However, when many objects have overlapping polygons, there is no way to sub-divide the bounding boxes to limit the number of primitives tested.
Displacement bounds do not shrink correspondingly when you split a primitive into two primitives. Therefore, displacement bounds will result in overlapping primitives, which means that each ray has to do more work.
The following are ways of dealing with this problem to make ray-tracing more efficient.
Turn off True Displacements and True Ray-Displacements
This treats displacements like bump maps, which will avoid the creation of new displaced geometry. For small displacements, the rendered result can be very similar to true displacements. Turning off True Displacements will cause mantra to ignore the displacement bound parameter.
Use displacement shading for detail
If you have a displacement bound set to 0.1 and your polygons are 0.01 units in size, you can end up with somewhere between 100 and 1000 primitives with overlapping bounding boxes. Keeping displacement bounds small minimizes this overlap.
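To see where a figure like this comes from, the sketch below assumes a flat grid of square polygons with edge length `size`, each with its bounding box padded by the displacement bound `bound` on every side. It counts how many padded boxes contain the center point of one polygon. The model and the function name are illustrative assumptions, not how mantra measures overlap.

```python
def boxes_covering_point(bound: float, size: float) -> int:
    """Count padded bounding boxes that contain the center of one polygon.

    Hypothetical model: a flat grid of square polygons of edge `size`, each
    bounding box grown by the displacement bound `bound` on every side.
    Assumes `bound` is a whole multiple of `size`.
    """
    # A polygon's padded box contains the point when the polygon's center lies
    # within size/2 + bound along each axis, i.e. at most bound/size polygons
    # away on either side, plus the polygon under the point itself.
    per_axis = 1 + 2 * round(bound / size)
    return per_axis * per_axis


print(boxes_covering_point(0.1, 0.01))   # 441
print(boxes_covering_point(0.01, 0.01))  # 9
```

In this model a ray reaching that point may have to consider hundreds of primitives. Shrinking the bound to the polygon size brings the count down to single digits.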
Use ray predicing
Turning on ray predicing causes all displacement shaders to be run before rendering begins. This allows optimal subdivision structures to be built, since mantra already knows the exact positions of the displaced polygons when rendering begins.
Unfortunately, using this option can balloon memory usage, since dicing is normally resolution-dependent (i.e. larger images will use more memory).
Use perfect bounding boxes
Similar to predicing, this runs the displacement shaders before rendering begins, while the spatial subdivision structures are being built, resulting in efficient ray-tracing. However, it does not keep the diced polygons around afterward. Memory usage is lower, but the geometry has to be regenerated when it is rendered.
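The difference between a padded displacement bound and a perfect bounding box can be sketched as follows. The shader, the grid, and the function names are hypothetical. The point is that the perfect box is measured from the actually displaced positions, and the displaced points themselves are not kept.

```python
import math

# Hypothetical displacement shader: a small ripple along x (amplitude 0.02).
def ripple(p):
    return 0.02 * math.sin(10.0 * p[0])

def bbox(points):
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))

def padded_bbox(points, bound):
    """Conservative box: undisplaced bounds grown by the displacement bound."""
    lo, hi = bbox(points)
    return tuple(c - bound for c in lo), tuple(c + bound for c in hi)

def perfect_bbox(points, normals, shader):
    """Run the shader, measure the displaced points, then discard them."""
    displaced = [
        tuple(pi + ni * shader(p) for pi, ni in zip(p, n))
        for p, n in zip(points, normals)
    ]
    return bbox(displaced)  # only the box survives; `displaced` is dropped

# A unit square grid in the z=0 plane, all normals pointing along +z.
grid = [(i / 20, j / 20, 0.0) for i in range(21) for j in range(21)]
normals = [(0.0, 0.0, 1.0)] * len(grid)

print(padded_bbox(grid, 0.1))              # z spans [-0.1, 0.1]
print(perfect_bbox(grid, normals, ripple)) # z stays within [-0.02, 0.02]
```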
Spatially separate objects
Similar to spatial subdivision, the idea is to have objects separated in space so that their bounding boxes do not overlap.
For example, a scene containing buildings on a grid will typically have each building object spatially separated from the other buildings, whereas a scene with a bowl of fruit, where each piece of fruit is a separate object, will typically have lots of overlapping objects. If you were to send a shadow ray from the table through the bowl of fruit, you would have to test all the pieces of fruit, since their bounding boxes overlap.
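The effect of overlapping bounding boxes on a shadow ray can be illustrated with the standard ray/box slab test. The two scenes below (a row of separated "buildings" and a cluster of overlapping "fruit") are made-up boxes, not mantra data. The count is how many object boxes the ray enters, and therefore how many objects' primitives it may have to test.

```python
def ray_hits_box(origin, direction, lo, hi):
    """Slab test: does the ray origin + t*direction (t >= 0) enter the box?"""
    t_near, t_far = 0.0, float("inf")
    for o, d, a, b in zip(origin, direction, lo, hi):
        if d == 0.0:
            # Ray parallel to this slab: it must already lie between the planes.
            if o < a or o > b:
                return False
            continue
        t0, t1 = (a - o) / d, (b - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True

def boxes_entered(origin, direction, boxes):
    return sum(ray_hits_box(origin, direction, lo, hi) for lo, hi in boxes)

# Hypothetical scenes: boxes given as (min corner, max corner).
buildings = [((3.0 * i - 0.5, -0.5, -0.5), (3.0 * i + 0.5, 0.5, 0.5)) for i in range(10)]
fruit = [((-0.5 + 0.05 * i, -0.5, -0.5), (0.5 + 0.05 * i, 0.5, 0.5)) for i in range(10)]

# A shadow ray from below the table, travelling straight up (+y).
origin, direction = (0.2, -5.0, 0.0), (0.0, 1.0, 0.0)
print(boxes_entered(origin, direction, buildings))  # 1
print(boxes_entered(origin, direction, fruit))      # 10
```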
You could try merging all of the objects into a single object so that mantra can do better optimization on the primitives in the single object.
You could also try ray predicing. When objects are included in ray-predicing, they are all processed before any rays are sent. All pre-diced objects are automatically put into a single object by mantra. This can make ray-tracing significantly faster, at the cost of memory consumption.
Limit the amount of work for mantra
When performing displacement shading, mantra caches the displaced geometry in case it needs to be traced against again. However, mantra cannot keep all of the geometry in memory at once, so it throws geometry out of the cache. By default, mantra keeps approximately 32 MB of displaced geometry in RAM at one time. Increasing the size of the geometry cache means mantra does much less work re-dicing displaced geometry.
If you have a lot of RAM, use ray predicing, since it forces all of the geometry to be cached for the duration of the render. Otherwise, change the cache size to something reasonable for your system.
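The benefit of a larger cache can be sketched with a simple least-recently-used (LRU) cache of diced primitives. Mantra's actual eviction policy is not described here, so LRU, the 1 MB primitive size, and the access pattern are all assumptions. The sketch only shows why a cache smaller than the working set causes repeated re-dicing.

```python
from collections import OrderedDict

MB = 1 << 20

class DicedGeometryCache:
    """Hypothetical byte-budgeted LRU cache of diced primitives."""

    def __init__(self, budget_bytes):
        self.budget = budget_bytes
        self.used = 0
        self.entries = OrderedDict()  # primitive id -> size in bytes
        self.dice_count = 0           # how many times a primitive was diced

    def fetch(self, prim_id, size_bytes):
        if prim_id in self.entries:
            self.entries.move_to_end(prim_id)  # cache hit: mark as recently used
            return
        self.dice_count += 1                   # cache miss: (re-)dice the primitive
        self.entries[prim_id] = size_bytes
        self.used += size_bytes
        # Evict least recently used primitives until back under budget,
        # never evicting the primitive that was just diced.
        while self.used > self.budget and len(self.entries) > 1:
            _, evicted_size = self.entries.popitem(last=False)
            self.used -= evicted_size

def simulate(budget_bytes, prims=100, passes=5):
    cache = DicedGeometryCache(budget_bytes)
    for _ in range(passes):            # rays revisit the same geometry
        for prim_id in range(prims):
            cache.fetch(prim_id, 1 * MB)
    return cache.dice_count

print(simulate(32 * MB))   # 500: every access re-dices
print(simulate(128 * MB))  # 100: each primitive is diced once
```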
Optimizing spatial subdivision structures makes rays move faster through objects. Rays are more efficient because they have to do less work when looking for objects.
Another way to make rays more efficient is to simplify the objects they intersect. This can be done by limiting the intersection scope of the rays or by using proxy geometry for ray-tracing.
Choosing a scope of objects to intersect against allows mantra to build slightly more efficient high-level subdivision structures. As a result, if a ray does not pass near any of the scoped objects, mantra can quickly cull that ray.
Max ray distance
Sometimes you can use the maximum ray distance to limit the scope of objects being intersected against. This is a parameter on the VEX ray-tracing functions, so it has to be set in the shaders.
When performing ambient occlusion, for example, you might only want to check against objects which are relatively close to the surface you are shading.
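A maximum ray distance lets whole objects be skipped: an object whose bounding box is farther from the shading point than the maximum distance cannot be hit. The helper below is a hypothetical illustration of that culling test (point-to-box distance), not a VEX function or mantra's implementation. The object names and boxes are made up.

```python
import math

def point_box_distance(p, lo, hi):
    """Euclidean distance from point p to an axis-aligned box (0 if inside)."""
    return math.sqrt(sum(max(a - c, 0.0, c - b) ** 2 for c, a, b in zip(p, lo, hi)))

def candidates(p, objects, max_dist):
    """Objects that a ray from p limited to max_dist could possibly hit."""
    return [name for name, lo, hi in objects if point_box_distance(p, lo, hi) <= max_dist]

objects = [
    ("rock", (0.5, 0.0, 0.0), (1.0, 1.0, 1.0)),
    ("mountain", (10.0, 0.0, 0.0), (20.0, 5.0, 5.0)),
]
print(candidates((0.0, 0.0, 0.0), objects, 2.0))           # ['rock']
print(candidates((0.0, 0.0, 0.0), objects, float("inf")))  # ['rock', 'mountain']
```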
By using phantom objects combined with scoping, you can “fake” objects. For example, rather than ray-tracing the bowl of fruit, put in a card with a texture map as a phantom object. Primary rays will still hit the real bowl of fruit, but refractions through the glass of wine will hit the single-polygon card, which is much more efficient. Phantom is only supported at the object level.
Adjusting the shading quality
The idea here is to make slightly larger primitives when ray-tracing, resulting in fewer primitives to ray-trace against. To control shading quality, adjust the Rays Shading Quality parameter.
The ray-measurer can also be used to tweak this behavior. Adjusting the z-importance causes more or fewer divisions in the Z direction; fewer divisions are more efficient.
Send fewer rays
Aside from making rays faster, the only other way to speed up ray-tracing is to send fewer rays.
Using shadow maps instead of ray-traced shadows is particularly important when rendering volumes.
When generating a shadow map, each volume is evaluated N times for each shadow map sample. When the shadow is later evaluated during the main render, it is a single texture map lookup (which may or may not be more expensive than a volume evaluation).
With ray-traced shadows, each pixel in the main image has M volume evaluations, and each of those evaluations sends a shadow ray requiring N (or some fraction of N) further volume evaluations. This is M*N operations per pixel.
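The comparison above can be written as a rough cost model. It counts only the shadow-related work (the M primary volume evaluations per pixel are common to both approaches). The relative cost of a map lookup is left as a parameter, since a lookup may or may not be cheaper than a volume evaluation. The resolution and sample counts in the example are arbitrary.

```python
def raytraced_shadow_cost(pixels, m, n):
    """Each of the M volume evaluations per pixel traces a shadow ray of N evaluations."""
    return pixels * m * n

def shadow_map_cost(pixels, m, n, map_samples, lookup_cost=1.0):
    """N evaluations per shadow-map sample (paid once), then one lookup per
    main-image volume evaluation, weighted by the assumed cost of a lookup."""
    return map_samples * n + pixels * m * lookup_cost

pixels = 1_000_000   # roughly a 1000x1000 image (illustrative)
m, n = 64, 64        # volume evaluations per pixel / per shadow ray (illustrative)
print(raytraced_shadow_cost(pixels, m, n))                 # 4096000000
print(shadow_map_cost(pixels, m, n, map_samples=pixels))   # 128000000.0
```

Because the ray-traced cost grows with the product M*N while the shadow-map cost grows with the sum of the two terms, the gap widens as either count increases.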
Non-volume rendering can also benefit from shadow maps, since typically sending a ray is more expensive than performing a map lookup.
Reflection maps are similar to shadow maps, but are typically only useful for environment lighting, since there are irreconcilable issues between local illumination and environment maps.
If limiting the number of ray bounces is too tricky, consider changing the ray weight parameter. If shaders are written correctly, the expected contribution of each ray will be available, and only rays that contribute more than the specified weight will be traced.
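How a weight cutoff bounds the work can be sketched with a single chain of reflection rays whose weight is scaled by a constant reflectivity at each bounce. The function, the constant reflectivity, and the depth limit are hypothetical simplifications of what a shader would compute.

```python
def rays_traced(weight, depth, min_weight, reflectivity=0.5, max_depth=64):
    """Count rays traced when each bounce scales the ray weight by reflectivity."""
    if weight < min_weight or depth > max_depth:
        return 0  # expected contribution too small (or too deep): skip the ray
    # Trace this ray; its reflected child carries weight * reflectivity.
    return 1 + rays_traced(weight * reflectivity, depth + 1, min_weight,
                           reflectivity, max_depth)

print(rays_traced(1.0, 0, min_weight=0.01))  # 7: stops once weight < 0.01
print(rays_traced(1.0, 0, min_weight=0.0))   # 65: only the depth limit stops it
```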
Environment map lighting (HDRI Lighting)
Use the environment light object rather than writing your own shader. The environment map light will perform an analysis on the texture map and optimize sampling of the environment.
Rendering 300 million polygons
Rendering 3 million polygons is an exercise in performing ray-tracing efficiently, whereas rendering 300 million polygons on a 32-bit system is about whether it can be done at all, since there is no way to store 300 million polygons in RAM at one time (assuming that a polygon takes more than 10 bytes of storage). Although a 64-bit operating system does not have the memory constraints of a 32-bit system, minimizing memory usage can improve efficiency on a 64-bit system as well.
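The arithmetic behind this can be checked quickly. The per-polygon sizes below are illustrative assumptions (real polygons with point positions, normals, and attributes take considerably more). The limit used is the 2**32-byte address space of a 32-bit process, of which typically only part is usable for data.

```python
GIB = 2 ** 30
ADDRESS_SPACE_32BIT = 2 ** 32  # 4 GiB: the most a 32-bit process can address

def geometry_bytes(polygons, bytes_per_polygon):
    return polygons * bytes_per_polygon

for size in (10, 16, 100):  # assumed bytes per polygon
    total = geometry_bytes(300_000_000, size)
    share = total / ADDRESS_SPACE_32BIT
    print(f"{size:>3} bytes/polygon: {total / GIB:5.1f} GiB "
          f"({share:.0%} of the 32-bit address space)")
```

Even at 10 bytes per polygon, the geometry alone would take about 70% of the entire address space before mantra, its shaders, and the image itself are counted.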
The assumption is that the 300 million polygons are split among several different objects in Houdini. There is no way for mantra to hold all of these objects in memory at one time, so paging needs to be performed. That is, mantra needs to be able to load objects when it needs them, and to be able to discard them when it no longer needs them.
The only way to do this in mantra is to use procedural geometry. The procedural shader acts as a placeholder for the actual geometry and generates that geometry only when its bounding box is hit. Once the generated geometry has been rendered, it can be deleted.
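The placeholder behavior can be sketched as a small class that holds only a bounding box and a generator until it is hit. This is a conceptual model in Python, not the HDK procedural interface; the class, method, and function names are hypothetical.

```python
class DelayedProcedural:
    """Hypothetical placeholder: only the bounding box is kept until a hit."""

    def __init__(self, bounds, generate):
        self.bounds = bounds        # ((xmin, ymin, zmin), (xmax, ymax, zmax))
        self._generate = generate   # callable that builds the real geometry
        self.geometry = None

    def on_hit(self):
        # Generate geometry lazily, the first time the bounds are hit.
        if self.geometry is None:
            self.geometry = self._generate()
        return self.geometry

    def release(self):
        # Once the generated geometry has been rendered, it can be freed.
        self.geometry = None


generated = []

def build_tree():
    generated.append("tree")          # record each (re)generation
    return ["polygon"] * 1000         # stand-in for heavy geometry

tree = DelayedProcedural(((0, 0, 0), (1, 5, 1)), build_tree)
tree.on_hit()
tree.on_hit()          # already generated: no extra work
tree.release()         # memory returned after rendering
tree.on_hit()          # generated again if it is needed later
print(len(generated))  # 2
```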
However, there are some things which mantra can’t throw away.
Anything specified in the IFD
Since the IFD is not persistent (mantra cannot go back and read it again), mantra must retain its contents. So, any inline geometry will stick around forever. When procedurals are used, ensure that the Force Geometry toggle is turned off; otherwise, display geometry will be sent down in the IFD, bloating memory.
When a shader sends a ray, the ray can start anywhere and go anywhere, so it is impossible for mantra to predict whether a ray will intersect a given piece of geometry at some future time. Currently, if a ray ever hits a procedural, the procedural is flagged as ray-traced and its geometry is retained. This means that you cannot ray-trace 300 million polygons unless you have a 64-bit machine. It might be possible to render smaller scenes, but your mileage may vary.
Mantra offers a few procedurals out of the box, with the ability to write custom procedural geometry using the HDK. It should be noted that unless you specify a bounding box for the procedural, mantra has no idea how big the geometry is and may run the procedural as if it were specified in the IFD. For all intents and purposes, this defeats the benefits of the procedural.
The File Procedural
The file procedural (delayed geometry load) loads geometry from disk on demand. You must specify the bounding box for the file; otherwise, the procedural will load the geometry during IFD processing, which defeats the purpose of the procedural.
The file procedural has two modes of operation. When Share Geometry is toggled on, the geometry loaded will be shared amongst other instances of this procedural. This is useful when you have one piece of heavy geometry that’s instanced many times. Rather than loading the geometry multiple times, mantra will load the geometry once and share the geometry amongst all procedurals.
However, this means that mantra needs to hold onto the geometry for the duration of the render. The alternative (when Share Geometry is turned off) is that mantra loads the geometry for each procedural on demand and frees it afterward. This can greatly reduce the memory footprint of the render.
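The trade-off between the two modes can be sketched by counting disk loads and peak resident geometry. This is a hypothetical model, not mantra's code: it assumes instances are rendered one after another and that unshared geometry is freed as soon as its procedural is done. The file names are made up.

```python
class FileProcedurals:
    """Hypothetical model of the two Share Geometry modes."""

    def __init__(self, share_geometry):
        self.share_geometry = share_geometry
        self.shared = {}        # path -> geometry, held until the render ends
        self.loads = 0          # number of times a file was read from disk
        self.resident = 0       # geometries currently held in memory
        self.peak_resident = 0

    def _load(self, path):
        self.loads += 1
        self.resident += 1
        self.peak_resident = max(self.peak_resident, self.resident)
        return f"<geometry from {path}>"  # stand-in for reading the file

    def render_instance(self, path):
        if self.share_geometry:
            if path not in self.shared:
                self.shared[path] = self._load(path)
            # Shared geometry stays resident for the rest of the render.
        else:
            self._load(path)
            self.resident -= 1  # freed as soon as this procedural is done

def run(share, files):
    procs = FileProcedurals(share)
    for path in files:
        procs.render_instance(path)
    return procs.loads, procs.peak_resident

# One heavy model instanced five times, plus three unique files.
scene = ["ship.bgeo"] * 5 + ["rock1.bgeo", "rock2.bgeo", "rock3.bgeo"]
print(run(True, scene))   # (4, 4): each file read once, all kept resident
print(run(False, scene))  # (8, 1): re-read per instance, freed after each
```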
The Program Procedural
It is possible to have mantra run an external program to generate geometry. One advantage of this approach is that the program can perform differently based on the visual level of detail of the procedural. The program string is scanned for %LOD which is then replaced with the level of detail of the object.
Your program could be a simple shell script which uses different geometry based on the level of detail in screen space. For example, if you have a 1 million polygon spaceship model which is crammed into 4 pixels of screen space, you might consider using a lower polygon count model in this case.
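Such a program could look like the Python sketch below, invoked with a program string such as `python pick_model.py %LOD` so that mantra substitutes the level of detail as the first argument. The thresholds, the file names, and the assumption that a larger LOD value means the object covers more of the screen are all hypothetical. How the chosen geometry is then handed back to mantra is not shown.

```python
import sys

# Hypothetical models, cheapest first, each with the minimum LOD at which
# it is used. File names and thresholds are illustrative only.
MODELS = [
    (0.0, "spaceship_proxy.bgeo"),
    (0.1, "spaceship_low.bgeo"),
    (1.0, "spaceship_full.bgeo"),
]

def choose_model(lod):
    """Return the most detailed model whose threshold the LOD reaches."""
    chosen = MODELS[0][1]
    for min_lod, path in MODELS:
        if lod >= min_lod:
            chosen = path
    return chosen

if __name__ == "__main__":
    # mantra replaces %LOD in the program string before running it.
    lod = float(sys.argv[1]) if len(sys.argv) > 1 else 1.0
    print(choose_model(lod))
```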
Mantra processes geometry by breaking up primitives into smaller primitives, until the primitives are “small” enough to render smoothly. These split primitives are stored in caches until they have been processed. If they get flushed out of the cache, they will be regenerated.
When mantra renders a tile, the primitives in the tile are all processed, split up, and the results are put in the cache. If split primitives are entirely inside the tile boundary, they can be discarded after the tile is finished rendering. If the primitives cross tile boundaries, they have to be kept around until the other tile they cover is rendered.
This is usually not a problem, except when motion blur or depth of field is involved. In those cases, the bounding boxes of primitives are expanded to include the motion or depth of field bounds, which means that primitives can occupy much more screen real estate.
However, since a procedural is considered a primitive, it should be noted that until the bounding box of the procedural is completely rendered, the geometry of the procedural is retained in memory.
Approaches to minimizing the cached memory include:
Rendering in strips and compositing the resulting image
Using the camera crop channels, you can render a strip of the image. This might be a column or row, but the idea is to minimize the amount of geometry mantra keeps in cache.
The problem this attempts to solve is that when the first tile of a row is rendered, it may have some primitives that must be retained until the tile above it is rendered. For a very large x-resolution, this can result in the overlapping primitives being held for a long time. By breaking the image into vertical strips, the memory is not held as long.
What may be surprising is that a larger bucket size may result in a smaller memory footprint. Since a larger bucket has proportionally fewer primitives crossing its boundaries, more memory may be freed after the tile finishes.
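The effect of strip width can be estimated by counting how many tiles are rendered between the first and last tile that a primitive touches. The function below is a hypothetical model that assumes tiles are rendered row by row; mantra's actual tile order and retention rules may differ.

```python
def tiles_held(tile_cols, rows, cols):
    """Number of tiles rendered while a primitive stays cached.

    tile_cols: tiles per row in the image (or strip) being rendered.
    rows, cols: inclusive (first, last) tile row/column the primitive covers.
    Assumes tiles are rendered row by row, left to right.
    """
    (r0, r1), (c0, c1) = rows, cols
    first = r0 * tile_cols + c0   # tile where the primitive is first split
    last = r1 * tile_cols + c1    # after this tile it can be discarded
    return last - first

# A primitive in tile column 3 straddling the boundary between tile rows 0 and 1.
print(tiles_held(64, (0, 1), (3, 3)))  # full-width image, 64 tiles per row: 64
print(tiles_held(8, (0, 1), (3, 3)))   # 8-tile-wide strip: 8
```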
Rendering in layers
Instead of splitting the image into strips in screen space, it may also be possible to split the scene itself. By breaking up the scene into subsets of objects (say, background and foreground objects), you may be able to render each subset with better memory performance. The images can then be composited together to form the final image.
As mentioned in the Memory Retention section, ray-tracing may cause rendering issues. However, there are ways to minimize the penalty. If you need to use ray-tracing, you can use proxy geometry. The phantom channels allow objects to appear in ray-tracing without being visible from the camera. This allows proxy geometry to be used, provided the actual geometry is turned off for ray-tracing.
The term Depth Complexity refers to how many primitives are stacked in Z at a particular point in screen space. For example, a single polygon in the scene would have a depth complexity of 1 (or 0 for pixels that the polygon doesn’t cover). A box will have a depth complexity of 2 (one for the front surface, and one for the back surface). The greater the depth complexity, the greater the processing and memory use.
Objects that are entirely occluded by foreground primitives can be culled by mantra, which minimizes work and memory for fully occluded objects. However, the only way mantra knows whether a primitive is occluded is by evaluating its bounding box.
So, if you think of two circles, one directly behind the other, as a human you would be able to say that the background circle doesn’t need any processing. However, mantra only knows about the bounding box of the background circle. Since the foreground circle does not occlude the entire bounding box, the background circle needs to be processed.
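The two-circle example can be checked with a bounding-box occlusion test. Because a circle is convex, it covers an axis-aligned box exactly when it covers all four corners of the box. The function below is a hypothetical illustration of that test, not mantra's occlusion code.

```python
def circle_covers_box(circle, box):
    """True if the (convex) circle covers every point of the box."""
    cx, cy, r = circle
    xmin, ymin, xmax, ymax = box
    corners = [(xmin, ymin), (xmin, ymax), (xmax, ymin), (xmax, ymax)]
    # A convex shape containing all four corners contains the whole box.
    return all((x - cx) ** 2 + (y - cy) ** 2 <= r * r for x, y in corners)

background_box = (-1.0, -1.0, 1.0, 1.0)  # bounding box of a unit circle at the origin
print(circle_covers_box((0.0, 0.0, 1.0), background_box))  # False: same-size circle
print(circle_covers_box((0.0, 0.0, 1.5), background_box))  # True: 1.5 >= sqrt(2)
```

An identical foreground circle leaves the corners of the background circle's box uncovered, so the background circle cannot be culled; only an occluder at least sqrt(2) times larger hides the whole box.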
In many cases, this will result in the background primitive being split and some of the split primitives being discarded, but it is still possible to end up with additional processing.
Depth complexity can become a serious issue when it comes to transparency. Consider a stack of semi-transparent sprites. Each sprite is 50% opaque, but there are thousands of sprites stacked into a single pixel. The compositing over operation states that the contribution of the 2nd sprite will only be 0.5, the third sprite 0.25, the fourth 0.125, etc. So, by the 10th sprite, it will only be contributing about 0.002 (0.5^9) to the pixel color.
The opacity threshold (vm_opacitythresh) can be used to specify a threshold at which the accumulated opacity is considered complete. For example, setting the threshold to 0.99 would limit the number of sprites processed in the above example.
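The effect of the threshold in the sprite example can be simulated with the over operation. The function below is a hypothetical sketch of front-to-back accumulation with an early exit; vm_opacitythresh is the parameter named in the text, but how mantra applies it internally is assumed here.

```python
def sprites_processed(alpha, count, threshold):
    """Composite `count` sprites of opacity `alpha` front to back and stop
    once the accumulated opacity reaches `threshold`."""
    accumulated = 0.0
    for i in range(count):
        # "over": each sprite adds alpha times the remaining transparency.
        accumulated += (1.0 - accumulated) * alpha
        if accumulated >= threshold:
            return i + 1
    return count

print(sprites_processed(0.5, 1000, 0.99))  # 7 (accumulated opacity 0.9921875)
```

With a threshold of 0.99, only 7 of the 1000 sprites are composited before the pixel is treated as opaque.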
For each tile, mantra needs to store a certain amount of information for each sample of each pixel. Therefore, the memory used is a function of: depth complexity, pixel samples, and bucket size.
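A back-of-the-envelope estimate follows directly from that relationship. The number of bytes stored per sample entry is an assumed placeholder; mantra's actual per-sample storage is not given here.

```python
def tile_sample_memory(bucket_size, samples_per_pixel, depth_complexity,
                       bytes_per_entry=32):
    """Rough per-tile memory: one entry per pixel sample per layer of depth.

    bytes_per_entry is an assumption for illustration only.
    """
    samples = bucket_size * bucket_size * samples_per_pixel
    return samples * depth_complexity * bytes_per_entry

# 16x16 bucket, 3x3 pixel samples, depth complexity 10
print(tile_sample_memory(16, 9, 10))  # 737280
```

Doubling any one of the three factors doubles the estimate, which is why deep transparent stacks combined with high pixel sampling are expensive.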