Packed prim 3.3 viewport performance

Member
354 posts
Joined: Nov. 2013
Hi there – this is my first post! This week I've been experimenting a little with the new packed prim stuff, reworking some networks that previously generated several hundred thousand Bezier curves. In terms of memory and SOP cook times I'm seeing some fantastic wins. The sticking point, however, is the 3.3 viewport performance, which quickly becomes a bottleneck (the GPU is a Quadro 4000, btw). In this particular case, the time to render the geometry in the viewport clocks in at 2 or 3 seconds – around 30x the SOP cook time. In fact, the performance is worse than the time the viewer needs to process unique geometry.

In total I have around 10K points, each of which is one of 200 unique packed prims. My (possibly naive) hope was that each unique packed prim would map to a single VBO on the GPU (assuming a single material), and that the matrix multiplications needed to transform the instances correctly would be handled in a vertex shader (i.e. hardware instancing).
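
A minimal sketch of the hardware-instancing idea described above, in plain Python with no actual GL calls: instances are grouped by prototype, so each unique packed prim would need one shared vertex buffer plus one array of per-instance matrices, and `transform_point` mimics what a vertex shader would do per vertex. The `Instance` class, the function names and the row-major, translation-in-last-column matrix layout are illustrative assumptions, not Houdini's actual data structures.

```python
from collections import defaultdict
from dataclasses import dataclass

Matrix = tuple  # flat 16-tuple, row-major; translation in elements 3, 7, 11 (assumed layout)


@dataclass
class Instance:
    prototype_id: int  # which of the unique packed prims this point refers to
    transform: Matrix  # per-instance 4x4 transform


def translation(tx, ty, tz):
    """Build a 4x4 translation matrix in the assumed row-major layout."""
    return (1, 0, 0, tx,
            0, 1, 0, ty,
            0, 0, 1, tz,
            0, 0, 0, 1)


def build_instance_batches(instances):
    """Group instances by prototype: each prototype would own one shared vertex
    buffer, and its list of matrices would become a per-instance attribute array."""
    batches = defaultdict(list)
    for inst in instances:
        batches[inst.prototype_id].append(inst.transform)
    return dict(batches)


def transform_point(m, p):
    """What a vertex shader would do per vertex: apply the instance matrix."""
    x, y, z = p
    return tuple(m[4 * r] * x + m[4 * r + 1] * y + m[4 * r + 2] * z + m[4 * r + 3]
                 for r in range(3))


# 10K points spread over 200 unique prototypes -> 200 instanced draws, 50 instances each.
instances = [Instance(i % 200, translation(i, 0, 0)) for i in range(10_000)]
batches = build_instance_batches(instances)
print(len(batches), len(batches[0]))  # 200 50
```

Under this model the draw-call count depends on the number of unique prototypes (200), not on the number of points (10K).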

When I run the performance monitor tool, most of the time falls under the category “Waiting on GPU”. I suspect this means that buffers are being re-uploaded to the GPU, but this is only a guess.

So my question: would it be possible to get a detailed description of how packed prims (and even “regular” prims) map to OpenGL resources, and of when and how those resources are created, modified and deleted? Such information would be extremely useful, since armed with a little more knowledge we may be able to structure and debug our networks to play a little nicer with the current 3.3 viewport implementation.

cheers

Antony
Staff
5286 posts
Joined: July 2005
It's hard to say exactly what the problem might be without a hip file, but your hardware should easily handle 10K points, even distributed between 200 prims. If you could post a file, that would help. Also, is it the general tumbling time that takes 2-3s, or the update time?

Packed primitives should always take longer to process for display than to cook in the SOP, as SOPs do not process the contained geometry, only the opaque primitive itself (1 packed primitive with 1 point). The viewport unpacks the contained geometry for display because the default representation is “Full Geometry”. You can change this to Bounding Box or Centroid with a PackedEdit SOP to avoid this processing, though of course you won't see your actual geometry anymore.
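
To make the cook-versus-display difference concrete, here is a rough sketch of how many points the viewport would have to generate for one packed primitive under each representation, while the SOP itself only ever sees the single packed point. The `DisplayMode` enum and `display_points` function are hypothetical illustrations, not Houdini's API, and modeling Centroid as the bounding-box center is an assumption.

```python
from enum import Enum


class DisplayMode(Enum):
    FULL_GEOMETRY = "full"
    BOUNDING_BOX = "bbox"
    CENTROID = "centroid"


def display_points(contained_points, mode):
    """Points the viewport would have to generate to draw one packed primitive.
    (The SOP, by contrast, only handles the single packed point.)"""
    if mode is DisplayMode.FULL_GEOMETRY:
        return list(contained_points)  # unpack everything
    xs, ys, zs = zip(*contained_points)
    lo = (min(xs), min(ys), min(zs))
    hi = (max(xs), max(ys), max(zs))
    if mode is DisplayMode.BOUNDING_BOX:
        # The 8 corners of the axis-aligned bounding box.
        return [(x, y, z)
                for x in (lo[0], hi[0])
                for y in (lo[1], hi[1])
                for z in (lo[2], hi[2])]
    # CENTROID: one point, taken here as the bounding-box center (an assumption).
    return [tuple((a + b) / 2 for a, b in zip(lo, hi))]


contained = [(float(i), 0.0, 0.0) for i in range(1000)]
for mode in DisplayMode:
    print(mode.name, len(display_points(contained, mode)))
```

The display cost under Full Geometry scales with the contained geometry, while the other two representations stay constant per packed prim.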

Each packed primitive has its own GL objects. If multiple primitive types are packed inside a packed primitive (say curves and polygons), they are partitioned and each type gets its own GL objects. Materials are external, so textures and uniform buffers are not contained in the prim. You shouldn't get any buffer uploads while tumbling, only when the packed primitive itself changes. If you copy a packed primitive with the Copy SOP, the copies are drawn using GL instancing, so the GL vertex arrays stay at the size of a single copy.
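
A back-of-the-envelope sketch of why instancing keeps the vertex arrays at the size of a single copy. It assumes position-only vertex data as 32-bit floats and one full 4x4 float matrix per instance; the real viewport buffers carry more attributes and may store transforms differently, so the absolute numbers are illustrative only.

```python
FLOAT_BYTES = 4          # assuming 32-bit float attributes
FLOATS_PER_POINT = 3     # assuming position only (x, y, z)
FLOATS_PER_MATRIX = 16   # one 4x4 transform per instance


def gpu_vertex_bytes(points_per_copy, copies, instanced):
    """Rough GPU memory for vertex data with and without GL instancing."""
    single_copy = points_per_copy * FLOATS_PER_POINT * FLOAT_BYTES
    if instanced:
        # One shared vertex array plus a small per-instance transform array.
        return single_copy + copies * FLOATS_PER_MATRIX * FLOAT_BYTES
    # Every copy baked into its own vertices.
    return single_copy * copies


print(gpu_vertex_bytes(1000, 10_000, instanced=True))   # 652000
print(gpu_vertex_bytes(1000, 10_000, instanced=False))  # 120000000
```

With instancing, the per-copy cost is only the transform, so it grows much more slowly than baking every copy into its own vertices.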

“Wait on GPU” is measured around the final swapBuffers() call for the viewport. This is a blocking API call that forces a synchronization point between the GPU and CPU. A long “Wait on GPU” time means that the GPU hasn't finished rendering all the GL commands the CPU has sent it.
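
A toy timeline model of what a timer around a blocking swap ends up measuring. It assumes the GPU executes submitted commands strictly in order, one at a time, and that the swap is issued immediately after the last command; real drivers queue, batch and overlap work in more complex ways, so this only illustrates the principle that the wait equals the GPU backlog left when the CPU is done.

```python
def swap_wait(commands):
    """Model how long a blocking swapBuffers() waits.

    commands: (cpu_submit_time, gpu_cost) pairs in submission order, in ms.
    Assumes in-order GPU execution and a swap right after the last submit.
    """
    gpu_free_at = 0.0
    for submit_time, cost in commands:
        start = max(submit_time, gpu_free_at)  # a command can't start before it is sent
        gpu_free_at = start + cost
    cpu_done_at = commands[-1][0] if commands else 0.0
    # The swap blocks until the GPU has drained everything queued so far.
    return max(0.0, gpu_free_at - cpu_done_at)


# Heavy GPU work: the CPU finishes submitting at 2 ms, the GPU at 15 ms -> long wait.
print(swap_wait([(0, 5), (1, 5), (2, 5)]))
# Light GPU work: the GPU keeps up, so the wait is tiny.
print(swap_wait([(0, 0.1), (1, 0.1), (2, 0.1)]))
```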

Hope that helps.
Member
354 posts
Joined: Nov. 2013
Thank you for the details – very helpful indeed. Apologies for not providing an example file, btw; unfortunately the network currently contains several of our own SOPs. I've been trying to put together a similar out-of-the-box example but haven't gotten there yet.

Overall I think there may be a few factors at play here. One is that I'm seeing irregular viewport FPS on even very simple scenes. Yesterday I prepared a simple example of 10K points animating up and down using a sine function. Playback was only 40 FPS, with the performance monitor reporting “Wait on GPU” as the dominant factor (yep, just to draw 10K points). Today, however, I opened the same file and now get >120 FPS, which is more in line with our other apps. I think this is very likely a driver issue on our end, but it contributed to the confusion that led to my initial post.

Another factor is that, when viewing the full geometry, unpacking for display might be very expensive because my geometry is actually dozens of straight line segments (i.e. 2-point open polygons). In this case the dominant timer is “Geometry setup”. So perhaps each line segment in the packed geometry is submitted individually, rather than as a single batch of GL_LINES. Does that sound plausible?
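
For reference, a sketch of the kind of single-batch GL_LINES layout the question has in mind: many 2-point open polygons merged into one vertex array and one index array with two indices per segment, so the whole set could go out in one draw call. This is a generic illustration with a hypothetical function name, not a description of how Houdini's viewport builds its buffers.

```python
def batch_segments(segments):
    """Merge many 2-point open polygons into one vertex array and one
    GL_LINES-style index array (two indices per segment), so the whole set
    could be drawn with a single draw call instead of one call per segment."""
    positions = []
    indices = []
    for p0, p1 in segments:
        base = len(positions)
        positions.extend((p0, p1))
        indices.extend((base, base + 1))
    return positions, indices


segments = [((0, 0, 0), (1, 0, 0)),
            ((0, 1, 0), (1, 1, 0)),
            ((0, 2, 0), (1, 2, 0))]
positions, indices = batch_segments(segments)
print(len(positions), indices)  # 6 [0, 1, 2, 3, 4, 5]
```

Points are not shared between segments here, which keeps the indexing trivial; a real implementation might deduplicate shared endpoints to save memory.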
Staff
5286 posts
Joined: July 2005
Multiple curves within the same packed primitive are collected and drawn in large groups, so I don't think that's the case, unless you only have a small number of those segments in each packed prim. Then you'd have 200+ small curve meshes being uploaded each frame.
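
The point of this reply is that grouping happens per packed primitive, so what matters is how many segments each prim holds. A quick sketch of that arithmetic, assuming (as described above) that segments are only merged with others in the same packed prim; `batch_stats` is a hypothetical helper, not part of any Houdini API.

```python
from collections import Counter


def batch_stats(segment_prim_ids):
    """Given the packed-prim id of every line segment, return
    (number of batches, average segments per batch), assuming segments are
    only merged with other segments in the same packed primitive."""
    per_prim = Counter(segment_prim_ids)
    if not per_prim:
        return 0, 0.0
    return len(per_prim), sum(per_prim.values()) / len(per_prim)


# 1,000 segments scattered over 200 packed prims: many small meshes.
print(batch_stats([i % 200 for i in range(1000)]))  # (200, 5.0)
# The same 1,000 segments in 2 packed prims: a few large meshes.
print(batch_stats([i % 2 for i in range(1000)]))    # (2, 500.0)
```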

Not sure about the irregular frame rate. The driver could have been juggling something else in the background during that session (fragmented VRAM, other apps running, ?). I've occasionally seen small drops, but nothing that large. Hopefully it was just a fluke.