Best way to write out metadata together with geo?

Member | 254 posts | Joined: Nov. 2007
Hi,

I am trying to find out the best way to write out metadata into a JSON file while I am processing a bunch of geometry.

I have attached a file to show some of the different ways I am trying to get this meta data out.

In this example I have different polygonal tiles that I want to process (think terrain).
Ultimately each tile will end up subdivided with color data and a custom (single float) mask. This color data is then brought into COPs and written out as a texture (png).
Besides writing out the texture, I also want to write out a json file that contains information about the tile. Specifically the min and max values of the custom mask.

The goal:
To write out that metadata file at the same time as the texture from COPs is cooked. -- This is important because the metadata file is 'light' to write and 'light' to process, but cooking the image data takes a while. So I want both to happen during the same process (during the same work item?).

What I have tried so far:
What works:
*) I created a Python SOP that writes the metadata file whenever the SOP cooks. If this SOP is injected right before the node that is fetched for COPs, it will correctly bake out the JSON metadata. -- Although this works, it seems a bit hacky, as it is not really part of the TOPs chain. -- And the metadata gets exported whenever the Python SOP cooks (including while debugging the SOP network).

In this case the dependency looks like:
TOP Fetch -> fetches COP ROP -> fetches SOP node (triggers upstream SOP cooking - computationally expensive step) -> triggers Python SOP cook (writes JSON - computationally cheap step) -> continues downstream to COPs to write the image data.
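A minimal sketch of what that Python SOP's cook code could look like. The attribute names, the custom "json_path" parm, and the helper names are my assumptions, not taken from the attached file; the pure payload helper is kept separate from the `hou`-dependent part:

```python
import json

def tile_payload(mask_min, mask_max):
    """Build the per-tile metadata payload."""
    return {"mask_min": mask_min, "mask_max": mask_max}

def write_metadata(node):
    """Cook body for the Python SOP, called with hou.pwd().
    Assumes an upstream Attribute Promote has written "mask_min" and
    "mask_max" detail attributes, and that the SOP has a custom string
    parm "json_path". Only runs inside a Houdini session."""
    geo = node.geometry()
    meta = tile_payload(geo.attribValue("mask_min"),
                        geo.attribValue("mask_max"))
    with open(node.evalParm("json_path"), "w") as f:
        json.dump(meta, f, indent=2)
```

Inside the Python SOP the cook code would then just be `write_metadata(hou.pwd())` -- which, as noted above, fires on every cook, including interactive ones while debugging.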


*) I used the Labs CSV Exporter tool to bake the metadata as CSV. In this case the Python code lives on a digital asset and the definition is called through a shell ROP inside the CSV Exporter ROP. This does seem to correctly update the geometry, so the metadata comes out correctly. (I can build my own JSON baker tool similar to the Labs CSV Exporter if this would be a good approach.)

In this case the dependency looks like:
TOP Fetch -> fetches COP ROP & writes image data -> fetches SOP (triggers upstream cooking - computationally expensive step)
TOP Fetch -> fetches the CSV Exporter ROP -> fetches SOP (triggers upstream cooking - computationally expensive step) -> continues downstream to the CSV exporter & writes metadata (computationally cheap step).

My concern with this is that the 'computationally expensive step' is executed twice: once for the writing of the image data and again for the writing of the metadata. Ideally I want to somehow link these two outputs to 'run as one', or 'run after each other, but during the same process/work item'. I don't know if this is what partitioning is supposed to do? Basically grouping work items that are supposed to run together. -- Similar to how the frames of a simulation are run as a batch, I would like the work items for each tile to run together.

I do like that the json exporting functionality would be wrapped into its own rop as that seems clean and also creates work items for each json file.

*) What does not work:
I tried using the Python Script TOP. Because it updates 'in-process', it will update the TOPs work item, but it will not actually update the geometry in the scene, and therefore it would write the metadata for whichever work item was last selected. The metadata currently contains the min/max values of the custom mask - this is computed using a SOP Attribute Promote, which requires the geometry to be correctly updated.

I would almost be tempted to make a new 'python script geo' TOP HDA that wraps a ROP net with a shell ROP that grabs the callback definition from a string parameter. That seems like a lot, though, and it makes me think there must be an easier/better way to have the existing Python Script TOP correctly update the geometry it triggers or samples. Perhaps I should try to force a dirty & cook on a portion of my SOP network so the data gets updated?

Any suggestions or advice as to what is the best way forward would be greatly appreciated.

The main two questions are:
1) How can I trigger the work items from two different TOP nodes so they run during the same process? (Partitions?)
2) How can/should I use the Python Script TOP so that it updates the geometry correctly, allowing me to pull the metadata from the geo?

Thanks!

Attachments:
pdg_meta_tile_v001.zip (61.2 KB)

Cg Supervisor | Effects Supervisor | Expert Technical Artist at Infinity Ward
https://www.linkedin.com/in/peter-claes-10a4854/
Staff | 449 posts | Joined: May 2014
Regarding 1), it's not currently possible to do that. A partitioner node is used to wait for multiple independent tasks to complete; however, it doesn't combine the tasks into the same physical worker process. Each of the tasks in the partition may cook at any point in time and in any order, and the partition itself will be marked as cooked once all of its dependencies are cooked.

Each work item has its own distinct attribute data and a command line that gets executed when that work item cooks. Batches (like in the ROP Fetch) are a special kind of work item that can report progress at finer-grained intervals, but under the hood it's actually a single work item/physical process on the farm. It just happens to report a per-frame output file/cook status, which PDG and TOPs represent as individual work items.

However, it is possible to point a ROP Fetch TOP node at a ROP network instead of a single ROP. Each work item in the ROP Fetch TOP will cook the target ROP network, and outputs are collected from each ROP in the network and reported on that work item. You can use that to e.g. cook a simulation and write out some additional geo alongside the sim output. The ROP network could also consist of a Composite ROP and a Geometry ROP, which would both be cooked as part of the same work item. If the COP network references the same SOP network used by the Geometry ROP, then the SOP portion should only cook once. That is the recommended way to set up multiple outputs right now, since TOPs doesn't have a way to combine multiple tasks into a single physical job.

I think that may be what you need for your use case -- I'll try to set up an example of that some time this evening.

For 2), the Python Script TOP can run in one of two modes. It can run in process, in which case the work item runs in the background as part of the regular TOP graph evaluation. Or it can run out of process, in which case each work item is scheduled with whatever scheduler is being used and runs as a standalone .py script in its own process.

When it's running out of process, you could load a .hip file and perform whatever node cooks you want using HOM. That's effectively what the ROP Fetch TOP is doing, using the $HHP/pdgjob/rop.py script.

However, when the script runs in process it's not allowed to perform write operations on the scene, since it's running in the background. It can read/write attributes on the work_item associated with the script, but it can't e.g. trigger cooks or edit parameters.
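For the out-of-process case, a hedged sketch of what the script body could do (all paths, node paths, and attribute names here are hypothetical; `hou` is only importable inside Houdini's own Python, so it is imported lazily and only the pure helper runs elsewhere):

```python
import json
import os

def metadata_path(texture_path):
    """Derive the JSON path from the texture path (assumed naming convention)."""
    return os.path.splitext(texture_path)[0] + ".json"

def cook_and_write(hip_path, sop_path, texture_path):
    """Load the scene, cook the SOP to get up-to-date geometry, and dump
    the mask min/max next to the texture. Runs only inside Houdini."""
    import hou  # available only in a Houdini Python environment
    hou.hipFile.load(hip_path, suppress_save_prompt=True)
    geo = hou.node(sop_path).geometry()  # asking for geometry triggers the cook
    vals = geo.pointFloatAttribValues("mask")  # assumes a float point attrib "mask"
    with open(metadata_path(texture_path), "w") as f:
        json.dump({"mask_min": min(vals), "mask_max": max(vals)}, f)
```

This is the same pattern as rop.py: load the hip, then explicitly cook what you need.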
Edited by tpetrick - Sept. 9, 2021 18:53:58
Member | 254 posts | Joined: Nov. 2007
Thank you for your explanations!

For 1)
The clarification regarding writing out multiple outputs during one work item by fetching a ROP network consisting of multiple ROP nodes makes sense, and I will try it. An example of this would be very much appreciated. Thank you.
If this also means the cook of the heavy geo only happens once (per tile), then this is probably the approach I will use going forward.

The analogy with simulation data sort of makes sense as well, but it makes me curious whether that would A) write the sim data first (for all frames) and then write the additional output geo (for all frames - potentially recooking the sim?), or B) write the sim data frame by frame, with the additional output geo written after each sim frame finishes (frame by frame), thereby avoiding recooking.

Basically:
work-item1: Sim all frames (rop) -> write additional output for all frames (rop)
vs
work-item1: Sim a frame (rop) -> write additional output for a frame (rop) -> go to next frame

Is this perhaps dependent on how the ROPs are wired? (In sequence vs. merged in parallel?) -- Normally I don't think ROPs do frame-by-frame dependencies; they tend to run the entire frame range before moving on to the next ROP. Maybe I'm missing something here?
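The two orderings in question can be sketched abstractly (illustrative Python, not a Houdini API):

```python
def node_by_node(rops, frames):
    """Each ROP finishes its entire frame range before the next ROP starts."""
    return [(rop, f) for rop in rops for f in frames]

def frame_by_frame(rops, frames):
    """All ROPs cook frame 1, then all cook frame 2, and so on."""
    return [(rop, f) for f in frames for rop in rops]

# node_by_node(["sim", "geo"], [1, 2])
#   -> [("sim", 1), ("sim", 2), ("geo", 1), ("geo", 2)]
# frame_by_frame(["sim", "geo"], [1, 2])
#   -> [("sim", 1), ("geo", 1), ("sim", 2), ("geo", 2)]
```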


I see now that partitions behave more like groups of work items that are visually packed together but under the hood still execute individually.


For 2)
This makes some sense, but it also massively reduces the usefulness of the Python Script TOP. I guess its purpose is more to directly manipulate the TOP attributes on the work items than to pull (changing) geometry data from the scene.
Member | 5635 posts | Joined: July 2007
Maybe you can also try running your JSON-generation script as a pre- or post-frame script on your COP ROP.
Tomas Slancik
FX Supervisor
Method Studios, NY
Staff | 449 posts | Joined: May 2014
I've attached a simple example that writes out a terrain geo and a heightfield texture using COPs as part of the same work item. For illustrative purposes, I also included a Python SOP in the chain that prints the frame number when it cooks. In the work item log there'll be a single printout for each frame, followed by PDG reporting both the geo and COP output for that frame. Each work item in the node ends up with two output files (one for each ROP in the chain). Batching is also enabled.

One limitation right now is that the "Output Parm Name" option on the ROP Fetch only accepts a single value. PDG uses that parameter to determine which parm on the target ROP node defines the output file path, so it can evaluate that path for cache checking and reporting cooked results. It has a list of parameter names that it knows about internally, but when using custom ROPs with their own output parm naming convention, it's necessary to explicitly inform PDG of the output parm name.

That's not an issue if you're using built-in ROPs, since PDG knows about the output file parms on all of the standard ROP nodes. But if the ROP network consists of multiple custom ROPs with unique output path parms, the work item likely won't be able to report all of the outputs properly. That's easy enough to fix on our end, however, so that the parm can accept a space-separated list of output parms. I noticed in your .hip file that you're using the Labs CSV Exporter, so I think you'll need that fix. I should be able to get that in by early next week.

Regarding the question about cooking order -- ROPs can cook either frame by frame or node by node. The ROP Fetch TOP exposes a parameter to configure which cooking behavior is used when the work items evaluate the target ROP network. That setting only really matters if you're cooking a batch, since otherwise each work item will only cook one frame and both options will behave the same.
Edited by tpetrick - Sept. 10, 2021 12:06:12

Attachments:
sopcopfetch.hip (141.5 KB)

Member | 254 posts | Joined: Nov. 2007
Hey Tomas,

Thanks for the suggestion. If you look at what the CSV Exporter ROP does, it calls a Python definition from the pre-render script section of its shell ROP, so that should work.
I am trying to 'stick with TOPs' and was trying to figure out whether what I wanted to do would be possible with TOPs only.
Turns out that it is not possible with TOPs alone (currently), so a chain of ROP nodes seems like a clean solution.


Thank you, tpetrick, for the example file. I see the 'frame by frame' parameter on the ROP Fetch TOP.
The info about 'Output Parm Name' makes sense as well. When you mention 'so it can evaluate that path for cache checking', is that also what is used for 'Delete This Node's Results from Disk'?