ROP Fetch: Distribution options as a separate node

User Avatar
Member
209 posts
Joined: Nov. 2010
Offline
Hi,

Can you please explain why you integrated the distribution options into the ROP Fetch node? I'm asking because it slightly confuses me that distribution is treated as a function of fetching.
The option doesn't make sense for any of the other ROPs (except the Geometry ROP).

I remember that in the first builds of Houdini 17.5 it was a separate node.
To me, it makes more sense as a custom wedge node.

Thanks
Ostap
User Avatar
Member
209 posts
Joined: Nov. 2010
Offline
Is it possible to take control of the slice work items? That is, to first prepare the slice work items and then deliver them to the ROP Fetch (to make it procedural)? It looks like slices behave just like wedges. Maybe that would be more the PDG way?
Edited by Ostap - May 27, 2019 00:59:18
User Avatar
Member
209 posts
Joined: Nov. 2010
Offline
Also, when the distributed sim option is on (ROP Fetch), ‘Evaluate Using: Single Frame' doesn't work. Is there any reason why?
Houdini build: 17.5.242
User Avatar
Member
209 posts
Joined: Nov. 2010
Offline
Hi,

Could you please take a look at this issue?
We're not sure how to work with it.

Thanks
Ostap
User Avatar
Staff
585 posts
Joined: May 2014
Offline
The distributed sim options are on the ROP Fetch because they can currently only be used by work items generated by the ROP Fetch node (the ROP Geometry is a ROP Fetch node internally as well). The rop.py wrapper script for evaluating ROP work items is what makes use of the dist sim settings. It should be possible to make a standalone node for preparing that information, since they're just attributes - I've logged an RFE for that suggestion.

When distributed sims are enabled, the ROP Fetch also generates sim tracker begin/end work items, makes the sim slices depend on the begin tracker item, and makes the end tracker item depend on each of the slice simulations. Because of this, the node needs to be able to generate the begin item, end item, and simulations all in one go. It can't easily generate the items one at a time based on upstream work items.
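As a rough, minimal sketch of that begin/slices/end structure (not the actual ROP Fetch implementation), a Python Processor TOP can generate all three kinds of items in one pass and wire them together in onAddInternalDependencies. The "role" attribute, the fixed slice count, and the single-upstream-item assumption are all invented for illustration, and callback signatures may differ between builds:

def onGenerate(self, item_holder, upstream_items, generation_type):
    # Assumes a single upstream work item for simplicity.
    parent = upstream_items[0]
    # Generate the begin tracker item, the slice items, and the end
    # tracker item all in one go, tagging each with a hypothetical role.
    begin = item_holder.addWorkItem(parent=parent)
    begin.setStringAttrib("role", "begin")
    for slice_num in range(4):
        sim = item_holder.addWorkItem(parent=parent)
        sim.setStringAttrib("role", "slice")
        sim.setIntAttrib("slice", slice_num)
    end = item_holder.addWorkItem(parent=parent)
    end.setStringAttrib("role", "end")
    return pdg.result.Success

def onAddInternalDependencies(self, dependency_holder, internal_items, is_static):
    # Slices depend on the begin item; the end item depends on every slice.
    begin = next(i for i in internal_items if i.stringAttribValue("role") == "begin")
    end = next(i for i in internal_items if i.stringAttribValue("role") == "end")
    for item in internal_items:
        if item.stringAttribValue("role") == "slice":
            dependency_holder.addDependency(item, begin)
            dependency_holder.addDependency(end, item)
    return pdg.result.Success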

Can you attach a file demonstrating the issue you're having?
User Avatar
Member
209 posts
Joined: Nov. 2010
Offline
Thanks for the explanation.

I thought that making the sim tracker and the ROP Fetch separate nodes would make the logic slightly easier, and would mean that enabling the distributed option doesn't change the ROP Fetch node's behavior so dramatically.

The question is still open: how do we make procedural dependencies between distributed nodes?
Can you please take a look at this example?
Edited by Ostap - May 28, 2019 04:26:48

Attachments:
pdg_distribute_01.hip (185.4 KB)

User Avatar
Staff
585 posts
Joined: May 2014
Offline
The behavior of the second network with two dist sims looks like a bug - I'll look into it and have a fix for you this week.
User Avatar
Member
209 posts
Joined: Nov. 2010
Offline
It would be super nice if distributed slices could source their work items from a wedge. That way we could predefine some additional parameters (like the output file per slice) more procedurally.
User Avatar
Member
209 posts
Joined: Nov. 2010
Offline
We all know Houdini for its freedom of customization and combination, and it looks like PDG is developing in the same way.

So, if you'll allow me, I'd like to show you one possible option for combining the sim tracker and ROP Fetch (hip file attached). In this example the logic is divided into smaller blocks and, most importantly for us, the geometry output path can be inherited from the upstream work item (in our case, each slice has a completely different path). The sim tracker is not part of the geometry, so I moved it out of the main dependency scope (to keep the geometry dependencies cleaner).

I would appreciate your review.

Attachments:
pdg_dist_fecth.hip (178.0 KB)

User Avatar
Staff
585 posts
Joined: May 2014
Offline
As of a few days ago, the behavior of “Evaluate Using” when used with the distributed sim options should be fixed. It now behaves the same as regular batches.

Additionally, the setup you described in your last post can more or less be done out of the box already. I've attached an example that should work in any recent build of 17.5. There are some sticky notes in it already, but I'll touch on a few key points here.

The first is that the tracker is spawned using a Generic Generator. I used PDG's shared server API to do that, rather than running Houdini's sim tracker directly. The shared server API (used for sim trackers, command chains, etc.) ensures that the tracker is killed when the graph finishes cooking, regardless of whether or not the cook succeeded. It also handles spawning the tracker on an available port, reporting its port/IP as result data, and then returning while the tracker runs in the background. This is effectively the same thing that's done with the built-in sim tracker, but it can also be done manually by using a Generic Generator to run the appropriate command line.

Before partitioning the slice options and tracker, I first sort the slice options so that all of the slice 0 work items are first, then all of the slice 1 items, etc. This is because batches in PDG must be contiguous, and the ROP Fetch expects to generate batches directly from the upstream items. There's probably an RFE here to expose more options on the batch generation, such as a stride/offset, which would eliminate the need for the sort node.
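To make the contiguity requirement concrete, here's a tiny standalone illustration (plain Python, not the PDG API) of the ordering the sort produces:

# Batches in PDG must be contiguous, so slice work items are ordered by
# slice number first: all slice-0 items, then all slice-1 items, etc.
items = [{"wedgeindex": w, "slice": s} for w in range(3) for s in range(4)]
items.sort(key=lambda item: item["slice"])
# Every slice-0 work item now appears before any slice-1 work item, which
# lets the ROP Fetch build each batch from a contiguous run of upstream items.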

Attributes for configuring the slices are passed in using the @ syntax, i.e. the control DOP's Tracker parameter is set to the expression “@trackeraddress”. You could also use the push-style wedging option, by specifying a target parameter for the various attributes on the wedge node. The distributed sim options on the ROP Fetch itself are not used at all in the example.
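For reference, the bindings look something like this; only @trackeraddress is taken from the example above, and the other parameter/attribute names are hypothetical stand-ins:

Tracker  ->  @trackeraddress   (as in the example)
Port     ->  @trackerport      (hypothetical)
Slice    ->  @slice            (hypothetical)

An "@attr" expression on a parameter pulls that attribute's value from the work item currently being evaluated.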

Also, the file has a distributed sim that runs 4 slices, meaning PDG needs to be able to run 4 jobs at a time. I have the scheduler set to “1/4 CPU Count”, but you may need to adjust that setting based on your system so that PDG will run at least 4 concurrent jobs.
Edited by tpetrick - June 4, 2019 18:27:42

Attachments:
distributed_flip.hip (3.4 MB)

User Avatar
Member
209 posts
Joined: Nov. 2010
Offline
Ohh… that is a really cool example! I especially like the trick with the tracker/shared server!
But as usual, I have a question: where/why are you using the slicetype and control attributes (from the Wedge node)?
User Avatar
Staff
585 posts
Joined: May 2014
Offline
Ah, sorry - those don't need to be there/aren't used. I forgot to remove them before uploading the file.
User Avatar
Member
209 posts
Joined: Nov. 2010
Offline
Maybe I'm doing something wrong, but the tracker doesn't work for me : (

Attribute Create doesn't work when I run the whole chain (the values are just empty after the Generic Generator). If I run it manually, node by node, then the values are passed through.

When I run your scene as-is, the tracker (Generic Generator) raises an error:

.../tracker.py, line 31, in <module>
    from pdgjob.pdgcmd import reportResultData
ImportError: No module named pdgjob.pdgcmd

When I point directly to the tracker.py file, the tracker (Generic Generator) creates the tracker/webpage address, but that address is empty.

Why are you copying the tracker.py script to the temporary PDG_SCRIPTDIR? Why not run it directly from the $HFS folder? Can you please point me to the mechanism that knows what has to be copied to PDG_SCRIPTDIR?

Houdini build 17.5.277

Can you help, please?
User Avatar
Staff
585 posts
Joined: May 2014
Offline
The reason it's run from PDG_SCRIPTDIR is that the job doesn't need HFS and does not run under Hython. All of the scripts that are needed are standalone scripts with no dependency on a Houdini installation. They need to be copied because the job may be submitted to a farm. The scripts are normally copied automatically because the ROP Fetch node lists them as dependencies in its template file.

You'll need to add the various scripts as file dependencies to the Generic Generator so they're moved to PDG_SCRIPTDIR prior to cooking the tracker work item:

$HFS/houdini/python2.7libs/pdgjob/pdgcmd.py
$HFS/houdini/python2.7libs/pdgjob/sharedserver.py
$HFS/houdini/python2.7libs/simtracker.py
$HFS/houdini/python2.7libs/pdg/job/tracker.py
User Avatar
Staff
585 posts
Joined: May 2014
Offline
You can also change the command line on the generic generator to run from HFS, or use Hython:

$HFS/bin/hython "$HFS/houdini/python2.7libs/pdgjob/sharedserver.py" --start --name simtracker --port 0 --timeout 15 --proto_type "raw" $HFS/bin/hython "$HFS/houdini/python2.7libs/pdg/job/tracker.py" {port} 0
User Avatar
Member
209 posts
Joined: Nov. 2010
Offline
But even if I point to the $HFS directory and the task succeeds (state: Cooked) with the resulting output http://mycomputer:36329, the tracker still doesn't start (at that web address - http://mycomputer:36329).
User Avatar
Staff
585 posts
Joined: May 2014
Offline
Once the PDG graph is finished cooking, the shared server API will shut down the tracker. So if you cook just the tracker node, it will start, the graph will finish, and the tracker will be shut down. Is that possibly what's happening in your case?

One of the reasons we didn't implement distributed simulations with a separate tracker node is this exact problem. If the tracker is left running even after the graph stops cooking, it's unclear when it should be cleaned up. On the other hand, a separate node also means there would need to be some mechanism for the ROP Fetch with the distributed sim to request that the tracker, which lives in a different upstream node, be restarted each cook.
Edited by tpetrick - June 6, 2019 18:18:09
User Avatar
Member
209 posts
Joined: Nov. 2010
Offline
Oh… now it's clear. It wasn't intuitive that the tracker stops working. So how about making it intuitive: if the work item's state is cooked, then the tracker is alive, and some other work item has to stop the tracker (or the tracker stops when the work item is dirtied)?

Anyway, Attribute Create doesn't work because it doesn't wait for the values from above.

Let's look at our issue again - it would be nice to have the possibility to define the work item slices before the ROP Fetch (because we have some individual data for each slice), and you are actually already doing that in your example. So the question is: should we find a completely different approach, or finalize the sim tracker outside of the ROP Fetch node?

Looking forward to your reply.
User Avatar
Member
209 posts
Joined: Nov. 2010
Offline
Hi,

Any news about this issue?
User Avatar
Staff
585 posts
Joined: May 2014
Offline
Hey, sorry for the delayed response.

Ostap
So how about making it intuitive: if the work item's state is cooked, then the tracker is alive, and some other work item has to stop the tracker (or the tracker stops when the work item is dirtied)?

This isn't currently possible with the way PDG works. There's an assumption that work spawned during the cook is cleaned up when the cook finishes, and there's also no way for dirtying a work item to kill a long-running process. We're discussing internally whether this might be something we can implement, but since it would require some significant changes I don't really have a timeline for when it would be available.

Another consideration is how that sort of workflow would be handled when loading saved task graph state from disk, which could contain a tracker work item in the cooked state. The tracker process wouldn't exist, though, which conflicts with the assumption that a cooked tracker means the process exists. There would also need to be a mechanism for certain types of work items to reject the state specified in a saved task graph file.

For your use case, the best solution is likely to use a Python Script node to spawn the tracker in the way you want, i.e. using the subprocess module to spawn it and let it run in the background. Then use another Python Script node to kill it at a point that makes sense in your graph.
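A minimal sketch of that two-node setup, assuming simtracker.py accepts a tracker port and web port as arguments, and that the attribute names (trackerpid, trackeraddress, trackerport) are stand-ins of my own choosing:

# Python Script TOP #1: spawn the tracker and leave it running.
import os
import subprocess

tracker_script = os.path.join(
    os.environ["HFS"], "houdini", "python2.7libs", "simtracker.py")

# Launch the tracker in the background; it keeps running after this
# work item finishes cooking. Ports are arbitrary example values.
proc = subprocess.Popen(["python", tracker_script, "8000", "8001"])

# Record the process ID and address for downstream work items.
work_item.setIntAttrib("trackerpid", proc.pid)
work_item.setStringAttrib("trackeraddress", "localhost")
work_item.setIntAttrib("trackerport", 8000)

# Python Script TOP #2: kill the tracker once the graph no longer needs it.
import os
import signal

# SIGTERM works on Linux/macOS; Windows needs a different mechanism,
# e.g. taskkill or TerminateProcess.
os.kill(work_item.intAttribValue("trackerpid"), signal.SIGTERM)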