How to get all dependencies recursively for work items / find the topmost work items contributing to a downstream work item.

Member
151 posts
Joined: Feb. 2009
I'm wondering if there is any PDG Python function to retrieve all upstream dependencies that contributed to a work item.

workitem.dependencies only resolves one dependency back, but my goal is to find all data that contributed to the current work item. Because a work item can depend on multiple work items, the recursive function I'm working on to do this is getting a bit verbose. I'm currently working on this for archiving, disk cleanup, and remote synchronisation.

It would also be helpful to find all terminators upstream that contributed to the current work item (the topmost work item / TOP node), because this is likely to be the node that could be automatically dirtied when incrementing a version on the next iteration.
https://openfirehawk.com/ [openfirehawk.com]
Support Open Firehawk - An open source cloud rendering project for Houdini on Patreon.
This project's goal is to provide an open source framework for cloud computing for heavy FX based workflows and allows end users to pay the lowest possible price for cloud resources.
Member
151 posts
Joined: Feb. 2009
If nothing is provided, this seems to work in an initial sandbox test, but I get the feeling it might not be necessary...

It recursively builds a list of all work items for a given node and all of their upstream dependencies. It then checks all work items for ones that have no dependencies (terminators). I have no idea how it would play with dynamic work items, though.

    # `node` is assumed to be the TOP node (a hou.Node) being inspected.
    pdg_node = node.getPDGNode()

    # Start from every work item on the node, then keep following
    # dependencies until nothing new turns up upstream.
    added_workitems = []
    pending = list(pdg_node.workItems)
    while pending:
        workitem = pending.pop()
        if workitem in added_workitems:
            # Already visited via another dependency path.
            continue
        added_workitems.append(workitem)
        pending.extend(workitem.dependencies)

    print("workitems: %s" % added_workitems)

    # A work item with no dependencies is a terminator; record its node once.
    terminators = []
    for workitem in added_workitems:
        if len(workitem.dependencies) == 0:
            if workitem.node not in terminators:
                terminators.append(workitem.node)

    for terminator in terminators:
        print("terminator: %s" % terminator.name)
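
The same traversal can be tried outside Houdini. This is a minimal sketch, assuming stand-in objects: `FakeItem` is hypothetical and only mimics the `dependencies` and `node` attributes used above, it is not part of the pdg module. The diamond-shaped graph checks that an item reached through two paths is only collected once.

```python
class FakeItem(object):
    """Hypothetical stand-in for a work item (not part of pdg)."""

    def __init__(self, name, node, dependencies=()):
        self.name = name
        self.node = node  # here just the owning node's name
        self.dependencies = list(dependencies)


def collect_upstream(start_items):
    """Return start_items plus everything they depend on, without duplicates."""
    found = []
    pending = list(start_items)
    while pending:
        item = pending.pop()
        if item in found:
            continue
        found.append(item)
        pending.extend(item.dependencies)
    return found


def find_terminator_nodes(items):
    """Return the nodes owning items that have no dependencies."""
    nodes = []
    for item in items:
        if not item.dependencies and item.node not in nodes:
            nodes.append(item.node)
    return nodes


# Diamond-shaped graph: both sims depend on the same source item.
source = FakeItem("source_0", "filepattern")
sim_a = FakeItem("sim_a", "sim", [source])
sim_b = FakeItem("sim_b", "sim", [source])
render = FakeItem("render_0", "render", [sim_a, sim_b])

upstream = collect_upstream([render])
print(sorted(item.name for item in upstream))
# ['render_0', 'sim_a', 'sim_b', 'source_0']
print(find_terminator_nodes(upstream))
# ['filepattern']
```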
Edited by Andrew Graham - July 7, 2019 09:34:33
Member
159 posts
Joined: Feb. 2018
Yeah, I want to know that, too. Currently, the cache mode doesn't work as I expected: the “Automatic” mode just checks whether the file exists instead of checking dependencies.
If the expected result file exists on disk, the work item is marked as cooked without being scheduled. If the file does not exist, the item is scheduled as normal.
So when I make changes on an upstream node and want to recook the downstream nodes, I have to set all the downstream nodes' cache mode to “Write”, or delete all the caches affected by the upstream node, and then reset all the nodes to “Automatic” mode for convenience. What I want is a more procedural way to automatically dirty the work items' caches based on dependencies, which is what the “Automatic” cache mode should really be.
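
The difference between the two checks can be sketched in plain Python. This is only an illustration, not how PDG implements its cache modes: `exists_check` and `dependency_check` are hypothetical names, and modification time is assumed to be an acceptable proxy for "the upstream data changed".

```python
import os
import tempfile


def exists_check(output_path):
    """File-existence test: treat the item as cooked if its output exists."""
    return os.path.exists(output_path)


def dependency_check(output_path, input_paths):
    """Dependency-aware test: the output must exist and must not be older
    than any input that is still on disk."""
    if not os.path.exists(output_path):
        return False
    output_time = os.path.getmtime(output_path)
    for path in input_paths:
        if os.path.exists(path) and os.path.getmtime(path) > output_time:
            return False
    return True


# Demo: an upstream file is rewritten after the downstream cache was made.
work_dir = tempfile.mkdtemp()
upstream = os.path.join(work_dir, "upstream.txt")
downstream = os.path.join(work_dir, "downstream.txt")
for path in (upstream, downstream):
    with open(path, "w") as handle:
        handle.write("data")

os.utime(downstream, (1000, 1000))  # downstream cache written first
os.utime(upstream, (2000, 2000))    # upstream changed afterwards

print(exists_check(downstream))                  # True
print(dependency_check(downstream, [upstream]))  # False
```

With the existence test the stale downstream cache is accepted; with the dependency test it would be recooked.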
Edited by EricSheng - July 7, 2019 12:43:13
Member
151 posts
Joined: Feb. 2009
Yes, I filed a bug report that partly covers this, but nothing has changed yet.

However, the workaround for what you are looking for is to right-click and delete work items on the topmost fetch node only. Upon the next cook, all work items downstream should be replaced.

There are three or four operations I've encountered that a user may want to perform, and the UI doesn't resolve them all, or they are not named well.

1. We may want to delete the work items and cook the node. That is achievable by choosing to delete the work items and then cooking the node, so that's fine.

2. We may want to mark the items as dirty (not necessarily deleting them from disk, but marking them for replacement upon the next cook), so that cooking the node next replaces those items as they are produced. This behaviour is not achievable with the fetch TOP node, for example: the items must be deleted before they will be cooked and replaced. “Dirty” currently will not replace the items if they exist on disk.

3. We may want to perform a “refresh and cook” for work items only if they are marked dirty or do not exist. We don't necessarily want to mark anything as dirty here; we just want to do a reality check and replace if needed. Currently we approximate this behaviour by marking a node dirty and cooking it, but this is misnamed, since the goal is really only to perform a refresh if items need replacing, not to force replacement.

4. In some cases we may want the dirty state of the current work item to be affected only by the dirty state of downstream items.
Temporary SSD caching is a great example, or file transfer via the cloud.
eg:
A: write cache out to SSD
B: copy cache to NAS storage (different path)
C: optionally delete workitems in A.

We wouldn't want A to be dirty now, since it would incorrectly execute each time we reopened the hip file. A's expected output path should be the SSD location, but once B completes, A should only be marked dirty / able to execute if B's work items are dirty / don't exist. This is kind of simple; it would be better to track history with an MD5 in some cases.
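
Case 4 can be written down as a small rule. A minimal sketch, assuming each stage has a single output path; `stage_a_needs_cook` is a hypothetical helper, not a PDG function, and the file names are made up.

```python
import os
import tempfile


def stage_a_needs_cook(nas_path, b_dirty=False):
    """Stage A (the SSD write) only needs to run when stage B's copy on the
    NAS is missing or B is dirty. Whether A's own SSD file still exists is
    intentionally ignored."""
    return b_dirty or not os.path.exists(nas_path)


work_dir = tempfile.mkdtemp()
nas_copy = os.path.join(work_dir, "nas_cache.bgeo")

print(stage_a_needs_cook(nas_copy))        # True: nothing on the NAS yet
with open(nas_copy, "w") as handle:
    handle.write("cache")
print(stage_a_needs_cook(nas_copy))        # False: B is done, SSD copy not needed
print(stage_a_needs_cook(nas_copy, True))  # True: B was marked dirty
```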

This also applies to cloud sync and prerender scripts.

I'm sure there are other cases where the way dirty state is propagated needs to be customised. In the example above, just because data is written out to a temp location on disk, its non-existence at that location doesn't mean the work item needs to cook.

To go further, we might push data to cloud storage and not have anything on disk to query in future.
A: write data to local disk, keep an md5 as evidence.
B: transfer data to cloud. keep an md5 as evidence upon success.
C: delete original data from A.

We may only want to regenerate A and B in this example if the MD5s don't match or are missing for A and B, or if A is deliberately marked as dirty (rather than recooking each time you open Houdini, which is what would happen now with expected output paths).
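
A minimal sketch of the MD5 idea, assuming the checksums are kept in a small JSON file; `file_md5`, `record_evidence`, and `needs_regenerate` are hypothetical helpers and the manifest layout is made up for illustration.

```python
import hashlib
import json
import os
import tempfile


def file_md5(path):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_evidence(manifest_path, stage, md5):
    """Store the checksum for a stage ("A" or "B") in the manifest."""
    manifest = {}
    if os.path.exists(manifest_path):
        with open(manifest_path) as handle:
            manifest = json.load(handle)
    manifest[stage] = md5
    with open(manifest_path, "w") as handle:
        json.dump(manifest, handle)


def needs_regenerate(manifest_path, forced_dirty=False):
    """Regenerate only if evidence is missing, A and B disagree, or A was
    deliberately marked dirty. The data files themselves are not consulted."""
    if forced_dirty or not os.path.exists(manifest_path):
        return True
    with open(manifest_path) as handle:
        manifest = json.load(handle)
    return manifest.get("A") is None or manifest.get("A") != manifest.get("B")


work_dir = tempfile.mkdtemp()
data_path = os.path.join(work_dir, "cache.bin")
manifest_path = os.path.join(work_dir, "evidence.json")
with open(data_path, "wb") as handle:
    handle.write(b"simulation data")

checksum = file_md5(data_path)
record_evidence(manifest_path, "A", checksum)  # A: written locally
print(needs_regenerate(manifest_path))         # True: B has no evidence yet
record_evidence(manifest_path, "B", checksum)  # B: transfer succeeded
os.remove(data_path)                           # C: delete the original
print(needs_regenerate(manifest_path))         # False
```

Because the decision reads only the manifest, deleting the local data in step C does not trigger a recook.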
Member
151 posts
Joined: Feb. 2009
I did find an exception where the method I listed above for finding all upstream work items didn't work. If you plug render output nodes into a wait for all, and then into an ffmpeg node, the ffmpeg node won't resolve the upstream work items beyond the wait for all. This is in H17.5.326.

I think that's because a partition node is not technically a work item, but it is a dependency, so that messes with the logic.
Edited by Andrew Graham - August 2, 2019 07:29:52
Member
151 posts
Joined: Feb. 2009
I'm currently using this for archival purposes and disk cleanup, which is another bonus of TOPs being output-path aware!

The approach below seems to work better: traverse up all inputs, then evaluate all work items once you have all nodes in the tree.

    def get_upstream_workitems(self):
        # This will generate the work items for the selected node.
        self.pdg_node = self.node.getPDGNode()
        self.node.executeGraph(False, False, False, True)

        # Collect this node and every node reachable through its inputs.
        added_nodes = []
        pending = [self.pdg_node]
        while pending:
            node = pending.pop()
            if node in added_nodes:
                # Already visited via another connection.
                continue
            added_nodes.append(node)
            # Each input port lists the upstream ports connected to it.
            for input_port in node.inputs:
                for connection in input_port.connections:
                    pending.append(connection.node)

        print("added_nodes: %s" % added_nodes)

        # With the full node list in hand, gather every node's work items.
        added_workitems = []
        for node in added_nodes:
            added_workitems.extend(node.workItems)

        return added_workitems
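
For the archive and cleanup use mentioned above, the collected work items eventually have to be turned into file paths and sizes. A minimal sketch, assuming the output paths have already been pulled from the work items into a plain list (how to read them from a work item depends on the Houdini version and is not shown); `summarize_outputs` is a hypothetical helper and the file names are made up.

```python
import os
import tempfile


def summarize_outputs(paths):
    """Split paths into existing and missing, and total the bytes on disk.
    Duplicate paths are only counted once."""
    existing, missing, total_bytes = [], [], 0
    for path in sorted(set(paths)):
        if os.path.isfile(path):
            existing.append(path)
            total_bytes += os.path.getsize(path)
        else:
            missing.append(path)
    return existing, missing, total_bytes


# Demo with two small files, one missing path, and one duplicate entry.
work_dir = tempfile.mkdtemp()
first = os.path.join(work_dir, "frame.0001.exr")
second = os.path.join(work_dir, "frame.0002.exr")
gone = os.path.join(work_dir, "frame.0003.exr")
with open(first, "wb") as handle:
    handle.write(b"0123456789")
with open(second, "wb") as handle:
    handle.write(b"01234")

existing, missing, total_bytes = summarize_outputs([first, second, gone, first])
print("%d existing, %d missing, %d bytes" % (len(existing), len(missing), total_bytes))
# 2 existing, 1 missing, 15 bytes
```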
Edited by Andrew Graham - August 2, 2019 08:21:55