Abort workitem cooking and pass to next one?

Member
900 posts
Joined: Feb. 2016
  1. Is there any way to abort a workitem if the cooking time is greater than some threshold value? And would it be possible to flag the failed workitem in some way?
    This would be extremely useful for debugging and stress testing your digital assets, and for finding those parameter values that escalate the cooking time.

  2. Sometimes a specific combination of parameter values simply crashes Houdini.
    What would happen if that occurred while a workitem was cooking?
    Would the whole PDG cook stop and crash, or would the topnet be able to start cooking the next queued workitem?
Member
603 posts
Joined: Sept. 2016
Andr
  1. Is there any way to abort a workitem if the cooking time is greater than some threshold value? And would it be possible to flag the failed workitem in some way?
    This would be extremely useful for debugging and stress testing your digital assets, and for finding those parameter values that escalate the cooking time.

  1. No, there's no way to specify that in TOPs. However, most farm schedulers have a job specification option for ‘max runtime’ (and sometimes ‘min runtime’). We could expose that for HQueue and Tractor, for example. For the local scheduler this would be an RFE that would require a bit more work.
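Since TOPs doesn't expose a max-runtime option, a rough user-side approximation is to run the per-item command under a wall-clock timeout and treat expiry as a failure. This is a sketch, not a PDG feature: the command, the threshold, and the status strings below are all illustrative assumptions.

```python
# Sketch: enforce a max runtime for one work item's command by running
# it under a wall-clock timeout. Not part of PDG; purely illustrative.
import subprocess
import sys

def run_with_timeout(cmd, max_runtime_seconds):
    """Run cmd; return 'failed' if it errors or exceeds the time limit."""
    try:
        result = subprocess.run(cmd, timeout=max_runtime_seconds)
        return "failed" if result.returncode != 0 else "succeeded"
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child on timeout; flag the item failed.
        return "failed"

# Stand-in "work item" that runs longer than the allowed threshold.
status = run_with_timeout(
    [sys.executable, "-c", "import time; time.sleep(5)"],
    max_runtime_seconds=0.5,
)
```

A real scheduler hook would also record which items timed out, which is the flagging the original question asks about.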

    Andr
    Sometimes a specific combination of parameter values simply crashes Houdini.
    What would happen if that occurred while a workitem was cooking?
    Would the whole PDG cook stop and crash, or would the topnet be able to start cooking the next queued workitem?

    If the workitem task crashes, that's a perfectly ‘normal’ way for a workitem to fail. It will be marked as failed, and so all downstream dependencies will also fail or not be generated. The topnet will keep cooking until there's no more work to be done.
Member
900 posts
Joined: Feb. 2016
chrisgreb
No, there's no way to specify that in TOPs. However, most farm schedulers have a job specification option for ‘max runtime’ (and sometimes ‘min runtime’). We could expose that for HQueue and Tractor, for example. For the local scheduler this would be an RFE that would require a bit more work.
This feature would make a lot of sense, especially for the local scheduler, which is used by people with limited resources (i.e. a low number of cores at their disposal). A single core stuck forever cooking one workitem would have a much bigger negative impact on the final result in a single-computer PDG session than on a farm.
I hope you will consider implementing it; it seems to be in the nature of PDG to keep the user in control of the performance flow.


If the workitem task crashes, that's a perfectly ‘normal’ way for a workitem to fail. It will be marked as failed, and so all downstream dependencies will also fail or not be generated. The topnet will keep cooking until there's no more work to be done.

That's good info, and it made me think of a very dirty workaround for point 1.
What if we make the workitem actually fail after some time has passed?
We could put a Python node upstream with a timer that, after 30 seconds, would check a boolean attribute value on the final downstream node.
The bool is set to 0 by default, and the last node would switch it to 1.
If the Python timer still sees it at 0 after 30 seconds, it would crash Houdini.
I just checked that it's actually possible to delay the reading of an attribute on a downstream node.
Now the question: what's the simplest and safest way to crash Houdini with Python?
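The timer idea above can be sketched as a background watchdog. Everything here is a hypothetical illustration rather than Houdini API: `attribute_is_set` stands in for reading the downstream boolean attribute, and the kill action is injectable so the logic can be exercised with something milder than `os._exit(1)`.

```python
# Sketch of the proposed watchdog: after `timeout` seconds, check a
# boolean flag; if it is still unset, invoke the kill action (in
# Houdini this could be os._exit(1)). All names are hypothetical.
import os
import threading

def start_watchdog(timeout, attribute_is_set, on_timeout=lambda: os._exit(1)):
    """Fire on_timeout() after `timeout` seconds unless the flag is set."""
    def check():
        if not attribute_is_set():
            on_timeout()
    timer = threading.Timer(timeout, check)
    timer.daemon = True  # don't keep the process alive just for the timer
    timer.start()
    return timer  # caller can cancel() it when the work item finishes
```

In a real TOP network the flag check would read the downstream node's attribute; here it is just a callable so the timeout behavior is visible on its own.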
Edited by Andr - June 18, 2019 10:00:01
Member
603 posts
Joined: Sept. 2016
Maybe not the most interesting way to crash, but you can always call
os._exit(1)