Houdini 21.0 Executing tasks with PDG/TOPs

Scheduler Node Callbacks

Scheduler nodes execute work items

Overview

Schedulers are one of the main types of node in a PDG graph. A scheduler node's purpose is to execute the ready work items that PDG submits to it.

In addition, the scheduler must report status changes and ensure that jobs are able to communicate back to PDG when necessary. By default, jobs use the supporting Python module pdgjson to do this communication via XMLRPC, and the scheduler is responsible for ensuring that the XMLRPC server is running. A custom scheduler can replace this mechanism if required.
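As a sketch of the server lifecycle only, the following uses Python's standard xmlrpc.server module to start an XML-RPC server on a background thread and report its address in <HOST>:<PORT> form. It does not implement the pdgjson protocol: the `ping` method and the `start_result_server` helper are hypothetical placeholders. A real scheduler might start such a server in onStartCook or onSetupCook and stop it in onStopCook.

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer


def start_result_server(host="127.0.0.1", port=0):
    """Start an XML-RPC server on a daemon thread (hypothetical helper).

    Port 0 asks the operating system for any free port.
    Returns (server, "<HOST>:<PORT>").
    """
    server = SimpleXMLRPCServer((host, port), logRequests=False, allow_none=True)
    # Placeholder method only; this is NOT the pdgjson protocol.
    server.register_function(lambda: "ok", "ping")
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    bound_host, bound_port = server.server_address[:2]
    return server, f"{bound_host}:{bound_port}"
```

To stop the server, call `server.shutdown()` followed by `server.server_close()`. Binding to port 0 lets the operating system choose a free port, which avoids collisions when several cooks run on one machine.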

Example Implementations

All of the current PDG schedulers are implemented in Python using the interface described on this page. They are useful as references and can be found in the $HH/pdg/types/schedulers directory of a Houdini installation. A minimal example of a scheduler that creates a message queue (MQ) and uses it for job communication is also included on the PDG Examples page.

Scheduler Callbacks

Each scheduler node has several callbacks that can be implemented to control how it operates. When writing a scheduler node, the only callback you're required to implement is onSchedule, since that hook is responsible for actually executing the ready work items. Work items that are marked as in-process will not reach the scheduler and will instead be handled by PDG’s internal scheduler (see onAcceptWorkItem to override this). The other callbacks are optional and are only needed to further customize the behavior of the node.

Warning

The only callback that can safely write work item attributes is onSchedule. If you want to add attributes to a work item in the onTick callback, you must use pdg.WorkItem.lockAttributes to modify the work item safely.

Additionally, your scheduler node should only keep references to work items that are actively running. Once your scheduler notifies PDG that a work item has succeeded or failed, it should no longer hold a reference to that work item.

The full scheduler node API is described in pdg.Scheduler.

applicationBin(self, name, work_item) → str

This callback is used when a node creates a command that uses an application that can be parameterized by the scheduler. For example, the scheduler may provide a UI for choosing which 'python' application is used for Python-based jobs.

Custom scheduler bindings can use their own application 'names' to work with custom nodes.

At a minimum, 'hython' and 'python' should be supported.
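A minimal sketch of the lookup an applicationBin implementation might perform, assuming the scheduler keeps a table of application names to executables that the user can override (for example from scheduler parameters). The `resolve_application` helper and the paths in `DEFAULT_APPLICATIONS` are hypothetical placeholders, not Houdini's actual defaults.

```python
# Hypothetical defaults: placeholder paths, not Houdini's real install layout.
DEFAULT_APPLICATIONS = {
    "hython": "/opt/hfs/bin/hython",
    "python": "/usr/bin/python3",
}


def resolve_application(name, overrides=None):
    """Return the executable for an application name such as 'hython' or 'python'.

    User overrides take precedence over the defaults; unknown names raise
    ValueError so a misconfigured command fails early.
    """
    applications = dict(DEFAULT_APPLICATIONS)  # copy so defaults stay untouched
    if overrides:
        applications.update(overrides)
    try:
        return applications[name]
    except KeyError:
        raise ValueError(f"unsupported application name: {name!r}") from None
```

Inside a scheduler class, applicationBin(self, name, work_item) could return `resolve_application(name, overrides)`. Custom application names used by custom nodes are simply extra keys in the table.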

onSchedule(self, work_item) → pdg.scheduleResult

This callback is evaluated when the given pdg.WorkItem is ready to be executed. The scheduler should create the necessary job spec for its farm scheduler and submit it if possible. If it doesn’t have enough resources to execute the work item, it should return Deferred or FullDeferred, which tells PDG that the scheduler can’t accommodate the work item right now and that PDG should check back later.

Otherwise it should return Succeeded to indicate that the work item has been accepted.

The other return values are used when a work item is, for some reason, handled immediately (synchronously). This is not generally recommended because it forces work items to execute in series.

For example, the Local Scheduler returns FullDeferred if it determines that all available 'slots' on the local machine are in use. It returns Deferred if some slots are available but not enough for this particular work item. If there are enough slots, it deducts the slots required, spawns a subprocess for the work item, and adds the work item to a private queue of running items to be tracked.

How often this callback is called is controlled by the PDG node parameters pdg_maxitems and pdg_tickperiod (see onTick below).
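The slot accounting described for the Local Scheduler can be modeled with a small slot pool. This is a simplified sketch, not the Local Scheduler's actual code: `ScheduleResult` is a stand-in for pdg.scheduleResult (only the member names come from this page), `SlotPool` is hypothetical, and the step that spawns a subprocess is left as a comment.

```python
from enum import Enum


class ScheduleResult(Enum):
    # Stand-in for pdg.scheduleResult; a real onSchedule returns the pdg enum.
    Succeeded = "Succeeded"
    Deferred = "Deferred"
    FullDeferred = "FullDeferred"


class SlotPool:
    """Tracks how many local 'slots' are held by running work items."""

    def __init__(self, total_slots):
        self.total_slots = total_slots
        self.used_slots = 0
        self.running = {}  # work item id -> number of slots it holds

    def try_schedule(self, item_id, slots_needed):
        free = self.total_slots - self.used_slots
        if free <= 0:
            # Every slot is busy.
            return ScheduleResult.FullDeferred
        if slots_needed > free:
            # Some slots are free, just not enough for this item.
            return ScheduleResult.Deferred
        self.used_slots += slots_needed
        self.running[item_id] = slots_needed
        # A real scheduler would spawn the job here (e.g. a subprocess).
        return ScheduleResult.Succeeded

    def release(self, item_id):
        """Free an item's slots once its success or failure has been reported."""
        self.used_slots -= self.running.pop(item_id)
```

`release` also drops the pool's reference to the item, in line with the rule that a scheduler should not hold on to work items after reporting their outcome. An item that needs more slots than `total_slots` would be deferred forever in this sketch; a real implementation would need to fail or clamp such items.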

onTick(self) → pdg.tickResult

This callback is called periodically when the graph is cooking. The callback is generally used to check the state of running work items. This is also the only safe place to cancel an ongoing cook.

The period of this callback is controlled by the PDG node parameter pdg_tickperiod, and the maximum number of onSchedule calls for ready items between ticks is controlled by the node parameter pdg_maxitems. For example, by default the tick period is 0.5 s and the maximum number of items per tick is 30, so onSchedule is called at most 30 / 0.5 = 60 times per second. Adjusting these values can be useful to control the load on the farm scheduler.
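The arithmetic above generalizes to rate = pdg_maxitems / pdg_tickperiod. The two hypothetical helpers below compute that upper bound and, inversely, the tick period that would cap the rate at a target value for a given pdg_maxitems.

```python
def max_schedule_calls_per_second(max_items, tick_period):
    """Upper bound on onSchedule calls per second (pdg_maxitems / pdg_tickperiod)."""
    if tick_period <= 0:
        raise ValueError("tick period must be positive")
    return max_items / tick_period


def tick_period_for_rate(max_items, target_rate):
    """Tick period in seconds that caps onSchedule calls at target_rate per second."""
    if target_rate <= 0:
        raise ValueError("target rate must be positive")
    return max_items / target_rate
```

For instance, keeping the default of 30 items per tick but limiting submissions to 20 per second would require a tick period of 1.5 s.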

The callback should return SchedulerReady if the scheduler is ready to accept new work items, and SchedulerBusy if it is currently full. If there is a serious problem with the scheduler (for example, the connection to the farm is lost), it should return SchedulerCancelCook.
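A minimal sketch of the bookkeeping an onTick implementation might do, assuming the scheduler tracks running jobs in a dict and can poll the farm for a job's state. `TickResult` is a stand-in for pdg.tickResult; `poll_running_jobs`, `poll_job`, the state strings, and `max_running` are hypothetical. A real scheduler would report each finished item to PDG (for example via onWorkItemSucceeded or onWorkItemFailed) rather than returning lists.

```python
from enum import Enum


class TickResult(Enum):
    # Stand-in for pdg.tickResult; a real onTick returns the pdg enum.
    SchedulerReady = "SchedulerReady"
    SchedulerBusy = "SchedulerBusy"
    SchedulerCancelCook = "SchedulerCancelCook"


def poll_running_jobs(running, poll_job, farm_connected, max_running):
    """Poll tracked jobs and decide the tick result.

    running: dict of work item id -> farm job id (mutated in place).
    poll_job: callable returning 'running', 'succeeded' or 'failed'.
    Returns (tick_result, succeeded_item_ids, failed_item_ids).
    """
    if not farm_connected:
        # Lost the farm: ask PDG to cancel the cook.
        return TickResult.SchedulerCancelCook, [], []
    succeeded, failed = [], []
    for item_id, job_id in list(running.items()):
        state = poll_job(job_id)
        if state == "succeeded":
            succeeded.append(item_id)
        elif state == "failed":
            failed.append(item_id)
        else:
            continue
        # Stop tracking the item once its outcome is known.
        del running[item_id]
    if len(running) >= max_running:
        return TickResult.SchedulerBusy, succeeded, failed
    return TickResult.SchedulerReady, succeeded, failed
```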

onAcceptWorkItem(self, work_item) → pdg.acceptResult

By default custom schedulers will only accept out-of-process work items. In-process work items, like the ones in an Invoke TOP or a Python Script TOP, will be handled internally by PDG itself. The optional onAcceptWorkItem callback can be used to override that behavior.

The callback is called to determine if the scheduler is able to process a given work item. If it returns pdg.acceptResult.Accept then the work item will be queued with the scheduler, and passed to an onSchedule call at a later point in time. pdg.acceptResult.Reject indicates that the scheduler cannot process the specified work item, and pdg.acceptResult.Default indicates that the default behavior should be used instead.

Note

You should not try to actually cook or schedule the work item in this callback. It should only be used to determine if the work item is compatible with the custom scheduler.
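A minimal sketch of the decision logic for a hypothetical scheduler that claims in-process work items from a chosen set of node types, rejects items from node types it cannot run, and leaves everything else to PDG's default behavior. `AcceptResult` is a stand-in for pdg.acceptResult; `decide_accept` and the node type strings are placeholders, and a real implementation would read the equivalent information from the pdg.WorkItem. Consistent with the note above, it only inspects the item and never cooks it.

```python
from enum import Enum


class AcceptResult(Enum):
    # Stand-in for pdg.acceptResult; a real callback returns the pdg enum.
    Accept = "Accept"
    Reject = "Reject"
    Default = "Default"


def decide_accept(is_in_process, node_type, claimed_types, blocked_types=()):
    """Answer onAcceptWorkItem from simple compatibility checks only."""
    if node_type in blocked_types:
        # This scheduler cannot run items from this node type.
        return AcceptResult.Reject
    if is_in_process and node_type in claimed_types:
        # Take over in-process items that PDG would otherwise run itself.
        return AcceptResult.Accept
    # Fall back to PDG's default behavior.
    return AcceptResult.Default
```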

onConfigureCook(self, cook_options)

This callback is called before the graph begins to cook, after the list of schedulers for the cook is chosen. It can be used to change cook options before the cook begins.

cook_options is a writeable reference to the pdg.CookOptions used by the current cook.

Note

Not all cook options can be changed by this callback. For example, changing the pdg.CookOptions.nodeNames option has no effect, because PDG has already processed it and determined the list of nodes and schedulers to cook before calling this function.

onSetupCook(self)

This callback is called after onStartCook, but before any work items are scheduled.

Unlike onStartCook, which blocks the UI thread, this callback is run in the background. This makes it a better choice for setup tasks that can take a while to complete, like starting and connecting to the MQ server or copying files to remotely mounted filesystems.

onStartCook(self, static, cook_set) → bool

This callback is called when a PDG cook starts, after static generation.

static is True when a static cook is being performed instead of a full cook. See onScheduleStatic for details.

cook_set is the set of pdg.Node objects being cooked.

This can be used to initialize any resources or cache any values that apply to the overall cook. Returning False or raising an exception will abort the cook. You should tell PDG what the user’s working directory is by calling:

```python
self.setWorkingDir(local_path, remote_path)
```
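When the farm mounts the user's working directory at a different location, the scheduler has to derive remote_path from local_path. One simple, hypothetical approach is prefix substitution, sketched below. `map_to_remote` and the example roots are assumptions; real path mapping is often more involved (for example across Windows and Linux hosts).

```python
from pathlib import PurePosixPath


def map_to_remote(local_path, local_root, remote_root):
    """Re-root local_path from local_root onto remote_root.

    Raises ValueError if local_path is not inside local_root.
    """
    relative = PurePosixPath(local_path).relative_to(local_root)
    return str(PurePosixPath(remote_root) / relative)
```

In onStartCook this could feed the setWorkingDir call, e.g. `self.setWorkingDir(local_dir, map_to_remote(local_dir, local_root, remote_root))`, where the variable names are hypothetical.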

onStopCook(self, cancel) → bool

Called when cooking completes or is canceled. If cancel is True, there will likely be jobs still running; in that case the scheduler should cancel them and block until they are actually canceled. This is also the time to tear down any resources that were set up in onStartCook. The return value is ignored.
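For a scheduler that runs jobs as local subprocesses (as the Local Scheduler does), the cancel path in onStopCook can be sketched with Python's subprocess module: terminate every job that is still running, then block until each one has exited. The `processes` dict and the `cancel_and_wait` helper are hypothetical; a farm scheduler would instead issue cancel requests through its farm API and wait for confirmation.

```python
import subprocess


def cancel_and_wait(processes, timeout=10.0):
    """Terminate tracked subprocesses and block until every one has exited.

    processes: dict of work item id -> subprocess.Popen (cleared on return).
    """
    for proc in processes.values():
        if proc.poll() is None:
            proc.terminate()  # ask the job to stop first
    for proc in processes.values():
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()  # escalate if the job ignored terminate()
            proc.wait()
    # Drop references once the jobs are gone.
    processes.clear()
```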

onStart(self) → bool

Called by PDG when the scheduler is first created. Can be used to acquire resources that persist between cooks. The return value is ignored.

onStop(self) → bool

Called by PDG when the scheduler is cleaned up. Can be used to release resources. Note that this method may not be called in some cases when Houdini is shut down. The return value is ignored.

onCancelWorkItems(self, work_items, node)

Called when the scheduler should cancel a subset of the work items that have been scheduled during the current cook. If node is set to a value other than None, then all of the work items in the work_items list are from the same PDG node and the scheduler should cancel all tasks associated with that node. Otherwise, the scheduler should cancel the specific items listed in the work_items list.

For example, the HQueue scheduler cancels the top level node job associated with the node if one is passed in, otherwise it cancels individual work item jobs based on the contents of the work_items list.
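The node-versus-list logic can be sketched as follows, mirroring the HQueue behavior described above. The `node_jobs` and `item_jobs` lookup tables, the `cancel_job` callable, and the fallback to per-item cancellation when a node has no top-level job are all assumptions for illustration.

```python
def cancel_work_items(work_items, node, node_jobs, item_jobs, cancel_job):
    """Cancel farm jobs for the given work items (hypothetical helper).

    node_jobs: node name -> id of a top-level farm job covering that node.
    item_jobs: work item id -> id of that item's own farm job.
    cancel_job: callable that cancels one farm job by id.
    """
    if node is not None and node in node_jobs:
        # One top-level job covers every item from this node.
        cancel_job(node_jobs.pop(node))
        for item in work_items:
            item_jobs.pop(item, None)
        return
    # No node-level job: cancel the listed items individually.
    for item in work_items:
        job_id = item_jobs.pop(item, None)
        if job_id is not None:
            cancel_job(job_id)
```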

endSharedServer(self, sharedserver_name) → bool

This method is deprecated – shared servers have been replaced with Service Blocks.

Called when a shared server should be terminated. For example a Houdini Service Block that’s configured to use shared servers will generate endserver work items which will evaluate this callback when the block has ended and the associated server should be closed. Typically the scheduler can use the shutdownServer function in the pdgjob.sharedserver module to issue the shutdown command via XMLRPC.

getStatusURI(self, work_item) → str

Returns the status URI for the specified work item. This appears in the MMB detail window of a work item. It can be formatted to point to a local file with file:/// or to a web page with http://.

getLogURI(self, work_item) → str

Returns the log URI for the specified work item. This appears in the MMB detail window of a work item, and is also available with the special @pdg_log attribute. It can be formatted to point to a local file with file:/// or a web page with http://.
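A minimal sketch of building both URI styles. Using pathlib's as_uri for file URIs handles percent-encoding of spaces and other special characters, which plain string concatenation would miss. The helper names and the `/jobs/<id>` URL layout are hypothetical.

```python
from pathlib import PurePosixPath


def log_file_uri(log_path):
    """file:/// URI for an absolute POSIX log path (raises ValueError if relative)."""
    return PurePosixPath(log_path).as_uri()


def status_page_uri(base_url, job_id):
    """http:// URI for a job status page; the URL layout is a placeholder."""
    return f"{base_url.rstrip('/')}/jobs/{job_id}"
```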

workItemResultServerAddr(self) → str

Returns the network endpoint for the work item result server in the format <HOST>:<PORT>. This is equivalent to the __PDG_RESULT_SERVER__ command token and the job environment variable $PDG_RESULT_SERVER, and will typically be an XMLRPC API server.
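On the job side, the same endpoint is available in the $PDG_RESULT_SERVER environment variable. The hypothetical helper below splits it into host and port. It uses rpartition so that only the last colon separates the port, but it does not special-case IPv6 literals.

```python
import os


def result_server_from_env(environ=None):
    """Return (host, port) parsed from PDG_RESULT_SERVER, or None if unset."""
    if environ is None:
        environ = os.environ
    addr = environ.get("PDG_RESULT_SERVER")
    if not addr:
        return None
    host, sep, port = addr.rpartition(":")
    if not sep or not host:
        raise ValueError(f"expected <HOST>:<PORT>, got {addr!r}")
    return host, int(port)
```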

onScheduleStatic(self, dependency_map, dependent_map, ready_items) → None

Called to perform a static cook of the graph, that is, a cook with a mode of StaticDepsFull or StaticDepsNode. Typically this function builds a complete job spec and submits it to the farm scheduler. How this is done depends on your farm scheduler API; for example, the dependencies between work items may have to be translated into parent/child relationships in the job spec so that the work executes in the correct order.

Note

This functionality is only needed if complete static cooks are required. See /tops/custom_scheduler.html#staticcook.

dependency_map is a map of pdg.WorkItem to the set of its dependency work items.

dependent_map is a map of pdg.WorkItem to the set of its dependent work items.

ready_items is a list of pdg.WorkItem that are ready to be executed.

This information can also be obtained via pdg.Graph.dependencyGraph, for example:

```python
import hou
import pdg

n = hou.node("/obj/topnet1/out")
# Call cookWorkItems to ensure the PDG graph context is created
n.cookWorkItems(tops_only=True)
# Perform the generation phase of the PDG cook
n.getPDGGraphContext().cook(True, pdg.cookType.StaticDepsFull)
# Retrieve the generated task graph work items and topology
(dependencies, dependents, ready) = n.getPDGGraphContext().graph.dependencyGraph(True)
```
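Once the maps are available, one way to turn dependency_map into a submission order and parent/child relationships is a topological sort. The sketch below uses graphlib.TopologicalSorter from the Python standard library (3.9 and later); plain strings stand in for pdg.WorkItem objects, and `build_job_spec` is a hypothetical helper rather than part of the PDG API.

```python
from graphlib import TopologicalSorter


def build_job_spec(dependency_map):
    """Order work items so every item comes after its dependencies.

    dependency_map: work item -> set of work items it depends on.
    Returns (submission_order, parents), where parents maps each item to
    the items that must finish first (its parent jobs in the farm spec).
    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    graph = {item: set(deps) for item, deps in dependency_map.items()}
    order = list(TopologicalSorter(graph).static_order())
    parents = {item: set(dependency_map.get(item, ())) for item in order}
    return order, parents
```

Submitting jobs in `order` and declaring each job's `parents` as its farm-side dependencies keeps execution consistent with the PDG graph.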

Note

This mode of cooking is not exposed in the TOP UI and is not supported by the stock schedulers, although the Local Scheduler has a basic implementation for demonstration purposes. To trigger this mode of cooking, call pdg.GraphContext.cook with a mode of StaticDepsFull or StaticDepsNode.

The implementation should save the required data and return immediately from this function. Then it should asynchronously manage the execution of the graph and report back all state changes via the scheduler node functions onWorkItemSucceeded, onWorkItemFailed or onWorkItemCanceled. In addition, it should ensure that all attribute changes and added files during the job are reported back to PDG, for example by calling onWorkItemAddOutput.

Once all work items have been reported back to PDG as finished the static cook will end.
