Schedulers are one of the main types of node in a PDG graph. The purpose of a scheduler node is to execute ready work items that are submitted to it by PDG.
In addition, the scheduler must report status changes and ensure that jobs are able to communicate back to PDG when necessary. By default, jobs use the supporting Python module
pdgjson to perform this communication via XMLRPC, and the scheduler is responsible for ensuring that the XMLRPC server is running. Note that a custom scheduler could replace this mechanism if required.
Each scheduler node has several callbacks that can be implemented to control how it operates. When writing a scheduler node, the only callback you are required to implement is
onSchedule, since that hook is responsible for actually executing the ready work items. Work items that are marked to run in-process never reach the scheduler; they are handled by PDG’s internal scheduler instead. The other callbacks are optional and are only needed to further customize the behavior of the node.
The full scheduler node API is described in pdg.Scheduler.
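As a rough sketch of the callback surface, here is a minimal scheduler that only implements onSchedule. The plain Python class and the string constants are illustrative stand-ins for the real pdg.Scheduler base class and pdg.scheduleResult enum, which are only available inside Houdini; submit_fn is a hypothetical hook for whatever call submits a job spec to your farm scheduler.

```python
# Stand-ins for pdg.scheduleResult values; the real enum lives in
# the pdg module, which is only available inside Houdini.
SUCCEEDED = "Succeeded"

class MinimalScheduler:
    """Sketch of the smallest useful scheduler: only onSchedule is
    implemented, and every ready work item is submitted immediately."""

    def __init__(self, submit_fn):
        # submit_fn stands in for your farm scheduler's submission API.
        self.submit_fn = submit_fn

    def onSchedule(self, work_item):
        # Build a (hypothetical) job spec from the work item and submit it.
        spec = {"name": work_item["name"], "command": work_item["command"]}
        self.submit_fn(spec)
        # Returning Succeeded tells PDG the work item was accepted.
        return SUCCEEDED
```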
onSchedule(self, work_item) → pdg.scheduleResult
This callback is evaluated when the given pdg.WorkItem is ready to be executed. The scheduler should create the necessary job spec for its farm scheduler and submit it if possible. If it doesn’t have enough resources to execute the work item, it should return Deferred or FullDeferred, which tells PDG that the scheduler can’t accommodate the work item, and it should check back later.
Otherwise it should return Succeeded to indicate that the work item has been accepted.
The other return values are used when a work item is, for some reason, handled immediately. This is generally not recommended because it forces work items to execute in series.
For example, Local Scheduler will return FullDeferred if it determines that all available 'slots' on the local machine are in use. On the other hand it will return Deferred if there are slots available but not enough for this particular work item. If there are enough slots, it will deduct the slots required, spawn a subprocess for the work item, and then add the work item to a private queue of running items to be tracked.
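The Local Scheduler's slot accounting described above can be sketched in plain Python. The string constants are stand-ins for the pdg.scheduleResult values, and the per-item "slots" count is a simplified assumption:

```python
FULL_DEFERRED = "FullDeferred"
DEFERRED = "Deferred"
SUCCEEDED = "Succeeded"

class SlotScheduler:
    """Sketch of slot-based onSchedule logic, modeled on the behavior
    described for the Local Scheduler."""

    def __init__(self, total_slots):
        self.free_slots = total_slots
        self.running = []  # private queue of (work_item, slots) being tracked

    def onSchedule(self, work_item):
        needed = work_item.get("slots", 1)
        if self.free_slots == 0:
            # Every slot on the machine is in use.
            return FULL_DEFERRED
        if needed > self.free_slots:
            # Some slots are free, but not enough for this item.
            return DEFERRED
        # Deduct the required slots, spawn the job (omitted), and track it.
        self.free_slots -= needed
        self.running.append((work_item, needed))
        return SUCCEEDED
```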
Note that how frequently this callback is invoked is controlled by the pdg_tickperiod and pdg_maxitems node parameters, described under onTick below.
onTick(self) → pdg.tickResult
This callback is called periodically when the graph is cooking. The callback is generally used to check the state of running work items. This is also the only safe place to cancel an ongoing cook.
The period of this callback is controlled by the PDG node parameter pdg_tickperiod, and the maximum number of onSchedule calls for ready items between ticks is controlled by the node parameter pdg_maxitems. By default the tick period is 0.5s and the max items per tick is 30, which means onSchedule will be called at most 60 times per second. Adjusting these values can be useful to control the load on the farm scheduler.
The callback should return SchedulerReady if the scheduler is ready to accept new work items, and should return SchedulerBusy if it’s full at the moment. In case there is a serious problem with the scheduler (for example the connection to the farm is lost), it should return SchedulerCancelCook.
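The decision logic above can be sketched as follows. The string constants stand in for the pdg.tickResult values, and poll_fn is a hypothetical hook for querying job states from your farm scheduler's API:

```python
SCHEDULER_READY = "SchedulerReady"
SCHEDULER_BUSY = "SchedulerBusy"
SCHEDULER_CANCEL_COOK = "SchedulerCancelCook"

class TickingScheduler:
    """Sketch of an onTick implementation that checks running jobs."""

    def __init__(self, capacity, poll_fn):
        # poll_fn(job_id) -> "running" | "done" | "failed"
        self.capacity = capacity
        self.poll_fn = poll_fn
        self.running = {}          # job_id -> work item name
        self.farm_reachable = True

    def onTick(self):
        if not self.farm_reachable:
            # Serious problem (e.g. lost connection to the farm):
            # abort the ongoing cook.
            return SCHEDULER_CANCEL_COOK
        # Check the state of running work items and forget finished ones.
        for job_id in list(self.running):
            if self.poll_fn(job_id) != "running":
                del self.running[job_id]
        if len(self.running) >= self.capacity:
            return SCHEDULER_BUSY
        return SCHEDULER_READY
```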
onStartCook(self, static, cook_set) →
This callback is called when a PDG cook starts, after static generation.
static is True when a static cook is being performed instead of a full cook. See
onScheduleStatic for details.
cook_set is the set of PDG pdg.Node being cooked.
This callback can be used to initialize any resources or cache any values that apply to the overall cook. Returning
False or raising an exception will abort the cook.
You should tell PDG what the user’s working directory is by calling setWorkingDir on the scheduler.
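A minimal onStartCook might look like the sketch below. Here setWorkingDir is a stand-in that just records its arguments; on a real scheduler node you would call the pdg.Scheduler method with the local and remote working directory paths, and the directory values are assumptions for illustration:

```python
class StartCookScheduler:
    """Sketch of per-cook setup in onStartCook."""

    def __init__(self, local_wd, remote_wd):
        self.local_wd = local_wd
        self.remote_wd = remote_wd
        self.working_dir = None
        self.cook_nodes = None

    def setWorkingDir(self, local, remote):
        # Stand-in for the real scheduler method of the same name.
        self.working_dir = (local, remote)

    def onStartCook(self, static, cook_set):
        # Cache the set of nodes being cooked and set the working directory.
        self.cook_nodes = set(cook_set)
        self.setWorkingDir(self.local_wd, self.remote_wd)
        # Returning False (or raising an exception) would abort the cook.
        return True
```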
onStopCook(self, cancel) →
Called when cooking completes or is canceled. If cancel is
True, there will likely be jobs still running; in that case the scheduler should cancel them and block until they are actually canceled. This is also the time to tear down any resources that were set up in onStartCook.
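The cancel-and-tear-down behavior can be sketched as follows; cancel_fn is a hypothetical hook for your farm scheduler's job-cancellation API:

```python
class StopCookScheduler:
    """Sketch of onStopCook cleanup logic."""

    def __init__(self, cancel_fn):
        # cancel_fn(job_id) stands in for your farm API's cancel call.
        self.cancel_fn = cancel_fn
        self.running = []          # job ids of jobs still on the farm
        self.resources_open = True

    def onStopCook(self, cancel):
        if cancel:
            # Cook was canceled: stop outstanding jobs. A real scheduler
            # would also block here until they are actually gone.
            for job_id in self.running:
                self.cancel_fn(job_id)
            self.running = []
        # Tear down anything that was set up in onStartCook.
        self.resources_open = False
        return True
```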
onStart(self) →
Called by PDG when the scheduler is first created. Can be used to acquire resources that persist between cooks.
onStop(self) →
Called by PDG when the scheduler is cleaned up. Can be used to release resources. Note that this method may not be called in some cases when Houdini is shut down.
endSharedServer(self, sharedserver_name) →
Called when a shared server should be terminated. For example, the Houdini Command Chain will generate
endserver work items which evaluate this callback when the command chain has ended and the associated Houdini server should be closed. Typically the scheduler can use the
shutdownServer function in the
pdgjob.sharedserver module to issue the shutdown command via XMLRPC. See command servers for additional details on the use of command chains.
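The bookkeeping side of endSharedServer can be sketched as below. The registry mapping server names to endpoints is an assumption (a real scheduler would populate it when starting each server), and shutdown_fn stands in for pdgjob.sharedserver.shutdownServer, which issues the actual XMLRPC shutdown command:

```python
def shared_server_url(host, port):
    # Endpoint URL for a shared server's XMLRPC interface.
    return "http://%s:%d" % (host, port)

def end_shared_server(sharedserver_name, registry, shutdown_fn):
    """Sketch of endSharedServer: look up the named server's endpoint,
    issue the shutdown, and forget the server."""
    if sharedserver_name not in registry:
        return False
    host, port = registry.pop(sharedserver_name)
    shutdown_fn(shared_server_url(host, port))
    return True
```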
getStatusURI(self, work_item) →
Called to return the status URI for the specified work item. This appears in the MMB detail window of a work item. It can be formatted to point to a local file with file:/// or a web page with http://.
getLogURI(self, work_item) →
Returns the log URI for the specified work item. This appears in the MMB detail window of a work item, and is also available via the special
@pdg_log attribute. It can be formatted to point to a local file with file:/// or a web page with http://.
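A getLogURI implementation that points at a per-item local log file might look like this; the log directory layout is an assumption, and a real scheduler would derive it from its working directory settings:

```python
import os

def get_log_uri(work_item_name, log_dir="/tmp/pdg/logs"):
    """Sketch of getLogURI: build a file:/// URI for the work item's
    (hypothetical) log file on local disk."""
    path = os.path.join(log_dir, work_item_name + ".log")
    # file:/// URIs take the absolute path with the leading slash folded
    # into the scheme separator.
    return "file:///" + path.lstrip("/")
```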
workItemResultServerAddr(self) →
Returns the network endpoint for the work item result server, in the format <HOST>:<PORT>. This is equivalent to the
__PDG_RESULT_SERVER__ command token and the job environment variable $PDG_RESULT_SERVER. It will typically be an XMLRPC API server.
onScheduleStatic(self, dependency_map, dependent_map, ready_items) →
Called to do a static cook of the graph, which is a cook mode of StaticDepsFull or StaticDepsNode. Typically this function will build a complete job spec and submit it to the farm scheduler. How this is done depends on your farm scheduler API. For example, the dependencies between work items may have to be translated into parent/child relationships in the job spec so that the work is executed in the correct order.
This functionality is only needed if complete static cooks are required. In order to show status changes in the TOP graph, the implementation will have to provide a callback server so that jobs can report results and status changes. It will also have to ensure that all work items are serialized so that their JSON representation is available to the job scripts when executed. In addition, not all TOP nodes support this mode of cooking by default, and some may require customization to work with your farm scheduler. For example, ROP Fetch and other ROP-based nodes will poll the callback server if the ROP Fetch
cookwhen parameter is not set to
All Frames are Ready when batched.
dependency_map is a map of pdg.WorkItem to a set of its dependency work items.
dependent_map is a map of pdg.WorkItem to a set of its dependent work items.
ready_items is a list of pdg.WorkItem that are ready to be executed.
Note that this information can be obtained via pdg.Graph.dependencyGraph:

    import pdg

    n = hou.node("/obj/topnet1/out")
    # Call executeGraph to ensure PDG context is created
    n.executeGraph(True, True, False, True)
    # Perform generation phase of PDG cook
    n.getPDGGraphContext().cook(True, pdg.cookType.StaticDepsFull)
    # Retrieve the generated task graph work items and topology
    (dependencies, dependents, ready) = n.getPDGGraphContext().graph.dependencyGraph(True)
The implementation should save the required data and return immediately from this function. Then it should asynchronously manage the execution of the graph and report back all state changes via the scheduler node functions onWorkItemSucceeded, onWorkItemFailed or onWorkItemCanceled. In addition, it should ensure that all data changes done by jobs are reported back to PDG, for example by calling onWorkItemFileResult.
Once all work items have been reported back to PDG as finished the static cook will end.
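The overall flow of executing the static graph and reporting back can be sketched in plain Python. Here run_fn stands in for actually executing a work item on the farm, and report_fn stands in for the scheduler's onWorkItemSucceeded / onWorkItemFailed reporting calls; a real implementation would do this asynchronously rather than in a loop:

```python
def run_static_cook(dependency_map, dependent_map, ready_items,
                    run_fn, report_fn):
    """Sketch of executing a static dependency graph in order.
    run_fn(item) returns True on success; report_fn(item, status)
    reports each result back to PDG."""
    # Track how many unfinished dependencies each item still has.
    remaining_deps = {item: set(deps) for item, deps in dependency_map.items()}
    queue = list(ready_items)
    while queue:
        item = queue.pop(0)
        ok = run_fn(item)
        report_fn(item, "succeeded" if ok else "failed")
        if not ok:
            continue  # dependents of a failed item never become ready
        # An item's dependents become ready once all their deps finished.
        for dep in dependent_map.get(item, ()):
            remaining_deps[dep].discard(item)
            if not remaining_deps[dep]:
                queue.append(dep)
```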