Houdini 18.0 Executing Tasks

Troubleshooting PDG scheduler issues on the farm

Useful information to help you troubleshoot scheduling PDG work items on the farm.

This page contains general troubleshooting recommendations for all TOP schedulers. For more in-depth information about schedulers, please see the TOP scheduler documentation.

General debugging tips

Logs

  • Check the farm logs or job logs for warnings or errors.

  • Attach any warning or error messages you find to your bug reports.

Deadline scheduler

  • In the Deadline Scheduler node, turn on the Deadline ▸ Verbose Logging parameter to enable log output from the scheduler. The log output may contain useful warning or error messages. Please note that we will be adding this parameter to the other TOP schedulers soon.

  • Add the log output to your support tickets or your SideFX forum posts to help SideFX track down the problem.

Farm machines

  • Make sure that all your farm machines and your submitting machine have access to your network file system.

  • Ideally, you should have at least two farm machines or nodes available for cooking PDG work items. A single farm machine may not be able to run both the work item tasks and the MQ job, especially for the TOP Deadline Scheduler.

Paths

  • Do not use single backslashes (\) in paths, because Houdini evaluates them as escape sequences. Instead, use double backslashes (\\) to accommodate Houdini’s evaluation, or simply use forward slashes (/).

  • Houdini does not support unescaped spaces in paths. Surround your paths with quotation marks (") or use backslashes (\) to escape the space characters.
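As an illustration of the forward-slash recommendation above, the following minimal Python sketch converts a Windows-style path before it is handed to Houdini. The helper name to_forward_slashes is hypothetical and is not part of Houdini or PDG.

```python
def to_forward_slashes(path):
    """Return path with every backslash replaced by a forward slash."""
    return path.replace("\\", "/")


# A raw string keeps the backslashes literal in the Python source itself.
print(to_forward_slashes(r"C:\projects\shot01\geo"))
```

Forward slashes avoid the escape-sequence problem entirely, so no doubling of backslashes is needed afterwards.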

Work items fail to report results due to refused connections or timeouts

PDG work items executing on farm machines have to report their results back to the Houdini process that initiated their cook. This Houdini process is typically run on a user’s workstation, also known as the submitting machine, which is not a farm machine and in some cases may even have a different network environment than the farm machines.

The results are reported back via a network socket-based Remote Procedure Call (RPC). To receive these results, a server is automatically started on the submitting machine to listen for these RPCs and to respond back if needed.

That is why the executing work items need to know the IP address (or host name) and port number of the submitting machine, and there needs to be a resolvable network route from each farm machine to the submitting machine.
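One way to check that such a network route exists is to attempt a plain TCP connection from a farm machine to the submitting machine. The sketch below uses only the Python standard library. The function name can_reach is hypothetical, and the host name and port in the usage note are placeholders. A successful connection only shows that the port is reachable, not that PDG's RPC server is the process answering on it.

```python
import socket


def can_reach(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and host name resolution failures.
        return False
```

Run on a farm machine, a call such as can_reach("submit-workstation", 50000) (both values are placeholders for your own submitting machine and port) returning False suggests a firewall or name resolution problem of the kind described in the following sections.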

Firewalls

Problems

  • Firewalls and host name resolution can cause issues with the PDG work item reporting mechanism.

  • Firewalls can get in the way of RPCs if they are enabled on any of your farm machines, your submitting machines, or between networks.

    To work around this, PDG utilizes the Message Queue (MQ) server. The MQ server can run as a task or job on your farm machines behind your firewalls. It can also use a limited number of ports (at least 2) if they are allowed through your firewalls to the submitting machine.

Solutions

  • Contact your IT Administrator to allow a few ports through your firewalls.

  • Specify these ports in the Task Callback Port and Relay Port parameter fields on your TOP scheduler nodes.

    For more information on these nodes, see TOP nodes.

DNS

Problem

Host name resolution through the Domain Name System (DNS) can cause issues when reporting results via RPCs. Currently, the reporting mechanism uses the host name by default, which needs to be resolved to an actual IP address via a hosts file or DNS.

Solutions

  • For the hosts file, you can edit:

    Windows

    C:\Windows\System32\Drivers\etc\hosts

    Linux

    /etc/hosts

    Mac

    /etc/hosts

  • If neither is available (for example, on an AWS farm without DNS), the RPC mechanism can attempt to resolve the IP address of the MQ server.

    • You can enable this by specifying the PDGMQ_USE_IP=1 environment variable in the work item job process or the .hip file.

    • For the Deadline Scheduler node, you can enable this by turning on its Deadline ▸ Use IP Address for PDGMQ parameter.
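To find out whether a farm machine can resolve a given host name at all, a quick check with the Python standard library can help. This is a minimal sketch. The function name resolve_host is hypothetical, and the check is independent of how PDG itself performs resolution.

```python
import socket


def resolve_host(name):
    """Return the IPv4 address for name, or None if it cannot be resolved."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        # socket.gaierror (a subclass of OSError) is raised on lookup failure.
        return None


# An IPv4 address string is returned unchanged.
print(resolve_host("127.0.0.1"))
```

If the function returns None for the submitting machine's host name when run on a farm machine, a hosts file entry or the PDGMQ_USE_IP=1 option described above is a likely remedy.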

MQ

  • For Submit Graph as Job cooks, the MQ server runs locally on the submitting job on the farm. As such, this should allow it to avoid any networking issues.

  • Running MQ as its own job or task takes up a farm machine for some scheduler set-ups. In addition, each scheduler node might run its own MQ server.

    Tip

    We are working on a new MQ server that will allow you to share MQ servers across multiple scheduler nodes, run your MQ servers as services, and manage them all manually. This new MQ will also use IP addresses, allowing you to avoid the need for DNS.

Work items fail due to required files not found

PDG on farms requires a network file system that is accessible by all machines involved in the process; this includes the submitting machine as well as all of the farm machines. All the files required by this process are copied to the PDG working directory specified by the scheduler located on the network file system. For more information, please see paths.

Problems

Issues that can interfere with this process are:

  • Different file paths for submitting machine vs. farm machines.

  • Non-homogeneous farm machine set-ups (for example, when you have Windows, macOS, and Linux machines in the same farm).

Solution

Each of the TOP scheduler nodes provides parameters to specify the remote file paths separately from the local file paths for the working directory.

  • Specify the local file path for the submitting machine.

  • Specify the remote file path that the farm machines can resolve.

HQueue Scheduler

  • Turn on the Override Local Shared Root parameter on your TOP scheduler node and then specify the appropriate Local Shared Root Paths.

Deadline Scheduler

  • For the local file path, use the Working Directory ▸ Local Shared Path parameter field on your TOP scheduler node.

  • For the remote file path, use the Working Directory ▸ Remote Shared Path parameter field on your TOP scheduler node.

Tractor Scheduler

  • Use the Shared File Root Path parameter fields on your TOP scheduler node.

Python not found

Problem

PDG requires Python for executing work on the farm. As such, the TOP schedulers assume that the Python executable is accessible via the system path.

Solution

  1. Install Python.

    If Houdini is installed on your farm machines, you can use the Python that ships with Houdini.

    It is located in:

    Windows

    $HFS/python27/python.exe

    Linux

    $HFS/python/bin/python

    Mac

    $HFS/Frameworks/Python.framework/Versions/Current/bin/python

  2. Do one of the following:

    • For a global solution, add the directory that contains the Python executable to the system path environment variable.

    • For a solution specific to a single TOP scheduler and all its work items, specify the path to the Python executable in:

      • HQueue Scheduler

        The Executable Paths ▸ Python Executable parameter field on your TOP scheduler node.

      • Deadline Scheduler

        The Paths ▸ Python parameter field on your TOP scheduler node.

      • Tractor Scheduler

        The Scheduler ▸ Python Executable parameter field on your TOP scheduler node.
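To confirm whether an executable is reachable through the system path, the standard library function shutil.which can be used from any working Python interpreter, such as the one that ships with Houdini. This is a minimal sketch. The wrapper name find_on_path is hypothetical, and the example searches for the interpreter that runs the script rather than any PDG-specific binary.

```python
import os
import shutil
import sys


def find_on_path(name, search_path=None):
    """Return the full path of executable name, or None if it is not found.

    With search_path=None, the PATH environment variable is searched.
    """
    return shutil.which(name, path=search_path)


# Example: look for the running interpreter, first on the system path and
# then explicitly in its own directory.
exe_dir, exe_name = os.path.split(sys.executable)
print(find_on_path(exe_name))
print(find_on_path(exe_name, search_path=exe_dir))
```

If the first lookup prints None on a farm machine while the second finds the executable, the interpreter is installed but not on the system path, which is the situation the solutions above address.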
