Houdini 18.5 Executing tasks with PDG/TOPs

File paths

Best practices for input/output file paths in TOP networks.

Overview

TOPs is designed to work with compute farms that may have a variety of different filesystems. For example, a TOPs user could be on a Windows machine while using a Linux-based farm. The problem is how to map file paths from one filesystem to another. TOPs provides the PDG Path Map to address this.

Most TOP nodes that do work let you specify input and/or output file paths. Each scheduler node can specify its own working directory, because different render farm software may use different shared network filesystems. If you are using a farm scheduler, make sure that every file you output is reachable by the farm machines relative to this working directory, which jobs see as PDG_DIR.

How to

  1. Set the base working directory on the scheduler node. This directory is available to jobs as the PDG_DIR environment variable.

    • Use separate working directories for each HIP file. This is to avoid having two HIP files writing to the same PDG_DIR. Many of the default generated filenames used in parameter defaults are only unique within the HIP file.

    • For render farm schedulers, make sure that the directory is inside the network filesystem (like NFS mount or SMB share) and is shared with the render farm client machines.

  2. When you use PDG_DIR or PDG_TEMP in parameter filenames, use the form __PDG_DIR__ instead of ${PDG_DIR}. If you use ${PDG_DIR}, Houdini will try (and fail) to expand the variable itself before the dependency graph receives it. Houdini ignores the __PDG_DIR__ syntax, but the PDG scheduler knows to expand that token to the absolute path on the executing machine.

  3. Put intermediate files under __PDG_TEMP__ and final output files under __PDG_DIR__.

    • Categorize output files using subdirectories. For example, __PDG_TEMP__/geo for intermediate geometry files and __PDG_DIR__/geo for final geometry output.
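
The token convention in steps 2 and 3 can be illustrated with a small sketch. This is not PDG's own implementation: expand_pdg_tokens is a hypothetical helper, and the directory values are made-up stand-ins for what a job environment on a Linux farm machine might contain.

```python
import os

# Tokens that the PDG scheduler expands on the executing machine.
PDG_TOKENS = ("PDG_DIR", "PDG_TEMP")


def expand_pdg_tokens(path, env=None):
    """Replace __PDG_DIR__ / __PDG_TEMP__ with values taken from env.

    Hypothetical helper for illustration; not part of the pdg package.
    """
    env = os.environ if env is None else env
    for name in PDG_TOKENS:
        token = "__{}__".format(name)
        if token in path:
            # Raises KeyError if the environment lacks the variable.
            path = path.replace(token, env[name])
    return path


# Assumed example values standing in for a real job environment.
job_env = {
    "PDG_DIR": "/mnt/shared/myproject",
    "PDG_TEMP": "/mnt/shared/myproject/pdgtemp/12345",
}

# Final output goes under PDG_DIR, intermediate files under PDG_TEMP.
final_geo = expand_pdg_tokens("__PDG_DIR__/geo/out.bgeo.sc", job_env)
temp_geo = expand_pdg_tokens("__PDG_TEMP__/geo/tmp.bgeo.sc", job_env)
```

Because the tokens are only resolved on the executing machine, the same parameter value can map to different absolute paths on a workstation and on a farm client.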

Environment Variables

PDG_DIR

The TOP network’s working directory, as specified on the Scheduler node. In TOP parameters, this is an alias for __PDG_DIR__. This is set in the job environment to the local path to the cook working directory.

PDG_TEMP

A shared temporary file directory inside the working directory for the current session. The default is $PDG_DIR/pdgtemp/houdini_process_id. This is set in the job environment.

PDG_SCRIPTDIR

A shared script directory inside the temp directory. Script files are copied into this directory if they are listed as file dependencies. The default is $PDG_TEMP/scripts. This is set in the job environment.

Alternatively, you can put custom scripts in known locations in the shared network filesystem and execute them using that path.
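
A job script could read these three directories from its environment roughly as follows. This is a minimal sketch: job_directories is a hypothetical helper name, and the fallback for PDG_SCRIPTDIR simply applies the documented default of $PDG_TEMP/scripts.

```python
import os


def job_directories(env=None):
    """Collect the PDG directories visible to a job (hypothetical helper)."""
    env = os.environ if env is None else env
    pdg_dir = env["PDG_DIR"]    # working directory set on the scheduler
    pdg_temp = env["PDG_TEMP"]  # per-session temp directory
    # Fall back to the documented default location if the variable is unset.
    script_dir = env.get("PDG_SCRIPTDIR", pdg_temp + "/scripts")
    return {"dir": pdg_dir, "temp": pdg_temp, "scripts": script_dir}
```

Passing a plain dictionary instead of os.environ makes the helper easy to try outside of a running job.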

PDG_ITEM_NAME

The name of the work item being executed, which also corresponds to the name of the serialized work item file in the data directory. This is set in the job environment.

PDG_HYTHON

If set in Houdini, schedulers use this path instead of $HFS/bin/hython. Note that if the Local Scheduler runs a different version of Houdini this way, at least $HFS must be cleared in the work item environment to avoid library conflicts.
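
The selection rule above might be sketched like this. The function name hython_command is hypothetical, and the sketch makes the simplifying assumption that an override always points at a different Houdini build, so it always removes HFS from the child environment in that case.

```python
import os


def hython_command(env=None):
    """Return (hython_path, child_env) following the PDG_HYTHON rule.

    Hypothetical helper; assumes any override is a different Houdini build.
    """
    env = dict(os.environ if env is None else env)  # copy, never mutate input
    override = env.get("PDG_HYTHON")
    if override:
        # Clear HFS so the other build does not load mismatched libraries.
        env.pop("HFS", None)
        return override, env
    return env["HFS"] + "/bin/hython", env
```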

PDG_RESULT_SERVER

The hostname and port of the server that jobs send their status and results to. This is set in the job environment.
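
As an illustration, a job could split this value into its two parts. The sketch assumes the value has the form hostname:port, which this page does not spell out, and parse_result_server is a hypothetical helper name.

```python
def parse_result_server(value):
    """Split an assumed 'hostname:port' string into (host, port)."""
    # Split on the last colon so the port is always the final component.
    host, sep, port = value.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError("expected 'hostname:port', got {!r}".format(value))
    return host, int(port)
```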

PDG_PATHMAP

The path map in JSON form (if it exists).

PDG_PATHMAP_ZONE

The custom path map zone for the job which is used instead of the automatically determined zone.
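
A job can check whether a path map and a custom zone were supplied with something like the following sketch. load_path_map is a hypothetical helper, and the code deliberately makes no assumption about the structure of the decoded JSON.

```python
import json
import os


def load_path_map(env=None):
    """Return (path_map, zone); either is None when its variable is unset."""
    env = os.environ if env is None else env
    raw = env.get("PDG_PATHMAP")
    # The path map is only present if one exists for this job.
    path_map = json.loads(raw) if raw else None
    zone = env.get("PDG_PATHMAP_ZONE")
    return path_map, zone
```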
