Overview
Output files and file attributes on PDG work items are assigned tags so that PDG can identify their file types. A tag determines what application is used to open a file from the attribute pane, as well as which cache handler(s) to run when checking for an existing file on disk. You can register custom tags with the PDG pdg.TypeRegistry and use them with built-in nodes. You can also register custom cache handling functions written in Python in order to manually control PDG’s caching mechanism.
Custom tags added through the Type Registry automatically appear in the tag chooser drop-down on any nodes that have a parameter for setting the file tag.
Custom File Tags
Custom Cache Handlers
Many nodes in PDG support disk caching. If the expected output files for a work item already exist on disk, then that work item is able to cook from cache instead of re-running. You can enable caching per node, and you can configure it to either always read, always write, or read from cache only if the files exist. If you set the HOUDINI_PDG_CACHE_DEBUG environment variable before launching Houdini, PDG will print cache file debug information when a graph is cooking.
Internally, PDG verifies that cache files exist by checking if they're found on disk. This may not be suitable for all applications or all types of files. For example, this would not be suitable if your output files are stored on a cloud storage system or if you want to do extra file validation as part of the cache check.
As an alternative, you can verify the existence of cache files by registering a custom cache handler in the same way that custom file tags are registered (as described in the previous section). For example, you can create a script that defines your custom cache handlers and save it as $HOME/houdini18.5/pdg/types/custom_handlers.py:
    import os

    import pdg


    def simple_handler(local_path, raw_file, work_item):
        # Log the path, then defer to the next matching handler
        print(local_path)
        return pdg.cacheResult.Skip


    def custom_handler(local_path, raw_file, work_item):
        # Skip work items that don't have the right attribute
        if work_item['usecustomcaching'].value() == 0:
            return pdg.cacheResult.Skip

        # Treat empty or unreadable files as cache misses
        try:
            if os.stat(local_path).st_size == 0:
                return pdg.cacheResult.Miss
            return pdg.cacheResult.Hit
        except Exception:
            return pdg.cacheResult.Miss


    def registerTypes(type_registry):
        type_registry.registerCacheHandler("file/geo", custom_handler)
        type_registry.registerCacheHandler("file/geo/usd", simple_handler)
Each cache handler is passed three arguments: the local path to the cache file, the raw pdg.File object, and the pdg.WorkItem that owns the file. The file object contains all of the metadata associated with the file, such as its raw path and file tag.
Warning
Do not modify the work item during the cache handler hook, and do not store the work item in a global variable and then try to access it outside of the handler method. This is invalid.
Cache handlers are registered for a particular file tag. In the custom_handlers.py example above, a file tagged as file/geo/usd would first be checked using simple_handler. Since that handler returns the pdg.cacheResult.Skip return code, the cache system then moves on to the next possible handler, which is the one registered for file/geo. That handler verifies that the file has a non-zero size, but it only does so if the work item that owns the file has the usecustomcaching attribute set. If both handlers return Skip, then PDG’s built-in cache checking mechanism is used instead.
As soon as a handler returns pdg.cacheResult.Hit or pdg.cacheResult.Miss, handler evaluation stops and that result is used. The most specific matching tag pattern is always evaluated first.
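The evaluation order described above can be modeled in plain Python. This is only an illustrative sketch, not PDG's actual implementation: it assumes that "most specific first" means walking the tag from its full path up through its "/"-separated parents, and that each tag has a single handler. CacheResult is a stand-in for pdg.cacheResult, and candidate_tags and resolve are hypothetical helper names.

```python
from enum import Enum


class CacheResult(Enum):
    """Stand-in for pdg.cacheResult so the sketch runs outside Houdini."""
    Skip = 0
    Hit = 1
    Miss = 2


def candidate_tags(file_tag):
    """Yield the tag and each of its parents, most specific first."""
    parts = file_tag.split("/")
    for end in range(len(parts), 0, -1):
        yield "/".join(parts[:end])


def resolve(file_tag, handlers, local_path):
    """Return the first Hit or Miss, or None if every handler skips."""
    for tag in candidate_tags(file_tag):
        handler = handlers.get(tag)
        if handler is None:
            continue
        result = handler(local_path)
        if result is not CacheResult.Skip:
            return result
    # Assumption: None stands for "use the built-in on-disk check".
    return None


# Mirrors the example: the usd handler skips, the geo handler decides.
handlers = {
    "file/geo/usd": lambda path: CacheResult.Skip,
    "file/geo": lambda path: CacheResult.Hit,
}
usd_result = resolve("file/geo/usd", handlers, "/tmp/example.usd")
image_result = resolve("file/image", handlers, "/tmp/example.png")
```

Here usd_result comes from the file/geo handler because the more specific handler skipped, while image_result is None because no registered tag matches, which corresponds to falling back to the built-in check.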
Note
You can register a handler for all file types by adding it with the file tag.
Note
The cache handler is called with the batch sub item if the file was generated as part of a batch. You can get the batch parent by looking at work_item.batchParent.
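As a sketch of the "extra file validation" use case mentioned earlier in this section, the handler below reports a cache hit only when a JSON output file parses successfully. The json_handler name and the file/json tag are assumptions made for illustration; substitute whichever tag your outputs actually carry. The try/except around the pdg import exists only so that the function can be exercised outside Houdini.

```python
import json

try:
    import pdg  # available inside Houdini
    cache_result = pdg.cacheResult
except (ImportError, AttributeError):
    # Stand-in so the sketch can be exercised outside Houdini.
    from enum import Enum

    class cache_result(Enum):
        Skip = 0
        Hit = 1
        Miss = 2


def json_handler(local_path, raw_file, work_item):
    """Treat a JSON output as cached only if it exists and parses."""
    try:
        with open(local_path, "r") as json_file:
            json.load(json_file)
    except (OSError, ValueError):
        # Missing, unreadable, or malformed files are cache misses.
        return cache_result.Miss
    return cache_result.Hit


def registerTypes(type_registry):
    # "file/json" is an assumed tag name used for illustration.
    type_registry.registerCacheHandler("file/json", json_handler)
```

Because the handler always returns Hit or Miss and never Skip, no less specific handler and no built-in check runs for files with this tag.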
Custom File Hash Functions
Work item output files have a 64-bit integer field that PDG uses to identify if the file is stale. For example, if you recook a File Pattern after modifying a file on disk, work items that correspond to modified files are automatically dirtied as part of the cook. By default, PDG uses the file’s modification time; however, you can register a custom hash function in Python to use a custom scheme.
Like the cache handlers, custom hash functions are registered based on the output file tag. For example, to use a CRC checksum for text files you can save the following code to $HOME/houdini18.5/pdg/types/custom_file_hash.py:
    import zlib


    def crc_handler(local_path, raw_file, work_item):
        # Hash the file contents; return 0 to defer if the file can't be read
        try:
            with open(local_path, 'rb') as local_file:
                return zlib.crc32(local_file.read())
        except Exception:
            return 0


    def registerTypes(type_registry):
        type_registry.registerHashHandler("file/txt", crc_handler)
The custom hash function is invoked with three arguments: the local path to the file, the raw pdg.File object, and the pdg.WorkItem that owns the file. If the function returns a non-zero value, that value is stored as the file’s hash. If the return value is exactly zero, PDG checks other matching hash functions if any exist, or falls back to the built-in implementation that uses the file’s modification time if no other hash functions are found.
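For large files, reading the whole file into memory to hash it may be undesirable. The following sketch computes the same CRC-32 incrementally with zlib.crc32 and keeps the zero-means-defer convention described above. The names chunked_crc and CHUNK_SIZE are hypothetical, and the 1 MiB chunk size is an arbitrary choice.

```python
import zlib

# Arbitrary read size for this sketch (1 MiB).
CHUNK_SIZE = 1 << 20


def chunked_crc(local_path, raw_file, work_item):
    """CRC-32 of the file contents, read in chunks.

    Returns 0 when the file cannot be read, which defers to any other
    matching hash function or to the built-in behavior.
    """
    crc = 0
    try:
        with open(local_path, "rb") as local_file:
            # iter() with a sentinel stops at the first empty read (EOF).
            for chunk in iter(lambda: local_file.read(CHUNK_SIZE), b""):
                crc = zlib.crc32(chunk, crc)
    except OSError:
        return 0
    # Mask to an unsigned 32-bit value, which fits the 64-bit hash field.
    return crc & 0xFFFFFFFF


def registerTypes(type_registry):
    type_registry.registerHashHandler("file/txt", chunked_crc)
```

Note that an empty file has a CRC-32 of 0, so under the convention above its hash would also be deferred to other hash functions or to the built-in implementation.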
It is possible to apply the function to specific node types by filtering on the node type in the function implementation. For example, the following hash function applies to all types of file, but only for work items in a File Pattern:
    import zlib


    def crc_handler(local_path, raw_file, work_item):
        # Defer (return 0) for work items that aren't in a File Pattern node
        if work_item and work_item.node.type.typeName != 'filepattern':
            return 0

        try:
            with open(local_path, 'rb') as local_file:
                return zlib.crc32(local_file.read())
        except Exception:
            return 0


    def registerTypes(type_registry):
        type_registry.registerHashHandler("file", crc_handler)
Warning
Do not modify the work item in the custom hash function, and do not store the work item in a global variable and then try to access it outside of the scope of the function. This is invalid.