Caching Improvements in Today's Daily Build (18.0.436)

Staff
585 posts
Joined: May 2014
Changes to the behavior of the Automatic caching mode:

The following changes are available in the current daily 18.0 build, temporarily guarded with the HOUDINI_PDG_EXPERIMENTAL environment variable.

In short, the Automatic caching mode now takes into account the cache state of upstream work items that are also capable of cooking from cache. For example, say you have a ROP Geometry node followed by a ROP Mantra node, both set to Automatic. If you manually delete the ROP Geometry output files on disk and recook the graph:

* Before the new changes, the work items in the Mantra node would still cook from cache instantly, because they would find the expected image files on disk.
* Starting with this daily build, the work items in the Mantra node will also recook instead of using their cache files, because an upstream dependency produced a new version of an input file.

This change is implemented by having each PDG node explicitly track a mapping of output file paths to cache version ids, provided the node supports caching to begin with. Since the information is tracked at the node level, it persists even when the work items are deleted and recreated. Currently this information is stored in an internal structure that PDG maintains, but we'll be looking at extending it so it can be queried and possibly updated directly using the Python API. It is already possible through Python to flag a work item's file cache id as invalid, using the pdg.WorkItem.invalidateCache() method. This forces that work item, and any downstream dependents, to not cook from cache files even if the caching mode is set to Automatic and the files exist.
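To make the node-level tracking easier to picture, here is a minimal pure-Python sketch of the idea. It is a toy model, not PDG's internal structure or API: the ToyNode class, its cache_versions and seen_upstream fields, and the cook method are all hypothetical names made up for illustration.

```python
import os


class ToyNode:
    """Toy stand-in for a PDG node. Hypothetical; not the real pdg API."""

    def __init__(self, name, upstream=None):
        self.name = name
        self.upstream = upstream
        # Node-level map: output file path -> cache version id.
        # It lives on the node, so it survives work item regeneration.
        self.cache_versions = {}
        # Upstream version ids observed when our outputs were last written.
        self.seen_upstream = {}

    def cook(self, out_path):
        """Cook one output in an Automatic-like mode; return 'cached' or 'cooked'."""
        upstream_versions = (
            dict(self.upstream.cache_versions) if self.upstream else {}
        )
        # Use the cache only if the file exists AND no upstream output
        # has a newer version than the one we last cooked against.
        if os.path.exists(out_path) and self.seen_upstream == upstream_versions:
            return "cached"
        with open(out_path, "w") as f:
            f.write(self.name)
        # Writing a new file bumps that path's cache version id.
        self.cache_versions[out_path] = self.cache_versions.get(out_path, 0) + 1
        self.seen_upstream = upstream_versions
        return "cooked"
```

With a geometry ToyNode feeding a render ToyNode, deleting the geometry file and cooking both again makes the geometry node write a new version, so the render node recooks even though its own output file is still on disk.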

The Read Files cache mode will still always read from disk files, regardless of whether the work item's cache id is stale. On the flip side, the Write Files cache mode will always force the work item and downstream items to not use disk files. Effectively, this is the same as calling invalidateCache() immediately before that item cooks.
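The three modes can be summarized as a small decision function. This is only a sketch of the rules described above: the function name and the "automatic", "read" and "write" mode strings are hypothetical labels, not PDG parameter values.

```python
def uses_cache_files(mode, files_exist, cache_id_stale):
    """Return True if a work item would cook from its cache files.

    Toy summary of the three cache modes; the mode strings are
    illustrative labels, not real PDG parameter values.
    """
    if mode == "read":
        # Read Files: always read from disk, stale cache id or not.
        # (What happens when the files are missing is not modelled here.)
        return True
    if mode == "write":
        # Write Files: never use disk files. Downstream items are also
        # forced to recook, as if the cache had been invalidated first.
        return False
    if mode == "automatic":
        # Automatic: use the cache only if the files exist and no
        # upstream change or invalidation has made the cache id stale.
        return files_exist and not cache_id_stale
    raise ValueError("unknown cache mode: %r" % (mode,))
```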



Support for custom cache handling:

The following changes are available in the daily build directly. They're NOT behind the environment variable mentioned earlier.

It is now possible to register custom cache handlers that define logic for checking if a given file path is found on disk. These are keyed on the PDG output file tag. So, for example, you can register a custom handler for the “file/geo” tag, or a custom handler for the more generic “file” tag.

Like the other aspects of the tagging system, the selection of custom cache handler is hierarchical. A result with the tag “file/geo/collision” will first be handed to the custom handler for the “file/geo/collision” tag. If no handler is found, or the handler indicates that it wants to skip that file, it will then continue on to the “file/geo” handler, and then finally the “file” handler.
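The lookup order can be sketched in plain Python as follows. This is an illustration of the hierarchy described above, not PDG's implementation: the SKIP/HIT/MISS constants, tag_candidates, resolve and the single-argument handler signature are all simplifications made up for the example.

```python
# Illustrative stand-ins for the Skip / Hit / Miss results.
SKIP, HIT, MISS = "skip", "hit", "miss"


def tag_candidates(tag):
    """List a tag and its parent tags, from most to least specific."""
    parts = tag.split("/")
    return ["/".join(parts[:i]) for i in range(len(parts), 0, -1)]


def resolve(tag, handlers, local_path, default_check):
    """Walk up the tag hierarchy until some handler gives a real answer."""
    for candidate in tag_candidates(tag):
        handler = handlers.get(candidate)
        if handler is None:
            continue  # nothing registered for this tag, try its parent
        result = handler(local_path)
        if result != SKIP:
            return result  # handler made a decision: stop here
    # No handler matched, or every matching handler skipped the file.
    return default_check(local_path)
```

For the tag “file/geo/collision”, tag_candidates returns the list "file/geo/collision", "file/geo", "file", which is the order in which handlers are consulted.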

Each cache handler is passed three arguments: the local path of the file to check; a pdg.File object that contains the raw, unlocalized path, the file's tag and other metadata; and the work item that owns the file. You can look up attributes or access the scheduler from the work item, but the handler isn't permitted to modify it.

For example, in the code below there are two custom cache handlers. The first one does nothing useful: it prints the path and returns the Skip result, which means it doesn't want to handle the file. The second one verifies that the file exists and is non-zero sized:

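import os
import pdg

def empty_handler(local_path, raw_file, work_item):
    # Does no checking: print the path and defer to the next matching handler
    print(local_path)
    return pdg.cacheResult.Skip

def custom_handler(local_path, raw_file, work_item):
    # Treat the file as cached only if it exists and is non-empty
    try:
        if os.stat(local_path).st_size == 0:
            return pdg.cacheResult.Miss
        return pdg.cacheResult.Hit
    except OSError:
        # os.stat failed, e.g. the file is missing or unreadable
        return pdg.cacheResult.Miss

def registerTypes(type_registry):
    # Called by PDG when the script is loaded; handlers are keyed on file tag
    type_registry.registerCacheHandler("file/geo", custom_handler)
    type_registry.registerCacheHandler("file/geo/collision", empty_handler)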

A work item output file with the tag “file/geo/collision” would first be checked against the empty_handler, and then fall back to the custom_handler. Other geometry files would immediately resolve against the custom_handler, since it would be the best match. If no handlers are installed or can be matched to a given file, then PDG's regular cache check, which only tests for file existence, is used instead.

Handlers can be registered in the same way as custom node or scheduler definitions. You can put them in a Python script in pdg/types on the Houdini search path, for example ~/houdini18.0/pdg/types/custom_handlers.py on Linux. The scripts in that directory will be loaded automatically and have their registerTypes function called by PDG. There are some additional details on a new doc page about registering custom file tags and cache handlers: https://www.sidefx.com/docs/houdini/tops/custom_tags.html
Member
1694 posts
Joined: March 2020
That's great, thanks!
Imre Tuske
FX Supervisor | Senior FXTD @ Weta FX

qLib -- Houdini asset library
http://qlab.github.io/qLib/
https://www.facebook.com/qLibHoudini
Member
209 posts
Joined: Nov. 2010
Is it possible to register a cache handler for all tags?
I just need to run some checks for any result tag.

It looks like an empty string doesn't work.
Member
603 posts
Joined: Sept. 2016
No, but that's a bug. It will be fixed soon so that the empty string acts as the fallback handler. Currently you have to supply a non-empty string, so you would need to register a handler for every possible tag prefix of at least one character.
Edited by chrisgreb - June 9, 2020 12:29:53
Member
209 posts
Joined: Nov. 2010
Please let me know which build to try.

Also, maybe a wildcard (“*”) could be a solution.
Member
603 posts
Joined: Sept. 2016
18.0.493 should have the fix (an empty string now matches any tag).
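As a rough sketch of what a catch-all registration could look like in 18.0.493 or later, reusing the handler signature and registerCacheHandler call from the first post: the any_tag_handler name and its non-empty-file check are placeholders for whatever checks you actually need, and this has only been written out here, not run inside Houdini.

```python
import os


def any_tag_handler(local_path, raw_file, work_item):
    # pdg is only importable inside a Houdini/PDG Python session, so it is
    # imported here rather than at module level.
    import pdg

    # Placeholder check: count the file as cached only if it is a
    # regular file with a non-zero size.
    if os.path.isfile(local_path) and os.path.getsize(local_path) > 0:
        return pdg.cacheResult.Hit
    return pdg.cacheResult.Miss


def registerTypes(type_registry):
    # An empty tag string is the fallback that matches any tag
    # (18.0.493 and later, per the reply above).
    type_registry.registerCacheHandler("", any_tag_handler)
```

More specific handlers such as “file/geo” can presumably still be registered alongside it, since the empty string sits at the bottom of the tag hierarchy.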
Member
209 posts
Joined: Nov. 2010
Can confirm, it works as expected.
Thanks for the fast improvement!