Implementing own "does file exist?" logic for ROP Geometry

Member
1694 posts
Joined: March 2020
Hi,


I'm looking for a way to implement my own logic for “is the file there already?” for ROP Geometry TOPs. I'm a PDG noob, and the only mechanism I'm aware of atm is the expected-result attribute, which points to a file (and if the file exists, the write is skipped).

Right now we have a preallocation scheme that “allocates” file sequences across disks (it creates 0-byte placeholders on “raw” disks, and soft links to them in the actual FS folder where you'd write your sequence). So the problem is that the files are already there, but 0 bytes long. I'd like PDG to consider these files “not there” so it would write the geometry. (Unfortunately I can't work around this by removing the files before writing – the link-to-empty-file has to exist in order for the file to get written to the right disk.)

Custom logic would also be great for validity checking, e.g. if files are incomplete (or just too small) they could be treated as missing and (over)written.
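To illustrate, here's roughly the kind of check I have in mind (just a sketch – the minimum-size threshold and the symlink behaviour are specific to our setup, not anything PDG provides):

    import os

    def cache_file_is_valid(path, min_size=1):
        # Treat missing, zero-byte or suspiciously small files as "not there",
        # so they would get (re)written. os.stat() follows symlinks, so a soft
        # link to a 0-byte placeholder on a raw disk reports a size of 0 and
        # is rejected here.
        try:
            return os.stat(path).st_size >= min_size
        except OSError:
            return False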

I was considering workarounds such as an FS plugin, but 1) I've seen it mentioned on the forum that it might not be fully supported throughout PDG, and 2) it feels like overkill to write an FS plugin just for this.

Any ideas or not-too-painful workarounds would be appreciated!

Cheers
Imre
Imre Tuske
FX Supervisor | Senior FXTD @ Weta FX

qLib -- Houdini asset library
http://qlab.github.io/qLib/
https://www.facebook.com/qLibHoudini
Staff
586 posts
Joined: May 2014
There's not a simple way to do that at the moment, but I'm currently working on improving the existing caching mechanism so it's more configurable and interacts better with manual file deletion. I'll post a new thread specifically on this topic when the changes are available in a daily build, which should be in the next few days. There will be a collection of changes, likely guarded by an opt-in environment variable initially, to make sure everyone is happy with them and that they don't break any existing workflows.

As a summary, the changes will try to address a few issues with the existing caching mechanism:

  • One of the problems that's been raised in the past is that if you have a chain of ROP TOPs, and manually delete outputs on disk from the first one, then ROP TOPs downstream will still cook from cache when they're in Automatic mode. In other words, the cache miss that occurs in the first node's work items isn't picked up by downstream items. That will be fixed with the upcoming changes.
  • Along with that, there is a new API call to manually invalidate the cache of a work item, which would also affect dependents. With that new feature one could, in theory, use a Python Script node that runs custom logic to invalidate the cache of all items downstream of that script work item (a rough sketch follows after this list).
  • Finally, PDG has actually always had a custom hook internally for cache handling based on file type/path. PDG defines logic for Mantra checkpoint files, for example, to avoid incorrectly treating an image as a cache hit when a checkpoint file also exists for that image. The hook just isn't exposed in the API right now, but we're going to be doing that as well.
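As a very rough sketch of how that Python Script node could look: the checking part is just ordinary Python, and since the invalidation call isn't finalized or exposed yet, that part is only a placeholder comment. How you obtain the expected output path from the work item is also up to you – the path below is just an example string:

    import os

    # Rough body for a Python Script TOP work item. The path here is only a
    # placeholder; in practice you'd read it off the work item / its attributes.
    path = "/proj/shot/geo/sim.0042.bgeo.sc"

    def needs_recook(p):
        # Custom "is the file really there?" logic, e.g. reject 0-byte
        # placeholder files so they get written again.
        return (not os.path.isfile(p)) or os.path.getsize(p) == 0

    if needs_recook(path):
        # Placeholder: this is where the upcoming cache-invalidation call
        # would go, dirtying this work item's cache and that of its dependents.
        pass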

Based on what you mentioned in your post, it probably also makes sense to extend PDG's built-in file validation routine to other known file types. For example, for .bgeo.sc files we could use Houdini's geometry library to efficiently check whether the file found on disk contains valid geometry. That should handle your 0-byte case out of the box.
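To give an idea of the kind of check I mean, a user-level sketch using hou (not the internal implementation) could look like this:

    import hou

    def bgeo_is_valid(path):
        # Return True if the file on disk can actually be loaded as geometry;
        # a 0-byte or truncated .bgeo/.bgeo.sc file will fail to load.
        try:
            geo = hou.Geometry()      # standalone, in-memory geometry
            geo.loadFromFile(path)
            return True
        except hou.OperationFailed:
            return False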
Member
1694 posts
Joined: March 2020
Thanks for the detailed answer Taylor! It's good to know that these improvements are underway.

tpetrick
there is a new API call to manually invalidate the cache of a work item, which would also affect dependents. With that new feature one could, in theory, use a Python Script node that runs custom logic to invalidate the cache of all items downstream of that script work item

Question: what would that look like in the case of a ROP Geometry TOP? Quite often that node generates its own work items. How would I tell a ROP Geometry TOP beforehand not to rerun certain file(s) (especially if they're part of a batch)? That node “feels” a bit black-boxy that way. It also implements various caching logic that would be cumbersome to reimplement in a custom version (e.g. the various batching options).

Also: to give you a bit more on the “big picture” of what I'm trying to achieve –

I'd like to be able to open a hip file with a PDG graph in it and rerun or “reconstruct” a certain graph state by cooking the graph, without recomputing anything that's already on disk. I'd also like this process to be relatively fast (as in, comparable to loading back the graph state – within reason, of course, as this depends on the graph, but it shouldn't take more than a minute or two).

It's true one can save/load the graph state, but that's not something you can blindly trust (e.g. some generated files may have gone missing in the meantime due to an automatic cleanup process, etc).


tpetrick
Based on what you mentioned in your post it probably also make sense to extend PDG's built-in file validation routine for other known types of files

Yep, that'd be great too. However, it'd also be great if the user could specify the desired granularity of checking (e.g. choosing between a simple file-size-based check and a more elaborate integrity check of the file contents).
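Something along these lines is what I'm imagining – just illustrative Python, where file_contents_look_valid() is a made-up helper standing in for whatever per-type integrity check is appropriate:

    import os

    def check_output(path, mode="size", min_size=1024):
        # Cheap level: the file must exist and be larger than some threshold
        # (which already rules out our 0-byte placeholders).
        if not os.path.isfile(path) or os.path.getsize(path) < min_size:
            return False
        if mode == "content":
            # Expensive level: actually open/parse the file to verify it's
            # complete (load a bgeo, read a VDB header, etc).
            return file_contents_look_valid(path)  # hypothetical helper
        return True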

Also, in our particular case the allocation scheme is not file-type specific – the same mechanism is used for bgeos, vdbs, etc. So that's a case where I'd go for something more general-purpose than a file-type-based check.

(Would it be possible to do something like having one ROP Geometry TOP just generate the work items, and another that just cooks them (handling batching accordingly, etc), so I could put a Python file-checking TOP node in between? This is me thinking out loud – I'm not claiming to be a PDG expert.)