One option that would work out of the box is to write all of the files you want to discard to a separate directory, and have the Remove File node remove that directory. PDG already creates a session-specific temporary directory, which can be accessed with the $PDG_TEMP variable. This is where log files and scripts created by PDG are written, among other things. Scheduler nodes have a right mouse button entry for deleting that directory manually, but $PDG_TEMP can also be used as the path in a Remove File node. You could of course use your own directory instead, e.g. something like $HIP/tempfiles/…
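As a rough illustration of the temporary-directory idea outside of any node setup, here is a minimal Python sketch. The function names and the "discard" subfolder are made up for the example, and it assumes PDG_TEMP is visible as an environment variable in the process that runs it; if it isn't, the sketch falls back to the system temp location.

```python
import os
import shutil
import tempfile
from pathlib import Path


def make_scratch_dir(base=None):
    """Create and return a folder that holds only throwaway files.

    Assumption: PDG_TEMP is set in the environment of the running process.
    The "discard" subfolder name is hypothetical.
    """
    root = base or os.environ.get("PDG_TEMP") or tempfile.gettempdir()
    scratch = Path(root) / "discard"
    scratch.mkdir(parents=True, exist_ok=True)
    return scratch


def write_intermediate(scratch, name, text):
    """Write an intermediate file into the scratch folder and return its path."""
    path = scratch / name
    path.write_text(text)
    return path


def remove_scratch_dir(scratch):
    """Remove the whole scratch folder, whether or not its files were tracked."""
    shutil.rmtree(scratch, ignore_errors=True)
```

Because the cleanup removes the folder as a whole, it does not need to know which files were written into it.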
Alternatively, something like the following would also work:
I have two Wait for All nodes: the first one waits for the work items that I want to keep, and the second one waits for the nodes that produced intermediate files that should be deleted. The Attribute Delete after the first Wait for All is there to delete the Output attribute, so the files I want to keep are no longer listed on that work item. Then there's a third Wait for All which depends on both branches, and a Remove File configured to delete all of the files on the work item in the third Wait for All. This setup ensures that the delete step isn't run until after everything else is cooked, and the delete step will also not touch any of the files from the left branch.
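The logic of that graph can be modelled in plain Python to make the data flow easier to follow. This is only a sketch: WorkItem, wait_for_all and cleanup are hypothetical stand-ins and not the PDG API, and the output list is represented as a simple list of file paths.

```python
import os
from dataclasses import dataclass, field


@dataclass
class WorkItem:
    """Hypothetical stand-in for a work item; not the PDG API."""
    name: str
    outputs: list = field(default_factory=list)  # files reported as outputs


def wait_for_all(items, name="partition"):
    """Merge upstream items into one item carrying all of their reported files."""
    merged = WorkItem(name)
    for item in items:
        merged.outputs.extend(item.outputs)
    return merged


def cleanup(keep_items, discard_items):
    """Delete only the files reported by the discard branch; return what was removed."""
    keep = wait_for_all(keep_items, "keep")           # first Wait for All
    keep.outputs = []                                 # Attribute Delete on the outputs
    discard = wait_for_all(discard_items, "discard")  # second Wait for All
    final = wait_for_all([keep, discard], "final")    # third Wait for All, needs both
    removed = []
    for path in final.outputs:                        # Remove File step
        if os.path.exists(path):
            os.remove(path)
            removed.append(path)
    return removed
```

In this model the delete step only ever sees paths listed on the final item, which is why the kept files are safe.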
There are pros and cons to both approaches. With the second approach, if your work items produce files that aren't reported to PDG, the Remove File node will never find out about them, so they won't be cleaned up. In that case using a predetermined temporary directory is more reliable, since the cleanup step will always remove the entire directory. The second approach gives you more explicit control and makes it easier to move nodes between the “keep” and “discard” lists, but will end up with a lot of node wires in larger graphs. The second approach can also be bundled into an asset more easily, and maybe should be a “Cleanup” TOP node that we ship with Houdini.