I was testing one of Matt Estela's examples, the cube slicer solver.
I tried doing it in a for loop and then put a compile block around that.
It seems the compile block is not contributing, and it is even slower if the geo is heavy (instead of a cube, a rubber toy or our beloved squab).
Looking at the Performance Monitor, the compile block is the slowest to cook, while in the for loop example the clip nodes are the heavy ones.
Any idea why that is so?
Compile Block slower than For Loop
- papsphilip
- Joined: July 2018
- tamte
- Joined: July 2007
2 reasons:
1. I'd suggest wrapping both in a subnet and then looking at the time next to the subnet in the Performance Monitor,
since otherwise, while Compile End shows you the time it took for the whole block, the plain foreach will not, and therefore you will just see individual nodes that you'd have to sum up.
For me compiled is slightly faster, even though it doesn't matter much because of 2.
2. The multithreading of for loops can only happen if the Gather Method on the Block End is Merge Each Iteration. Yours is Feedback Each Iteration, so each iteration has to wait for the previous one to finish so it can be fed back, and therefore they can't be run in parallel.
You can still get some boost from compiling, but definitely not from a multithreaded for loop.
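For what it's worth, the dependency difference between the two Gather Methods can be sketched outside Houdini. This is plain Python with made-up names (`clip_once` stands in for one Clip SOP cook; nothing here is Houdini API):

```python
# Illustrative only: why Feedback Each Iteration is serial
# while Merge Each Iteration can be multithreaded.
from concurrent.futures import ThreadPoolExecutor

def clip_once(geo, i):
    # placeholder for one Clip SOP cook on some geometry
    return geo + [i]

# Feedback Each Iteration: iteration i consumes the output of i - 1,
# so the chain is inherently sequential.
def feedback_loop(geo, n):
    for i in range(n):
        geo = clip_once(geo, i)   # must wait for the previous result
    return geo

# Merge Each Iteration: every iteration reads the SAME input,
# so the iterations are independent and can run in parallel.
def merge_loop(geo, n):
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda i: clip_once(geo, i), range(n))
    merged = []
    for r in results:             # map() preserves iteration order
        merged += r
    return merged

print(feedback_loop([], 4))  # [0, 1, 2, 3] -- one chained result
print(merge_loop([], 4))     # [0, 1, 2, 3] -- four independent results merged
```

Both produce the same list here, but only the second one has no data dependency between iterations, which is exactly the condition the compiled block needs to multithread the loop.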
Edited by tamte - Oct. 4, 2022 14:42:05
Tomas Slancik
CG Supervisor
Framestore, NY
- papsphilip
tamte
2. The multithreading of for loops can only happen if the Gather Method on the Block End is Merge Each Iteration. Yours is Feedback Each Iteration, so each iteration has to wait for the previous one to finish so it can be fed back, and therefore they can't be run in parallel.
You can still get some boost from compiling, but definitely not from a multithreaded for loop.
Yes, that makes perfect sense, thank you!
Making it faster in this case would require a different approach, since multithreading is not possible.
- animatrix_
- Joined: Feb. 2012
You are not going to get much performance by compiling a feedback loop. Your best bet is to consolidate the number of operations, like performing 1 clip instead of 2, but as is it might be tricky.
An easy way to speed this up would be to process each piece in parallel using another for loop network at the top level inside the same compile network.
If you have a lot of patience, you could also implement the entire thing in VEX. In my implementation of the Poly Carve SOP, which is far more complex than what the Clip SOP is doing, I got about 3x performance against the Clip SOP:
https://forums.odforce.net/topic/44143-poly-carve-sop/?do=findComment&comment=232434 [forums.odforce.net]
So if all you want is to split a geometry in half, you could get much faster performance for this operation. I have to preserve all attributes and groups, as well as stitch up adjacent primitives, and do many other operations, all of which have a performance cost.
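To illustrate why the bare split is so much cheaper than a full clip: splitting points in half against a plane is just one signed-distance test per point, with no stitching or attribute handling. A plain-Python sketch (no Houdini API; `split_points` and its arguments are invented for illustration):

```python
# Minimal sketch of "split geometry in half" on points only -- the cheap
# case, contrasted with a full Clip SOP that must also rebuild primitives,
# stitch cut edges, and carry attributes and groups.

def split_points(points, origin, normal):
    """Partition points by the sign of their distance to a plane."""
    def side(p):
        # signed distance: dot(p - origin, normal)
        return sum((pi - oi) * ni for pi, oi, ni in zip(p, origin, normal))
    above = [p for p in points if side(p) >= 0.0]
    below = [p for p in points if side(p) < 0.0]
    return above, below

pts = [(0, 1, 0), (0, -2, 0), (1, 0.5, 3)]
above, below = split_points(pts, origin=(0, 0, 0), normal=(0, 1, 0))
# above -> [(0, 1, 0), (1, 0.5, 3)], below -> [(0, -2, 0)]
```

Everything a real clip does beyond this per-point test (rebuilding cut primitives, interpolating attributes along the cut) is where the extra cost lives.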
Senior FX TD @ Industrial Light & Magic
Get to the NEXT level in Houdini & VEX with Pragmatic VEX! [www.pragmatic-vfx.com] https://lnk.bio/animatrix [lnk.bio]
- elovikov
- Joined: June 2019
For me the compiled block is also slightly faster.
As tamte said, the main reason here is the feedback loop. It simply can't be run in parallel.
Compiled blocks are very powerful but also a bit "blackboxed", I'd say.
Afaik they consist of several kinds of optimizations:
- parallelized runs of the compiled operation in loops (only if you're merging the result)
- in-place operations on geometry (without copying the input). Again, you can't easily tell if an operation supports running in place; for example, ops that don't change the topology could be applied in place, but something like subdivide couldn't
- parallelized cooking of node inputs, if a node has more than one. In your example, in theory, the merge inputs can be parallelized. This one wasn't supported when compiled blocks were introduced, but was reserved "for the future". I have no idea if it's supported right now
- a very specific use of the OpenCL SOP, trying to avoid copying data between CPU and GPU
All of this is explained here: https://vimeo.com/222881605 [vimeo.com]
The main problem here is that even if you manage to fit nodes into a compiled block, you just don't know what exactly you get out of it. "Compilation" just tries to create an optimal task graph with all these optimizations, but hides what it looks like exactly. Most of the time you have to rely on intuition, basically.
Being such low-level, technical nodes, they definitely lack debugging and detailed reporting functions.
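The in-place point in that list can be sketched in plain Python (illustrative names only, not Houdini internals): an op that keeps the point count can overwrite its input buffer, while a topology-changing op necessarily produces a different-sized output and needs a new allocation:

```python
# Sketch of the in-place vs copy distinction: a point-wise op (like a
# transform) can reuse its input buffer; a topology-changing op (like
# subdivide) cannot, because the output has a different size.

def translate_inplace(positions, offset):
    # same point count in and out -> safe to overwrite the buffer
    for i, (x, y, z) in enumerate(positions):
        positions[i] = (x + offset[0], y + offset[1], z + offset[2])
    return positions

def subdivide_copy(positions):
    # inserts a midpoint between neighbours -> output size differs,
    # so a new buffer is unavoidable
    out = []
    for a, b in zip(positions, positions[1:]):
        mid = tuple((ai + bi) / 2 for ai, bi in zip(a, b))
        out += [a, mid]
    out.append(positions[-1])
    return out

pts = [(0, 0, 0), (2, 0, 0)]
translate_inplace(pts, (1, 0, 0))   # pts is now [(1, 0, 0), (3, 0, 0)]
refined = subdivide_copy(pts)       # new list: [(1,0,0), (2,0,0), (3,0,0)]
```

Which category a given SOP falls into inside a compiled block is exactly the kind of detail the "blackboxed" compilation hides.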