How to combine a variable number of geometry imports?




Andr: Member; 899 posts; Joined: Feb. 2016; Offline

June 20, 2019 12:45 p.m.

I have a digital asset that outputs a variable number of geometries.
I need to import the GEOs (points) into a topnet and combine them procedurally, as if appending one to another.
It has to be dynamic, since the number of ‘geo import nodes’ can vary.

I tried a for-loop inside the topnet with no luck.
My last resort would be to have the digital asset generate and append the ‘geo import nodes’ with Python, but I'd like to avoid that.

Any help is much appreciated!
Thanks

Edited by Andr - June 20, 2019 12:47:03

Attachments:
Q_variableImportPDG.hiplc (186.3 KB)
pdgCombo.JPG (150.3 KB)


kenxu: Member; 544 posts; Joined: Sept. 2012; Offline

June 20, 2019 4:54 p.m.

Sorry, it's still not quite clear to us what you're trying to achieve. Are you saying that you want to run a particular HDA a variable number of times (3 in the above case), then import the geometry of each run as points, then somehow combine the point attributes from the different runs?

- Ken Xu


Andr: Member; 899 posts; Joined: Feb. 2016; Offline

June 20, 2019 5:31 p.m.

Sorry for not being clear enough; the HDA part was probably a bit misleading.

Let's say I have 2 geos:
GEO1 (2 points) has a point attribute “A” with values (A0, A1)
GEO2 (3 points) has a point attribute “B” with values (B0, B1, B2)

When the GEOs are imported, the system should be able to produce 6 workitems with the attributes merged:
(A0, B0), (A0, B1), (A0, B2), (A1, B0), (A1, B1), (A1, B2)
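In other words, the wanted work items are the Cartesian product of the per-geometry attribute values. A minimal plain-Python sketch of that result, assuming the values have already been read into lists (no PDG calls; the lists are stand-ins for GEO1 and GEO2):

```python
import itertools

# Point attribute values per imported geometry (stand-ins for GEO1 and GEO2).
geo_values = [
    ["A0", "A1"],        # GEO1, attribute "A"
    ["B0", "B1", "B2"],  # GEO2, attribute "B"
]

# One tuple per desired work item; the call is the same for any number of geometries.
combinations = list(itertools.product(*geo_values))
print(len(combinations))  # 6
print(combinations[:3])   # [('A0', 'B0'), ('A0', 'B1'), ('A0', 'B2')]
```

Adding a third list to `geo_values` needs no code change, which is the property a variable number of geometries requires.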

I can do that by manually appending GEO2 after GEO1 in the topnet, but I'd like it to be automatic and work for N geometries: the number of geos to combine can be different every time.

The GEOs are produced in a SOP context. They are not actual files.

Edited by Andr - June 21, 2019 02:08:40


kenxu: Member; 544 posts; Joined: Sept. 2012; Offline

June 21, 2019 11:09 a.m.

Ah, ok. Here is one possible solution:

Step 1. Create a custom Python processor to scan your SOP network and create 1 workitem per matching node. Attach the path of that node as an attribute, for example “node_path”.

Step 2. Use a geometry import node to pull in the points via that attribute, i.e. `@node_path`. This pulls all 14 points in this case: 2 from OUT_0, 4 from OUT_1, and 8 from OUT_2. Each point is a separate workitem.

Step 3. Create a custom Python partitioner that creates partitions based on unique combinations of attributes A, B, C. We don't have a pre-built partitioner that does exactly this (partition by combination comes close, but not quite), so it has to be a custom partitioner.
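The data flow of these three steps can be modelled outside Houdini as a rough sketch. Everything here is a hypothetical stand-in: the node paths, the `OUT_*` name pattern, and plain dicts in place of work items. It shows only the grouping and combination logic, not the PDG Python Processor or Partitioner API.

```python
import fnmatch
import itertools
from collections import OrderedDict

# Hypothetical SOP network: node path -> point attribute values.
# Mirrors the example: 2 points on OUT_0, 4 on OUT_1, 8 on OUT_2.
sop_network = OrderedDict([
    ("/obj/geo1/OUT_0", {"A": ["A%d" % i for i in range(2)]}),
    ("/obj/geo1/OUT_1", {"B": ["B%d" % i for i in range(4)]}),
    ("/obj/geo1/OUT_2", {"C": ["C%d" % i for i in range(8)]}),
    ("/obj/geo1/other", {"D": ["D0"]}),  # does not match the pattern
])

# Step 1: one "work item" per matching node, carrying a node_path attribute.
items = [{"node_path": path} for path in sop_network
         if fnmatch.fnmatch(path, "*/OUT_*")]

# Step 2: one "work item" per imported point, tagged with its source node.
points = []
for item in items:
    path = item["node_path"]
    for name, values in sop_network[path].items():
        for value in values:
            points.append({"node_path": path, name: value})

# Step 3: group points by source node, then make one partition per
# combination that takes exactly one point from each node.
groups = OrderedDict()
for point in points:
    groups.setdefault(point["node_path"], []).append(point)
partitions = list(itertools.product(*groups.values()))

print(len(points))      # 14
print(len(partitions))  # 64
```

The partition count is the product of the group sizes (2 × 4 × 8), so it grows quickly as more nodes are added.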

I've attached the solution file here. Hopefully that helps.

Edited by kenxu - June 21, 2019 11:09:46

Attachments:
pdg_forum_help_geo_import_multiple_nodes.png (131.4 KB)
Q_variableImportPDG_fixed.hiplc (242.2 KB)

- Ken Xu


Andr: Member; 899 posts; Joined: Feb. 2016; Offline

July 4, 2019 9:26 a.m.

Hello Kenxu, thanks a lot for providing this example and introducing me to more custom TOPs setups using the different Python nodes.

In the partitioner I'm now using itertools.product() instead of the nested for-loops, to compute the Cartesian product of the workitems more procedurally, since the number of imported geos can vary every time.
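For reference, a small plain-Python check (bucket contents are made up) that `itertools.product` yields the same combinations, in the same order, as fixed nested loops, while accepting any number of buckets:

```python
import itertools


def nested_loops(a_bucket, b_bucket, c_bucket):
    """Fixed depth: only works for exactly three buckets."""
    result = []
    for a in a_bucket:
        for b in b_bucket:
            for c in c_bucket:
                result.append((a, b, c))
    return result


def any_number_of_buckets(buckets):
    """Same ordering as the nested loops, for any number of buckets."""
    return list(itertools.product(*buckets))


buckets = [["A0", "A1"], ["B0", "B1"], ["C0", "C1", "C2"]]
assert nested_loops(*buckets) == any_number_of_buckets(buckets)
print(len(any_number_of_buckets(buckets)))  # 12
```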

cheers


Andr: Member; 899 posts; Joined: Feb. 2016; Offline

July 4, 2019 1:59 p.m.

I tried to use itertools.product(), and while it produces the right number of iterations, the merged attributes “A”, “B”, “C” are all set to A0, B0, C0 (the other values A1, B1, C1, etc. are skipped).
1) Why is this happening?

2) I also noticed that the Python Script node (as shown in the image), even with no code in it, stops the partitioner from working. I need to do some operations on the workitems with the Python Script node just before sending them to the partitioner.

Any help is much appreciated!
Thanks

Edited by Andr - July 4, 2019 14:00:11

Attachments:
Q_itertoolsbrokenBYpyscript.hiplc (2.5 MB)
brokenitertoool.JPG (180.2 KB)


Andr: Member; 899 posts; Joined: Feb. 2016; Offline

July 5, 2019 2:47 a.m.

Hello Kenxu,
Regarding question 1, I've found that the same issue occurs with the nested for-loops solution when you feed the partitioner an already partitioned node.

Is that why you did the partition by attribute inside the partitioner itself?

I changed your partitioner code to the one below and fed it a node with 3 partitions.
Interestingly, I can read the individual work_item attribute values if I iterate over .partitionItems and ask for their data, but when I assign them to new partitions in the nested for-loops something breaks and the attribute values are not copied.
What am I misunderstanding?

# Partition by Attribute: work_items are the 3 upstream partitions,
# one bucket of member workitems per partition
A_bucket = work_items[0].partitionItems
B_bucket = work_items[1].partitionItems
C_bucket = work_items[2].partitionItems

# One new partition per (A, B, C) combination of member workitems
partition_count = 0
for wi_A in A_bucket:
    for wi_B in B_bucket:
        for wi_C in C_bucket:
            partition_holder.addItemToPartition(wi_A, partition_count)
            partition_holder.addItemToPartition(wi_B, partition_count)
            partition_holder.addItemToPartition(wi_C, partition_count)
            partition_count += 1


Andr: Member; 899 posts; Joined: Feb. 2016; Offline

July 5, 2019 7:08 a.m.

So, since the partitioner doesn't seem to like working with partitionItems, the trick for a Cartesian product of N partitions is to group the workitems into N arrays inside the partitioner itself, as you did.

In my case the initial partitions were built with a “partition by attribute” node (distinct values).
I recreated it inside the Python partitioner with the following code, under a few conditions:
1) “P_id” is the int attribute whose distinct values are used to group the workitems.
2) Workitems are sorted by “P_id”.
3) Values of “P_id” have no gaps: (1, 2, 3, 4) is OK, (1, 4, 5) is not.

# Group the workitems into buckets named bucket0, bucket1, ...
# idcheck is a running counter, not the P_id value itself: it only
# tracks P_id when conditions 2) and 3) hold (sorted, no gaps).
idcheck = 0
d = {}

for w in work_items:
    pid = w.data.intData("P_id", 0)
    if pid != idcheck:
        # P_id changed: move on to the next bucket
        idcheck += 1
    d.setdefault("bucket{0}".format(idcheck), []).append(w)

Now the Cartesian product of these buckets using itertools produces partitions without losing any attribute values.
I'd still like to know what goes wrong with work_item.partitionItems inside the Python partitioner.
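A sketch of that combining step, assuming the buckets live in a dict keyed `bucket0`, `bucket1`, ... as in the bucketing code above. `FakePartitionHolder` is a hypothetical stub that only mimics the `addItemToPartition(item, index)` call used in this thread, and strings stand in for work items. Sorting the keys numerically is a choice made here: plain string sorting would put `bucket10` before `bucket2`, and dict order is not guaranteed in Python 2.

```python
import itertools


class FakePartitionHolder(object):
    """Hypothetical stand-in for PDG's partition_holder: it only records
    which items were added to which partition index."""

    def __init__(self):
        self.partitions = {}

    def addItemToPartition(self, item, index):
        self.partitions.setdefault(index, []).append(item)


def bucket_number(key):
    # "bucket12" -> 12, so bucket10 sorts after bucket2.
    return int(key[len("bucket"):])


def partition_cartesian(d, partition_holder):
    """Add one partition per combination of one item from each bucket."""
    ordered = [d[key] for key in sorted(d, key=bucket_number)]
    partition_count = 0
    for combo in itertools.product(*ordered):
        for item in combo:
            partition_holder.addItemToPartition(item, partition_count)
        partition_count += 1
    return partition_count


d = {"bucket0": ["A0", "A1"], "bucket1": ["B0", "B1", "B2"], "bucket2": ["C0"]}
holder = FakePartitionHolder()
print(partition_cartesian(d, holder))  # 6
print(holder.partitions[0])            # ['A0', 'B0', 'C0']
```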

Edited by Andr - July 5, 2019 07:17:37

Attachments:
Q_attribVals_lost_fix.hiplc (176.2 KB)


Andr: Member; 899 posts; Joined: Feb. 2016; Offline

July 5, 2019 2:57 p.m.

Regarding question 2: why does a blank Python Script node prevent the Python partitioner from generating partitions?

I have put together an example where the Python Script node also breaks a Partition by Attribute node.

I also noticed the following:
1) The Python Script node reorders the workitems randomly: their order differs from the upstream node.
2) If you dirty and cook the Python Script node, the workitem order stays the same as in the last cook. But if you re-cook the immediate upstream node (here, the merge2 node), the Python Script node is dirtied, and when it re-cooks the workitem order differs from the previous cook.

Please, have a look.

Edited by Andr - July 5, 2019 15:01:14

Attachments:
Q_pyscriptWeird.hiplc (187.2 KB)
pyscriptweird.JPG (199.3 KB)


kenxu: Member; 544 posts; Joined: Sept. 2012; Offline

July 5, 2019 4:12 p.m.

Hi Andr,

So there are a couple of issues here. The first problem is caused by the default setting on the Python Script node. The automatic setting puts the node in “dynamic” mode by default, when it should have been more consistent and used “static”. We'll change the default, but for now you can fix it either by explicitly setting the Python Script node to “static”, or by turning on the “dynamic partitioning” setting on the partitioner node if you choose to keep the Python Script node “dynamic”.

The second issue is that when multiple workitems carry an attribute with the same name, the partitioner keeps only one of those attributes as its own. You can choose which one through the sorting options in the Advanced tab of the partitioners.
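A toy model of that behaviour, assuming the member that sorts first wins (which member actually wins depends on the partitioner's sort options). This is not PDG's merge code, just an illustration of why member order decides which value survives when attribute names collide:

```python
def merge_first_wins(items):
    """Toy model of a partition adopting attributes from its members:
    for each attribute name, keep the value from the first member that
    has it, in the order the members are given."""
    merged = {}
    for item in items:
        for name, value in item.items():
            merged.setdefault(name, value)
    return merged


members = [{"A": "A0"}, {"A": "A1"}, {"B": "B0"}]
print(merge_first_wins(members)["A"])            # A0
print(merge_first_wins(reversed(members))["A"])  # A1
```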

In the longer run, we plan to add a dedicated node for this case. You'd wire it after the partitioner, and it would expose a set of merge settings, somewhat like the Attribute Promote SOP. So you'd partition, then put down the “attribute promote” node, which handles merging the data from the items in each partition.

For now, your solution works because it orders the workitems so that the one with the right attributes arrives first. Another way to solve this problem for now is to put a Python Processor after the partitioner, iterate through upstream_items (which are the partitions), access their partitionItems, and do whatever is needed with the data.
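A rough sketch of that post-partitioner pass. `FakeWorkItem`, its `attribs` dict and `collect_partition_values` are hypothetical stand-ins; only the names `upstream_items` and `partitionItems` come from this thread. In a real Python Processor you would then create one new work item per record and write the values onto it with your Houdini version's attribute calls, which are not shown here.

```python
class FakeWorkItem(object):
    """Hypothetical stand-in for a PDG work item: a plain dict of
    attributes, plus the member items when it represents a partition."""

    def __init__(self, attribs=None, partition_items=None):
        self.attribs = dict(attribs or {})
        self.partitionItems = list(partition_items or [])


def collect_partition_values(upstream_items):
    """Return one dict per upstream partition that keeps every member's
    value for each attribute name, instead of letting one member win."""
    records = []
    for partition in upstream_items:
        record = {}
        for member in partition.partitionItems:
            for name, value in member.attribs.items():
                record.setdefault(name, []).append(value)
        records.append(record)
    return records


partition = FakeWorkItem(partition_items=[
    FakeWorkItem({"A": "A1"}),
    FakeWorkItem({"B": "B2"}),
    FakeWorkItem({"C": "C0"}),
])
print(collect_partition_values([partition]))
# [{'A': ['A1'], 'B': ['B2'], 'C': ['C0']}]
```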

Edited by kenxu - July 5, 2019 16:14:28

Attachments:
pdg_forum_help_python_script_static.png (21.4 KB)
pdg_forum_help_dynamic_partition.png (34.0 KB)

- Ken Xu


Andr: Member; 899 posts; Joined: Feb. 2016; Offline

July 6, 2019 6:42 a.m.

Hello Kenxu, I really appreciate the effort you put into explaining this!

Since my "procedural bucketing code [www.sidefx.com]" only works with an ordered set of distinct attribute values with no gaps (0, 1, 2, 3, 4), and my real project does have gaps (0, 3, 5, etc.), it doesn't fit my needs.
Option 1) I could try to implement a more generic ‘partition by attribute value’ code, but that seems like overkill when there's a node that already does it! (-:
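For what it's worth, a generic ‘partition by attribute value’ grouping can be short in plain Python. This is a sketch, assuming a `key` callable that returns the attribute value; plain dicts stand in for workitems. Inside the partitioner, the key could be `lambda w: w.data.intData("P_id", 0)`, the call used in the bucketing code earlier in this thread, and the resulting list could be passed to `itertools.product(*buckets)`.

```python
from collections import defaultdict


def bucket_by_value(items, key):
    """Group items by the distinct values returned by `key`.

    Unlike a running counter, this works with gaps (0, 3, 5) and with
    items in any order. Buckets are returned sorted by value, and items
    keep their incoming order inside each bucket.
    """
    buckets = defaultdict(list)
    for item in items:
        buckets[key(item)].append(item)
    return [buckets[value] for value in sorted(buckets)]


# Plain dicts stand in for workitems; P_id has gaps and is unsorted.
items = [
    {"P_id": 5, "name": "a"},
    {"P_id": 0, "name": "b"},
    {"P_id": 3, "name": "c"},
    {"P_id": 0, "name": "d"},
]
buckets = bucket_by_value(items, key=lambda w: w["P_id"])
print([[w["name"] for w in b] for b in buckets])  # [['b', 'd'], ['c'], ['a']]
```

Because it does not depend on input order, this grouping is also unaffected by an upstream node reordering the workitems.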

Option 2) Instead, I'm now trying your suggestion: I do the post-partitioner work of retrieving the real values of the partitionItems inside a Python Processor node and rebuild the workitems with those values.
Option 2 works, but I wonder whether rebuilding all the workitems in the Python Processor is inefficient.
(I'll be working with thousands of workitems and I'm aiming for low latency: the user is supposed to see those values update as he changes the parameters on the HDA.)
So maybe I should really develop option 1 and measure the performance difference. For now I'll stick with option 2.

Anyway, just for your information, I noticed that the printed values of the partitionItems are oddly formatted in the console when they are read from the Python partitioner. This doesn't seem to happen with partitions created by a Partition by Attribute node.
Not a big deal, it just makes debugging a bit harder. I attached an example file reproducing the issue.
Thanks again!

Cheers

Edited by Andr - July 6, 2019 06:48:00

Attachments:
Q_weirdPrint.hiplc (215.6 KB)
weirdconsole.JPG (25.9 KB)


kenxu: Member; 544 posts; Joined: Sept. 2012; Offline

July 8, 2019 1:50 p.m.

Hi Andr,

In the short run, work-around option 2) is the best. In the medium term, we'll add the “attribute promote” node so that this becomes standard functionality and you don't need to roll your own.

WRT the prints, that's expected, as PDG is a fundamentally asynchronous technology. Everything from the way it farms out work to the graph evaluation itself is asynchronous. The upside is the ability to scale; the downside is messy prints.
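If the jumbled output comes from several threads in one process, a generic Python mitigation is to build each message completely and write it in one call under a lock. This is a sketch, not a PDG feature: it keeps lines whole but does not restore their order, and it cannot help when work items run in separate processes. `format_partition` and `log_line` are hypothetical helpers.

```python
import io
import threading

_print_lock = threading.Lock()


def format_partition(index, values):
    """Build the complete message up front."""
    return "partition %d: %s" % (index, ", ".join(values))


def log_line(message, stream):
    # A single write per message, guarded by a lock, so messages from
    # threads in the same process are not interleaved mid-line.
    with _print_lock:
        stream.write(message + "\n")


buffer = io.StringIO()
threads = [
    threading.Thread(target=log_line,
                     args=(format_partition(i, ["A0", "B%d" % i]), buffer))
    for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
lines = buffer.getvalue().splitlines()
print(len(lines))  # 4
```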

- Ken Xu


Andr: Member; 899 posts; Joined: Feb. 2016; Offline

July 8, 2019 2:36 p.m.

roger that,
thanks!
