Drew Whitehouse

drew

About Me

Location
Australia

SciViz programmer; I have been a SideFX user since Prisms. Minor claim to fame: I wrote the original Houdini Ocean Toolkit, which has now been ported to pretty much every 3D platform under the sun. Thankfully SideFX have released much better Ocean tools for Houdini since then :-)

Recent Forum Posts

PDG + Hqueue March 22, 2019, 12:15 a.m.

I had a chat with the network admins and they're loath to do this; the firewalls are there for security purposes. So now I'm thinking a way around it is to run a VPN on the farm's cloud nodes and have the workstations sit on that VPN. It's going to be complicated.

This is what I'm seeing running on the farm node.

[hquser@worker-large-16cpu-centos7-1 ~]$ ps -ef | grep hq
hquser    2355     1  1 Feb26 ?        05:55:09 hserver
root     12383 12369  0 13:28 pts/0    00:00:00 sudo -i -u hquser
hquser   12384 12383  0 13:28 pts/0    00:00:00 -bash
hquser   12409 29869  0 13:29 ?        00:00:00 /bin/bash -c python -c "import xmlrpclib;s = xmlrpclib.ServerProxy('http://150.203.248.126:61034');s.start_cook('ropgeometry1_ropfetch1_1_9', '$JOBID');" && export HFS="$HQROOT/houdini_distros/hfs.$HQCLIENTARCH" && cd $HFS && source ./houdini_setup && "$HFS/bin/hython" "$HQROOT//g/data/z03/drw900/tmp/PDG/pdgtemp/37925/scripts/rop.py" -p "$HQROOT//g/data/z03/drw900/tmp/PDG/untitled.hip" -n "/obj/topnet1/ropgeometry1/ropnet1/geometry1" -to "/obj/topnet1/ropgeometry1" -i "ropgeometry1_ropfetch1_1_9" -s "150.203.248.126:61034" -fs 1 -fe 1 -fi 1
hquser   12412 12409  1 13:29 ?        00:00:00 /local/hquser/hqclient/./bin/python2.7-bin -c import xmlrpclib;s = xmlrpclib.ServerProxy('http://150.203.248.126:61034');s.start_cook('ropgeometry1_ropfetch1_1_9', '789');
hquser   12419 12384  0 13:29 pts/0    00:00:00 ps -ef
hquser   12420 12384  0 13:29 pts/0    00:00:00 grep --color=auto hq
hquser   29869   906  0 Mar15 ?        00:19:29 ./bin/python2.7-bin hqnode.py

BTW I note that there is a potential problem here as well, besides the network issue. The hip file, sitting on the same mounted file system /g/data/z03, is not strictly under

hqserver.sharedNetwork.path.linux = /g/data/z03/hqueue

which is where houdini_distros/hfs.linux-x86_64 etc. live.

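This works fine for hqueue rendering, which isn't adding the $HQROOT in front of the absolute path “/g/data/z03/drw900/tmp/PDG/untitled.hip”. The PDG job command line above does add it, giving “$HQROOT//g/data/z03/drw900/tmp/PDG/untitled.hip”.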
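
To make the path issue concrete, here is a minimal Python sketch. It assumes that $HQROOT on the node expands to the hqserver.sharedNetwork.path.linux value quoted above, which I have not confirmed; the helper name is_under_root is made up for illustration and is not part of HQueue or PDG.

```python
import posixpath

def is_under_root(path, root):
    """Return True if path is root itself or lies somewhere beneath it."""
    path = posixpath.normpath(path)
    root = posixpath.normpath(root)
    return posixpath.commonpath([path, root]) == root

# Assumption: $HQROOT expands to the shared network path from the config.
shared_root = "/g/data/z03/hqueue"
hip_file = "/g/data/z03/drw900/tmp/PDG/untitled.hip"

# The hip file shares the /g/data/z03 mount but is not under the shared root.
print(is_under_root(hip_file, shared_root))

# Plain prefixing, mirroring "$HQROOT//g/data/..." in the ps output.
print(shared_root + "/" + hip_file)
```

If the assumption holds, the prefixed path points inside the hqueue directory at a location where the hip file does not exist.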

seelan
The HQueue scheduler parm interface allows you to set custom callback port ranges. Is there any possibility of setting up a custom range, then opening up just those ports through the firewall? Or if you can at least do a test with the firewall off, then perhaps with just those ports allowed, we can confirm the firewall is the problem.
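
A rough way to run that kind of test from a farm node is a plain TCP connect check, sketched below in Python. The function names are made up for illustration, and the host and port range would be whatever callback range ends up configured; nothing here is HQueue-specific.

```python
import socket

def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False

def scan_range(host, first_port, last_port, timeout=2.0):
    """Map each port in the inclusive range to whether it accepted a connection."""
    return {
        port: port_is_open(host, port, timeout)
        for port in range(first_port, last_port + 1)
    }
```

Something has to be listening on the workstation side for a port to show as open, so a False result on its own does not distinguish a firewall drop from a port with no listener.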

PDG + Hqueue March 21, 2019, 9:54 p.m.

Hi all, reposting this one from the main forum…

I'm just beginning to play with the hqueuescheduler in TOPs and having no luck getting things going. The ropgeometry1_ropfetch* jobs seem to run but make no forward progress, and hqserver.log isn't giving me much help. I'm wondering what ports need to be open for this stuff, as we run hqueue in a fairly locked-down environment. The hqueue installation is running fine for normal render jobs.

More investigation suggests that the PDG processes being spawned by hqueue on the renderfarm are trying to connect back (via xmlrpc) to the originating workstation. Unfortunately our renderfarm and workstation networks are firewalled off from each other. So it looks like a no go for the time being, short of some convoluted tunneling setup.
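
The callback pattern, as I understand it from the process listing, can be sketched with Python's standard XML-RPC modules. This is a stand-in and not PDG's actual server: the start_cook handler below is a dummy, and both ends run on localhost, whereas in the real setup the server sits on the workstation and the client on the farm node.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

received = []

def start_cook(work_item, job_id):
    """Dummy handler standing in for the workstation-side callback."""
    received.append((work_item, job_id))
    return True

# "Workstation" side: listen on an OS-assigned free port.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(start_cook)
host, port = server.server_address
threading.Thread(target=server.serve_forever, daemon=True).start()

# "Farm node" side: connect back and report that the cook has started.
proxy = xmlrpc.client.ServerProxy("http://%s:%d" % (host, port))
proxy.start_cook("ropgeometry1_ropfetch1_1_9", "789")

server.shutdown()
server.server_close()
print(received)
```

The key point is the direction of the connection: the farm node is the client, so the workstation's callback port has to be reachable from the farm network.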

I'm wondering if this is going to be a limitation for all job schedulers, as I'm interested in implementing a SLURM scheduler here. How many big installations allow renderfarm nodes to see the artist workstations and vice versa?

PDG + Hqueue installation March 17, 2019, 8:50 p.m.

Following up on my own post: more investigation suggests that the PDG processes being spawned by hqueue on the renderfarm are trying to connect back (via xmlrpc) to the originating workstation, and unfortunately our renderfarm and workstation networks are firewalled off from each other. So it looks like a no go for the time being, short of some convoluted tunneling setup.