PDG + Hqueue
drew
Hi all, reposting this one from the main forum…

I'm just beginning to play with the hqueuescheduler in TOPs and having no luck getting things going. The ropgeometry1_ropfetch* jobs seem to run but make no forward progress, and hqserver.log isn't giving me much help. I'm wondering what ports need to be open for this, as we run hqueue in a fairly locked-down environment. The hqueue installation is running fine for normal render jobs.

Further investigation suggests that the PDG processes spawned by hqueue on the renderfarm are trying to connect back (via xmlrpc) to the originating workstation. Unfortunately, our renderfarm and workstation networks are firewalled off from each other. So it looks like a no-go for the time being, short of some convoluted tunneling setup.
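
For anyone unfamiliar with the pattern, here is a minimal sketch of an XML-RPC callback of the kind described above: one side listens, the other connects to it and calls a function by name. It runs both ends on localhost and uses Python 3's xmlrpc modules (the farm processes in this thread use Python 2's xmlrpclib). The start_cook function here is a hypothetical stand-in, not PDG's actual implementation, and the port is whatever the OS hands out.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

# Calls received by the listening side, recorded for inspection.
received = []


def start_cook(item_name, job_id):
    """Hypothetical stand-in for the callback exposed on the workstation."""
    received.append((item_name, job_id))
    return True


def run_demo():
    # Listening side (the workstation in this thread). Port 0 asks the OS
    # for any free port; this is an assumption made for the demo only.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(start_cook)
    host, port = server.server_address
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        # Calling side (the farm node). This connection is what a firewall
        # between the two networks would block.
        proxy = xmlrpc.client.ServerProxy("http://%s:%d" % (host, port))
        return proxy.start_cook("ropgeometry1_ropfetch1_1_9", "789")
    finally:
        server.shutdown()
        server.server_close()
```

If the calling side cannot open a TCP connection to the listening side, the call never arrives, which would be consistent with jobs that start but make no progress.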

I'm wondering if this is going to be a limitation for all job schedulers, as I'm interested in implementing a SLURM scheduler here. How many big installations allow renderfarm nodes to see the artist workstations and vice versa?
seelan
The HQueue scheduler parm interface lets you set a custom callback port range. Is there any possibility of setting up a custom range and then opening just those ports through the firewall? Or, if you can at least do a test with the firewall off and then another with only those ports allowed, we can confirm that the firewall is the problem.
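
One way to run that kind of test is a small TCP probe, run on a farm node and pointed at the workstation. This is only a sketch: the function name is made up for illustration, and the host and port range you pass in are whatever you configured, not values taken from the scheduler. A port only shows as reachable if something is listening on it at the time, so run it while the TOP network is cooking.

```python
import socket


def reachable_ports(host, first_port, last_port, timeout=1.0):
    """Return the ports in [first_port, last_port] that accept a TCP connection."""
    open_ports = []
    for port in range(first_port, last_port + 1):
        try:
            # create_connection raises OSError (refused, timed out,
            # unreachable) when the port cannot be reached.
            with socket.create_connection((host, port), timeout=timeout):
                open_ports.append(port)
        except OSError:
            pass
    return open_ports
```

For example, reachable_ports("150.203.248.126", 61034, 61034) run from the farm node would show whether the callback address from the process listing in this thread can be reached at all.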
drew
I had a chat with the network admins and they're loath to do this; the firewalls are there for security purposes. So now I'm thinking a way around it is to run a VPN on the farm's cloud nodes and have the workstations sit on that VPN. It's going to be complicated.

This is what I'm seeing running on the farm node.

[hquser@worker-large-16cpu-centos7-1 ~]$ ps -ef | grep hq
hquser    2355     1  1 Feb26 ?        05:55:09 hserver
root     12383 12369  0 13:28 pts/0    00:00:00 sudo -i -u hquser
hquser   12384 12383  0 13:28 pts/0    00:00:00 -bash
hquser   12409 29869  0 13:29 ?        00:00:00 /bin/bash -c python -c "import xmlrpclib;s = xmlrpclib.ServerProxy('http://150.203.248.126:61034');s.start_cook('ropgeometry1_ropfetch1_1_9', '$JOBID');" && export HFS="$HQROOT/houdini_distros/hfs.$HQCLIENTARCH" && cd $HFS && source ./houdini_setup && "$HFS/bin/hython" "$HQROOT//g/data/z03/drw900/tmp/PDG/pdgtemp/37925/scripts/rop.py" -p "$HQROOT//g/data/z03/drw900/tmp/PDG/untitled.hip" -n "/obj/topnet1/ropgeometry1/ropnet1/geometry1" -to "/obj/topnet1/ropgeometry1" -i "ropgeometry1_ropfetch1_1_9" -s "150.203.248.126:61034" -fs 1 -fe 1 -fi 1
hquser   12412 12409  1 13:29 ?        00:00:00 /local/hquser/hqclient/./bin/python2.7-bin -c import xmlrpclib;s = xmlrpclib.ServerProxy('http://150.203.248.126:61034');s.start_cook('ropgeometry1_ropfetch1_1_9', '789');
hquser   12419 12384  0 13:29 pts/0    00:00:00 ps -ef
hquser   12420 12384  0 13:29 pts/0    00:00:00 grep --color=auto hq
hquser   29869   906  0 Mar15 ?        00:19:29 ./bin/python2.7-bin hqnode.py

BTW, I note that there is a potential problem here besides the network issue. The hip file, although it sits on the same mounted file system (/g/data/z03), is not strictly under

hqserver.sharedNetwork.path.linux = /g/data/z03/hqueue

which is where houdini_distros/hfs.linux-x86_64 etc lives.

This works fine for hqueue rendering, which doesn't add $HQROOT in front of the absolute path “/g/data/z03/drw900/tmp/PDG/untitled.hip”.
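
To make the path issue concrete, here is a small sketch. It assumes (this is a guess, not confirmed from the PDG source) that the job command is built by plain string concatenation of "$HQROOT/" and the file path, which would be consistent with the double slash in the process listing. Both helper names are made up for illustration.

```python
import posixpath


def is_under(path, root):
    """True if path is root itself or lives somewhere below root."""
    path = posixpath.normpath(path)
    root = posixpath.normpath(root)
    return path == root or path.startswith(root.rstrip("/") + "/")


def naive_prefix(root_token, path):
    """Assumed behaviour: join by concatenation, i.e. root_token + "/" + path."""
    return root_token + "/" + path


shared_root = "/g/data/z03/hqueue"
hip_file = "/g/data/z03/drw900/tmp/PDG/untitled.hip"

# The hip file is on the same mount but outside the shared root.
print(is_under(hip_file, shared_root))

# Prefixing an absolute path reproduces the "$HQROOT//g/data/..." form.
print(naive_prefix("$HQROOT", hip_file))
```

A check like is_under before submitting would flag hip files that sit outside hqserver.sharedNetwork.path.linux.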

GeordieM
This is unfortunate. I would have assumed all communication between workstation and clients would be proxied through the HQ Server since that's how normal rendering works.
chrisgreb
GeordieM wrote:
This is unfortunate. I would have assumed all communication between workstation and clients would be proxied through the HQ Server since that's how normal rendering works.

It has actually since been changed to work like that. All communication from jobs is now routed through a message queue that runs on the farm, so there should be no problems with firewalls or other restrictions.