Full Version: tractor+pdg error on job finish with 'submit graph as job'
mestela
Again looking to get farm jobs going for sims and whatnot. The ‘submit graph as job’ option works: I can see the MQ server start and other blades start up, but when the work completes the master job fails with this error:


Traceback (most recent call last):
  File "/mnt/ala/mav/2019/sandbox/studio3/Tech/matt/pdgtemp/8642/scripts/top.py", line 189, in <module>
    report_resultdata(pdg_node, args.report == 'all')
  File "/mnt/ala/mav/2019/sandbox/studio3/Tech/matt/pdgtemp/8642/scripts/top.py", line 97, in report_resultdata
    hash_code=hc)
  File "/mnt/ala/mav/2019/sandbox/studio3/Tech/matt/pdgtemp/8642/scripts/pdgcmd.py", line 214, in reportResultData
    item_name = os.environ['PDG_ITEM_NAME']
  File "/mnt/ala/software/ext_packages/python/2.7.15/platform-linux/arch-x86_64/os-RedHatEnterpriseWorkstation-7.4/miniconda2/lib/python2.7/UserDict.py", line 40, in __getitem__
    raise KeyError(key)
KeyError: 'PDG_ITEM_NAME'

Any ideas why this might be failing? If I look at the job logs at the top I can see some pdg vars being defined, but not PDG_ITEM_NAME:

====[2019/09/12 17:00:55 /J23534/T1/C1.8/139002@snail on mollie00 ]====

argv: [
  "/opt/hfs17.5.327/bin/hython", 
  "/mnt/ala/mav/2019/sandbox/studio3/Tech/matt/pdgtemp/8642/scripts/pdgtrcmd.py", 
  "--norpc", 
  "--setenv", 
  "PDG_TEMP", 
  "/mnt/ala/mav/2019/sandbox/studio3/Tech/matt/pdgtemp/8642", 
  "--setenv", 
  "PDG_RESULT_SERVER", 
  "%D()", 
  "--setenv", 
  "PDG_DIR", 
  "/mnt/ala/mav/2019/sandbox/studio3/Tech/matt", 
  "--setenv", 
  "PDG_SCRIPTDIR", 
  "/mnt/ala/mav/2019/sandbox/studio3/Tech/matt/pdgtemp/8642/scripts", 
  "--setenv", 
  "PDG_JOBID_VAR", 
  "TR_ENV_JID", 
  "/opt/hfs17.5.327/bin/hython$PDG_EXE", 
  "/mnt/ala/mav/2019/sandbox/studio3/Tech/matt/pdgtemp/8642/scripts/top.py", 
  "--report", 
  "none", 
  "--hip", 
  "/mnt/ala/mav/2019/sandbox/studio3/Tech/matt/farm_filecache_v07.hipnc", 
  "--toppath", 
  "/obj/topnet1"
]
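A quick way to confirm which variables the job command line actually sets is to pull the `--setenv NAME VALUE` triples out of the argv. The following is a minimal sketch, not part of PDG: the helper name `setenv_pairs` is hypothetical, the argv is a trimmed copy of the log above, and the assumption that `--setenv` is always followed by a name and a value is inferred only from the layout of that log. It also only inspects argv, so a variable set elsewhere in the blade environment would not show up here.

```python
def setenv_pairs(argv):
    """Collect NAME -> VALUE for every '--setenv NAME VALUE' triple in argv.

    Hypothetical helper: assumes each '--setenv' is followed by exactly
    a name and a value, as the job log above appears to show.
    """
    env = {}
    i = 0
    while i < len(argv):
        if argv[i] == "--setenv" and i + 2 < len(argv):
            env[argv[i + 1]] = argv[i + 2]
            i += 3
        else:
            i += 1
    return env


# Trimmed copy of the argv printed in the Tractor job log.
argv = [
    "/opt/hfs17.5.327/bin/hython",
    "/mnt/ala/mav/2019/sandbox/studio3/Tech/matt/pdgtemp/8642/scripts/pdgtrcmd.py",
    "--norpc",
    "--setenv", "PDG_TEMP", "/mnt/ala/mav/2019/sandbox/studio3/Tech/matt/pdgtemp/8642",
    "--setenv", "PDG_RESULT_SERVER", "%D()",
    "--setenv", "PDG_DIR", "/mnt/ala/mav/2019/sandbox/studio3/Tech/matt",
    "--setenv", "PDG_SCRIPTDIR",
    "/mnt/ala/mav/2019/sandbox/studio3/Tech/matt/pdgtemp/8642/scripts",
    "--setenv", "PDG_JOBID_VAR", "TR_ENV_JID",
    "/opt/hfs17.5.327/bin/hython$PDG_EXE",
    "/mnt/ala/mav/2019/sandbox/studio3/Tech/matt/pdgtemp/8642/scripts/top.py",
    "--report", "none",
]

env = setenv_pairs(argv)

# The traceback fails on this name, so check whether the command sets it.
required = ["PDG_ITEM_NAME"]
missing = [name for name in required if name not in env]

print("set by argv:", sorted(env))
print("missing:", missing)
```

For this argv the sketch reports `PDG_ITEM_NAME` as missing, which matches the `KeyError` in the traceback.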
chrisgreb
That's a bug in that version of 17.5. Is it possible to update?
mestela
Sure, up to what version?
chrisgreb
.332 should be the oldest fixed build
mestela
Got 17.5.376 installed, now jobs fail at startup. The erroring line in the tractor logs:

/mnt/ala/software/ext_packages/houdini/17.5.376/install/bin/hython /mnt/ala/mav/2019/sandbox/studio3/Tech/matt/pdgtemp/40872/scripts/top.py --report none --hip /mnt/ala/mav/2019/sandbox/studio3/Tech/matt/pdg_tractor_v01.hipnc --toppath /obj/ala_fx_container1/topnet
/mnt/ala/software/ext_packages/houdini/17.5.376/install/bin/hython-bin: symbol lookup error: /mnt/ala/software/ext_packages/houdini/17.5.376/install/bin/../dsolib/libHoudiniUT.so: undefined symbol: _ZN3tbb19task_scheduler_init27internal_blocking_terminateEb

Clues? Will try installing the oldest version after .332 that's available, which is .360, which is not very old in the grand scheme of things.
chrisgreb
mestela wrote: "Got 17.5.376 installed, now jobs fail at startup."
Could there be a library mix-up due to your system environment? What if you just run hython directly on that blade?
If it also crashes, you could try tracing what libraries are being loaded with:
strace -o crashtrace.txt -e trace=open /mnt/ala/software/ext_packages/houdini/17.5.376/install/bin/hython /mnt/ala/mav/2019/sandbox/studio3/Tech/matt/pdgtemp/40872/scripts/top.py --report none --hip /mnt/ala/mav/2019/sandbox/studio3/Tech/matt/pdg_tractor_v01.hipnc --toppath /obj/ala_fx_container1/topnet
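The resulting trace can be very long, so it may help to filter it. The missing symbol in the error is the mangled C++ name for `tbb::task_scheduler_init::internal_blocking_terminate(bool)`, which suggests (but does not prove) that a copy of libtbb from outside the Houdini install is being picked up. The sketch below is one hedged way to check: it scans strace output for successful `open`/`openat` calls on files whose name contains a given fragment and flags any that live outside the install root. The function name `loaded_libs` is hypothetical, and the sample lines are illustrative only (the `/some/other/package` path is made up, not taken from the attached crashtrace).

```python
import re

# Matches strace lines such as:
#   open("/path/lib.so", O_RDONLY|O_CLOEXEC) = 3
#   openat(AT_FDCWD, "/path/lib.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (...)
OPEN_RE = re.compile(
    r'open(?:at)?\((?:AT_FDCWD, )?"([^"]+)"[^)]*\)\s*=\s*(-?\d+)'
)


def loaded_libs(trace_text, name_fragment):
    """Return paths that were opened successfully and whose file name
    contains name_fragment, in the order they appear in the trace."""
    hits = []
    for line in trace_text.splitlines():
        m = OPEN_RE.search(line)
        if not m:
            continue
        path, ret = m.group(1), int(m.group(2))
        filename = path.rsplit("/", 1)[-1]
        if ret >= 0 and name_fragment in filename:
            hits.append(path)
    return hits


# Install root taken from the failing command line in this thread.
HOUDINI_ROOT = "/mnt/ala/software/ext_packages/houdini/17.5.376/install"

# Illustrative sample only; replace with the real trace, e.g.
#   sample = open("crashtrace.txt").read()
sample = """\
open("/tmp/nowhere/libtbb.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/some/other/package/lib/libtbb.so.2", O_RDONLY|O_CLOEXEC) = 3
open("/mnt/ala/software/ext_packages/houdini/17.5.376/install/dsolib/libHoudiniUT.so", O_RDONLY|O_CLOEXEC) = 3
"""

libs = loaded_libs(sample, "libtbb")
foreign = [p for p in libs if not p.startswith(HOUDINI_ROOT)]

print("libtbb opened from:", libs)
print("outside the Houdini install:", foreign)
```

If the real trace shows libtbb being opened from a package directory rather than from Houdini's own `dsolib`, the environment that puts that directory on the library search path is the place to look first.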
mestela
Yeah I'm finding the combo of tractor, rez, pdg, shotgun, usd (we're using the compiled-from-source usd stuff and other bits) is a twisty mess. Pulling it apart, might take a while.
mestela
Actually this is sort of interesting. I assume it's a path misconfig on my part, but the crashtrace is miles long. Implies we're lucky anything runs at all!

Just running ‘hython’ from the command line drops me to a Python prompt with a few complaints about Qt stuff, but nowhere near as bad as the pdg command or its stack trace.

Running ‘houdini’ is complaint free, runs fine. It's all a little confusing! Should be able to get some of our more tech-savvy staff involved this week to help debug.

Crashtrace attached.