'Submit Graph As Job' generates Deadline Job, but just hangs

   3082   5   0
User Avatar
Member
7 posts
Joined: 11月 2010
Offline


The ‘Submit Graph As Job’ in the Deadline Scheduler TOP-Node generates the deadline job, but then the task just hangs at ‘Rendering’ with no logging being submitted through the Houdini Console or in Deadline. The jobs then need to be failed through deadline to free up the node for rendering other tasks.
The other approach of cooking the nodes through the NetworkView works perfectly fine. Any ideas on how to debug this?

Best,
~d
User Avatar
Member
571 posts
Joined: 5月 2017
Offline
Usually hanging at Rendering could mean that its not finding a farm machine, or waiting on something in Deadline which hasn't been setup properly.

Turn on Verbose Logging on the TOP Deadline scheduler node and paste the output here (it will show up in the Houdini console). You can also check the farm machine deadline log as well as the job output from Deadline Monitor (after cancelling).
Edited by seelan - 2020年2月3日 20:55:43
User Avatar
Member
201 posts
Joined: 7月 2015
Offline
My guess would be that you are using DL on a single machine.
You need at least two slaves for the “Submit Graph as Job” feature.
You can launch two slaves on one machine. You can find the needed info in the DL Docs.
Manuel Köster - Senior Technical Artist @Remedy Entertainment

https://www.remedygames.com/ [www.remedygames.com]
http://shadesoforange.de/ [shadesoforange.de]
https://twitter.com/ShadesOfOrange_ [twitter.com]
User Avatar
Member
7 posts
Joined: 11月 2010
Offline
So from the Houdini logger I get this output:

11:46:00 Local Working Dir: XXXXXXXX/houdini/hip
Remote Working Dir: XXXXXXXX/houdini/hip
11:46:00 Not copying plugins folder because it exists!
11:46:00 Setting job directory: XXXXXXXX/houdini/hip/pdgtemp/13396/job_9e15196177df44b4bb21f89a367d7bc5
11:46:00 Job file=XXXXXXXX/houdini/hip/pdgtemp/13396/job_9e15196177df44b4bb21f89a367d7bc5/pdg_dl_job.txt
Plugin file=XXXXXXXX/houdini/hip/pdgtemp/13396/job_9e15196177df44b4bb21f89a367d7bc5/pdg_dl_plugin.txt
11:46:00 Task 0 file: XXXXXXXX/houdini/hip/pdgtemp/13396/job_9e15196177df44b4bb21f89a367d7bc5/task_0.txt
11:46:00 Submit As Job: Deadline command =


and from deadline:

=======================================================
Log
=======================================================
2020-02-03 18:36:49: 0: Plugin will be reloaded because a new job has been loaded.
2020-02-03 18:36:49: 0: Loading Job's Plugin timeout is Disabled
2020-02-03 18:36:49: 0: Loaded plugin PDGDeadline
2020-02-03 18:36:50: 0: Executing plugin command of type ‘Sync Files for Job’
2020-02-03 18:36:50: 0: All job files are already synchronized
2020-02-03 18:36:50: 0: Synchronizing Plugin PDGDeadline from XXXXXXXX/houdini/hip/pdgtemp/33200/plugins\PDGDeadline took: 0 seconds
2020-02-03 18:36:50: 0: Done executing plugin command of type ‘Sync Files for Job’
2020-02-03 18:36:50: 0: Executing plugin command of type ‘Initialize Plugin’
2020-02-03 18:36:50: 0: INFO: Executing plugin script ‘XXXXXXXX\plugins\5e38ae643808b472ec59334b\PDGDeadline.py’
2020-02-03 18:36:50: 0: INFO: *********** PDGDeadline InitializeProcess
2020-02-03 18:36:50: 0: INFO: About: PDG Plugin for Deadline
2020-02-03 18:36:50: 0: INFO: Render Job As User disabled, running as current user ‘autobot’
2020-02-03 18:36:50: 0: INFO: The job's environment will be merged with the current environment before rendering
2020-02-03 18:36:50: 0: Done executing plugin command of type ‘Initialize Plugin’
2020-02-03 18:36:50: 0: Start Job timeout is disabled.
2020-02-03 18:36:50: 0: Task timeout is disabled.
2020-02-03 18:36:50: 0: Loaded job: PDG FX_TEST (5e38ae643808b472ec59334b)
2020-02-03 18:36:50: 0: Skipping XXX: because it is already mapped
2020-02-03 18:36:50: 0: Skipping XXX: because it is already mapped
2020-02-03 18:36:50: 0: Skipping XXX: because it is already mapped
2020-02-03 18:36:50: 0: Skipping XXX: because it is already mapped
2020-02-03 18:36:50: 0: Skipping XXX: because it is already mapped
2020-02-03 18:36:50: 0: Skipping XXX: because it is already mapped
2020-02-03 18:36:50: 0: Executing plugin command of type ‘Start Job’
2020-02-03 18:36:50: 0: Done executing plugin command of type ‘Start Job’
2020-02-03 18:36:50: 0: Plugin rendering frame(s): 0
2020-02-03 18:36:51: 0: Executing plugin command of type ‘Render Task’
2020-02-03 18:36:51: 0: INFO: StartFrame: 0
2020-02-03 18:36:51: 0: INFO: Startup Directory: XXXXXXXX/houdini/hip/pdgtemp/33200
2020-02-03 18:36:51: 0: INFO: Looking for task file: XXXXXXXX/houdini/hip/pdgtemp/33200/job_a6b49bea5078428e8b131ba472b4baea\task_0.txt
2020-02-03 18:36:51: 0: INFO: Starting monitor program: C:\Python27\python.exe XXXXXXXX/houdini/hip/pdgtemp/33200/scripts/pdgmq.py –xmlport NNNNN –relayport MMMMM –connectionfile XXXXXXXX/houdini/hip/pdgtemp/33200/job_a6b49bea5078428e8b131ba472b4baea/pdgmq_server.txt –start
2020-02-03 18:36:51: 0: INFO: Startup Directory: XXXXXXXX/houdini/hip/pdgtemp/33200
2020-02-03 18:36:51: 0: INFO: Starting monitored program: PDGMonitor
2020-02-03 18:36:51: 0: INFO: Program executable: C:\Python27\python.exe
2020-02-03 18:36:51: 0: INFO: Program arguments: XXXXXXXX/houdini/hip/pdgtemp/33200/scripts/pdgmq.py –xmlport NNNNN –relayport MMMMM –connectionfile XXXXXXXX/houdini/hip/pdgtemp/33200/job_a6b49bea5078428e8b131ba472b4baea/pdgmq_server.txt –start
2020-02-03 18:36:51: 0: PDG_MQ NM32.marz.vfx NNNNN MMMMM
2020-02-03 18:36:51: 0: ## Message Queue Server Running
2020-02-03 18:36:51: 0: INFO: Got MQ server info: PDG_MQ NM32.marz.vfx NNNNN MMMMM
2020-02-03 18:36:51: 0: ## Message Queue Server Running
2020-02-03 18:36:51: 0: INFO: Setting PATH: C:\Python27\Lib\site-packages\PyQt4;XXXXXXXX;C:\Program Files\Thinkbox\Deadline9\bin;C:\Users\autobot\AppData\Local\Microsoft\WindowsApps;C:/PROGRA~1/SIDEEF~1/HOUDIN~1.348/bin
2020-02-03 18:36:51: 0: INFO: $HYTHON mapped to: C:/PROGRA~1/SIDEEF~1/HOUDIN~1.348/bin/hython.exe
2020-02-03 18:36:51: 0: INFO: Task Executable: C:/PROGRA~1/SIDEEF~1/HOUDIN~1.348/bin/hython.exe
2020-02-03 18:36:51: 0: INFO: Task Arguments: XXXXXXXX/houdini/hip/pdgtemp/33200/scripts/rop.py –batch -p XXXXXXXX/houdini/hip/FX_TEST.hip -n /obj/topnet1/ropgeometry2/ropnet1/geometry1 -to /obj/topnet1/ropgeometry2 -i ropgeometry2_ropfetch10_3 -fs 1032 -fe 1037 -fi 1
2020-02-03 18:36:51: 0: INFO: Invoking: Run Process
2020-02-03 18:36:53: 0: ## Connection from ('XXX.XXX.XXX.XXX', XXXXX)
2020-02-03 18:36:53: 0: ## Relay Mode Active
2020-02-03 18:37:01: 0: 0.0 mps, avg=0.0ms, max=0.0ms
2020-02-03 18:37:02: 0: STDOUT: PDG: Loading .hip file XXXXXXXX/houdini/hip/FX_TEST.hip.
2020-02-03 18:37:04: 0: STDOUT: PDG: Setting batch sub index to 0
2020-02-03 18:37:04: 0: STDOUT: PDG_START: ropgeometry2_ropfetch10_3;0
2020-02-03 18:37:04: 0: STDOUT: ALF_PROGRESS 0%
2020-02-03 18:37:04: 0: STDOUT: PDG: Setting batch sub index to 1
2020-02-03 18:37:04: 0: STDOUT: PDG_START: ropgeometry2_ropfetch10_3;1
2020-02-03 18:37:07: 0: STDOUT: ALF_PROGRESS 20%
2020-02-03 18:37:07: 0: STDOUT: PDG_RESULT: ropgeometry2_ropfetch10_3;0;'/ROP/bgeo/dust_pipeline/v001/dust_pipeline.1032.bgeosc';;0
2020-02-03 18:37:07: 0: STDOUT: PDG_SUCCESS: ropgeometry2_ropfetch10_3;0;0.0
2020-02-03 18:37:07: 0: STDOUT: PDG: Setting batch sub index to 2
2020-02-03 18:37:07: 0: STDOUT: PDG_START: ropgeometry2_ropfetch10_3;2
2020-02-03 18:37:10: 0: STDOUT: ALF_PROGRESS 40%
2020-02-03 18:37:10: 0: STDOUT: PDG_RESULT: ropgeometry2_ropfetch10_3;1;'/ROP/bgeo/dust_pipeline/v001/dust_pipeline.1033.bgeosc';;0
2020-02-03 18:37:10: 0: STDOUT: PDG_SUCCESS: ropgeometry2_ropfetch10_3;1;0.0
2020-02-03 18:37:10: 0: STDOUT: PDG: Setting batch sub index to 3
2020-02-03 18:37:10: 0: STDOUT: PDG_START: ropgeometry2_ropfetch10_3;3
2020-02-03 18:37:11: 0: 0.8 mps, avg=2.4ms, max=2.9ms
2020-02-03 18:37:17: 0: STDOUT: ALF_PROGRESS 60%
2020-02-03 18:37:17: 0: STDOUT: PDG_RESULT: ropgeometry2_ropfetch10_3;2;'/ROP/bgeo/dust_pipeline/v001/dust_pipeline.1034.bgeosc';;0
2020-02-03 18:37:17: 0: STDOUT: PDG_SUCCESS: ropgeometry2_ropfetch10_3;2;0.0
2020-02-03 18:37:17: 0: STDOUT: PDG: Setting batch sub index to 4
2020-02-03 18:37:17: 0: STDOUT: PDG_START: ropgeometry2_ropfetch10_3;4
2020-02-03 18:37:21: 0: 0.3 mps, avg=1.8ms, max=2.0ms
2020-02-03 18:37:27: 0: STDOUT: ALF_PROGRESS 80%
2020-02-03 18:37:27: 0: STDOUT: PDG_RESULT: ropgeometry2_ropfetch10_3;3;'/ROP/bgeo/dust_pipeline/v001/dust_pipeline.1035.bgeosc';;0
2020-02-03 18:37:27: 0: STDOUT: PDG_SUCCESS: ropgeometry2_ropfetch10_3;3;0.0
2020-02-03 18:37:27: 0: STDOUT: PDG: Setting batch sub index to 5
2020-02-03 18:37:27: 0: STDOUT: PDG_START: ropgeometry2_ropfetch10_3;5
2020-02-03 18:37:31: 0: 0.3 mps, avg=1.3ms, max=1.7ms
2020-02-03 18:37:40: 0: STDOUT: ALF_PROGRESS 100%
2020-02-03 18:37:40: 0: STDOUT: PDG_RESULT: ropgeometry2_ropfetch10_3;4;'/ROP/bgeo/dust_pipeline/v001/dust_pipeline.1036.bgeosc';;0
2020-02-03 18:37:40: 0: STDOUT: PDG_SUCCESS: ropgeometry2_ropfetch10_3;4;0.0
2020-02-03 18:37:40: 0: STDOUT: PDG_RESULT: ropgeometry2_ropfetch10_3;5;'/ROP/bgeo/dust_pipeline/v001/dust_pipeline.1037.bgeosc';;0
2020-02-03 18:37:40: 0: STDOUT: PDG_SUCCESS: ropgeometry2_ropfetch10_3;5;0.0
2020-02-03 18:37:41: 0: Done executing plugin command of type ‘Render Task’
2020-02-03 18:37:41: 0: Render time for frame(s): 50.466 s
2020-02-03 18:37:41: 0: Total time for task: 53.616 s
2020-02-03 18:37:41: 0: 0.4 mps, avg=2.4ms, max=3.1ms
2020-02-03 18:37:42: 0: Saving task log…

=======================================================
Details
=======================================================
Date: 02/03/2020 18:37:43
Frames: 0
Job Submit Date: 02/03/2020 18:36:04
Job User: ybessaa
Average RAM Usage: 592757312 (2%)
Peak RAM Usage: 1882791936 (6%)
Average CPU Usage: 36%
Peak CPU Usage: 107%
Used CPU Clocks (x10^6 cycles): 478523
Total CPU Clocks (x10^6 cycles): 1329229

=======================================================
Slave Information
=======================================================
Slave Name: FOO
Version: v9.0.12.0 Release (d3cbd3bab)
Operating System: Windows 10 Pro
Running As Service: Yes
Machine User: autobot
IP Address: XXX.XXX.XXX.XXX
MAC Address: XX:XX:XX:XX:XX:XX
CPU Architecture: x64
CPUs: 8
CPU Usage: 100%
Memory Usage: 7.1 GB / 31.9 GB (22%)
Free Disk Space: 453.704 GB
Video Card: NVIDIA GeForce GTX 980 Ti



It says that it's completing, but I've had one task running overnight with no success. it just seems to hang and not process anything.

~d
User Avatar
Member
571 posts
Joined: 5月 2017
Offline
Nothing obvious stands out in the log you pasted. Please describe your setup, and post your hip file if you are able to do so.

How many farm machines are available to pick up this job? Are the same farm machines being used in the normal Houdini PDG cook (ie. not via Submit As Job)?
User Avatar
Member
7 posts
Joined: 11月 2010
Offline
Shadesoforange was right about the number or worker machines. The test pool I had was just a single worker and running it on a larger pool with more workers got it running. Thank you both for the suggestions and sharing your expertise.

~d
  • Quick Links