Issues getting deadline scheduler running

Member
390 posts
Joined: January 2012
Trying to get Deadline Scheduler running and am getting this exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "opdef:/Top/deadlinescheduler?PythonModule", line 3, in submitGraphAsJob
  File "/jpl/software/houdini/houdini_18.0.287/houdini/python2.7libs/pdg/scheduler.py", line 630, in submitGraphAsJob
    url = sch.submitAsJob(fname, net_name)
  File "/jpl/software/houdini/houdini_18.0.287/houdini/pdg/types/schedulers/tbdeadline.py", line 963, in submitAsJob
    raise CookError('Deadline submission error:\n' + str(e))
CookError: Deadline submission error:
[Errno 2] No such file or directory

I'm not sure which file it can't find. The graph cooks fine with the local scheduler, but the job does not show up on Deadline. DEADLINE_PATH is set to the Deadline installation folder, and deadlinecommand is in the PATH and runs fine.

Any ideas on how to troubleshoot this?
Member
571 posts
Joined: May 2017
Submit Graph as Job should work. It sounds like it's missing one or more of the job spec files, which are created in the Working Directory. Is that properly specified?

To debug, please turn on Verbose Logging in the Deadline section on the TOP Deadline node parm interface, and paste the log here. Or attach the hip file.

Does the regular cook (not Submit Graph as Job) work for you? To run the regular cook, go to Tasks > Dirty and Cook Output Node in the Network Editor.
Member
390 posts
Joined: January 2012
I've turned on Verbose Logging but don't see anything more being printed anywhere. Where does it log to?

A regular cook (dirty and cook) works fine. The graph is just a generic generator going into a Python script that prints the hostname.

A pdgtemp directory gets created in the working directory I've set on the deadline scheduler with these folders and files inside:

.
├── job_a18fd66e3af1498c9fbea72a72be523a
│   ├── pdg_dl_job.txt
│   ├── pdg_dl_plugin.txt
│   └── task_0.txt
├── job_de7d3c04bf924cee9bcf231ac39566f0
│   ├── pdg_dl_job.txt
│   ├── pdg_dl_plugin.txt
│   └── task_0.txt
├── logs
├── plugins
│   └── PDGDeadline
│       ├── PDGDeadline.options
│       ├── PDGDeadline.param
│       ├── PDGDeadline.py
│       ├── PDGDeadline.pyc
│       ├── PDGDeadlineUtils.py
│       └── PDGDeadlineUtils.pyc
└── scripts
    ├── pdgcmd.py
    ├── pdgjobcmd.py
    └── top.py

I'm attaching the hip as well.
Edited by zdimaria - January 8, 2020 13:25:01

Attachments:
bid_000__rnd_pdg_v001.hip (88.8 KB)

Member
571 posts
Joined: May 2017
Looks like Deadline is not finding the job and plugin spec files in that working directory. You have it set to /jpl/jobs/henlo/sequences/bid/shots/000/work/apps/houdini/pdg/. Is this correct? Any idea why it would not be able to find them?

You can try running the same command yourself in your shell (where deadlinecommand works):
deadlinecommand SubmitJob path/to/pdg_dl_job.txt path/to/pdg_dl_plugin.txt

For the verbose logging, it looks like it only gets enabled after you do a regular cook with Deadline first. Change the default scheduler in the TOP network to the TOP Deadline scheduler, do a regular cook, change it back to localscheduler, then do Submit Graph as Job. That should produce log output in your Houdini Console. This is a workaround for a bug with verbose logging that is fixed in the next daily builds of Houdini 18 and 17.5.

Look for the line in the log that starts off like so: Submit As Job: Deadline command = ['SubmitJob', …
Make sure the paths are correct.
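
To check those paths without eyeballing them, here is a rough Python sketch. It assumes the text after "Deadline command = " is a plain Python list literal, as it appears in the console log; the helper names parse_submit_command and missing_spec_files are made up for this post and are not part of PDG or Deadline.

```python
import ast
import os

MARKER = "Deadline command = "

def parse_submit_command(log_line):
    """Return the argument list printed after 'Deadline command = '."""
    _, sep, tail = log_line.partition(MARKER)
    if not sep:
        raise ValueError("no Deadline command found in line")
    # The log prints a Python list repr, so literal_eval can read it back.
    return ast.literal_eval(tail.strip())

def missing_spec_files(args):
    """Return the file arguments (everything after 'SubmitJob') that do not exist."""
    return [path for path in args[1:] if not os.path.isfile(path)]
```

Paste the log line into parse_submit_command, then pass the result to missing_spec_files. An empty list means both spec files exist on the machine running Houdini, so the problem is somewhere else.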
Edited by seelan - January 8, 2020 22:15:12
Member
390 posts
Joined: January 2012
OK, so submitting it manually in a shell works fine. The verbose logging output this:

10:04:38 OnStartCook()
10:04:38 PDGMQ as separate task: False
10:04:38 Local Working Dir: /jpl/jobs/henlo/sequences/bid/shots/000/work/apps/houdini/pdg
Remote Working Dir: /jpl/jobs/henlo/sequences/bid/shots/000/work/apps/houdini/pdg
10:04:38 Starting deadline command process
10:07:49 Local Working Dir: /jpl/jobs/henlo/sequences/bid/shots/000/work/apps/houdini/pdg
Remote Working Dir: /jpl/jobs/henlo/sequences/bid/shots/000/work/apps/houdini/pdg
10:07:50 Setting job directory: /jpl/jobs/henlo/sequences/bid/shots/000/work/apps/houdini/pdg/pdgtemp/31527/job_181a8d4712244eca908dfa5ea43775ca
10:07:50 Job file=/jpl/jobs/henlo/sequences/bid/shots/000/work/apps/houdini/pdg/pdgtemp/31527/job_181a8d4712244eca908dfa5ea43775ca/pdg_dl_job.txt
Plugin file=/jpl/jobs/henlo/sequences/bid/shots/000/work/apps/houdini/pdg/pdgtemp/31527/job_181a8d4712244eca908dfa5ea43775ca/pdg_dl_plugin.txt
10:07:50 Task 0 file: /jpl/jobs/henlo/sequences/bid/shots/000/work/apps/houdini/pdg/pdgtemp/31527/job_181a8d4712244eca908dfa5ea43775ca/task_0.txt
10:07:50 Submit As Job: Deadline command = ['SubmitJob', u'/jpl/jobs/henlo/sequences/bid/shots/000/work/apps/houdini/pdg/pdgtemp/31527/job_181a8d4712244eca908dfa5ea43775ca/pdg_dl_job.txt', u'/jpl/jobs/henlo/sequences/bid/shots/000/work/apps/houdini/pdg/pdgtemp/31527/job_181a8d4712244eca908dfa5ea43775ca/pdg_dl_plugin.txt']

And running this command successfully submits the job:
deadlinecommand /jpl/jobs/henlo/sequences/bid/shots/000/work/apps/houdini/pdg/pdgtemp/31527/job_181a8d4712244eca908dfa5ea43775ca/pdg_dl_job.txt /jpl/jobs/henlo/sequences/bid/shots/000/work/apps/houdini/pdg/pdgtemp/31527/job_181a8d4712244eca908dfa5ea43775ca/pdg_dl_plugin.txt


The python exception in houdini is still:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "opdef:/Top/deadlinescheduler?PythonModule", line 3, in submitGraphAsJob
  File "/jpl/software/houdini/houdini_18.0.287/houdini/python2.7libs/pdg/scheduler.py", line 630, in submitGraphAsJob
    url = sch.submitAsJob(fname, net_name)
  File "/jpl/software/houdini/houdini_18.0.287/houdini/pdg/types/schedulers/tbdeadline.py", line 963, in submitAsJob
    raise CookError('Deadline submission error:\n' + str(e))
CookError: Deadline submission error:
[Errno 2] No such file or directory
Member
571 posts
Joined: May 2017
Is it possible that you might have different versions of Deadline installed? Looks like the “No such file or directory” error happens when the licensing check fails on Deadline.
https://docs.thinkboxsoftware.com/products/licensing/1.0/Licensing%20Guide/error96-2.html

It produces a different error if I specify incorrect job spec files, so that can't be it.

Make sure that the deadlinecommand that is invoked from Houdini is the same as the one that works for you.
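
One way to compare the two is to run something like the sketch below both in Houdini's Python shell and in the terminal where deadlinecommand works. It assumes the executable is looked up in DEADLINE_PATH first and then on PATH; I have not confirmed that this is the exact order the scheduler uses, and find_deadlinecommand is a hypothetical helper, not a PDG function.

```python
import os

def find_deadlinecommand(env=None):
    """Return (source, path) for the first deadlinecommand found, else (None, None)."""
    env = os.environ if env is None else env
    names = ["deadlinecommand"]
    if os.name == "nt":
        names.insert(0, "deadlinecommand.exe")

    # Assumed lookup order: DEADLINE_PATH first, then each PATH entry.
    search = []
    if env.get("DEADLINE_PATH"):
        search.append(("DEADLINE_PATH", env["DEADLINE_PATH"]))
    for directory in env.get("PATH", "").split(os.pathsep):
        if directory:
            search.append(("PATH", directory))

    for source, directory in search:
        for name in names:
            candidate = os.path.join(directory, name)
            if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
                return source, candidate
    return None, None
```

If the two environments give different results, or Houdini's side returns (None, None), the environment Houdini was launched with is the place to look.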
Member
390 posts
Joined: January 2012
Hi Seelan,

I believe I have found the issue. Something in our Houdini wrapper / launcher was messing up DEADLINE_PATH.
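
For anyone who lands here with the same message: a bad path to the executable is enough to produce it. The sketch below assumes the scheduler starts deadlinecommand through Python's subprocess module, which I have not checked against tbdeadline.py. The try_launch helper and the bogus path in the usage note are purely illustrative.

```python
import subprocess

def try_launch(argv):
    """Try to start argv; return (description, errno or None)."""
    try:
        proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    except OSError as exc:
        # Raised before the program runs, e.g. when the executable itself
        # is missing. The spec file arguments are never even looked at.
        return "launch failed: " + str(exc), exc.errno
    proc.communicate()
    return "exit code %d" % proc.returncode, None
```

Calling try_launch(["/not/a/real/dir/deadlinecommand", "SubmitJob"]) returns errno 2 (ENOENT), the same errno as in the traceback. That is why the error says nothing about which file is missing.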



thanks for your help!
Edited by zdimaria - January 9, 2020 14:12:35
Member
571 posts
Joined: May 2017
That error comes from Deadline's command, though, so we don't really have control over it; we just propagate it up to you. Perhaps you can suggest it to Thinkbox.
Member
390 posts
Joined: January 2012
Haha yeah I realized that and removed the comment. I'll shoot them a message.

thanks again!