LOOKING FOR HELP -> CookError: Failed to connect to MQ server

Member
7 posts
Joined: Nov. 2010
I've been working on this for a few days and I'm having trouble getting Deadline 9 and Houdini to work together.

PLATFORM:
Houdini Version: 17.5.391
Deadline Version: 9.0.12.0
OS Version: Windows 10

CONTEXT:
I'm building a PDG network. With the local scheduler, it runs and writes the ROP Geometry output files to the shared network location.
When I switch to the Deadline scheduler, it pushes the job to Deadline, but the task fails when trying to initialize the PDGMQ server.
I'm stuck at this point and not sure how to proceed with debugging the problem.


The following is the output (paths and IPs redacted) from the Houdini console:

13:32:51 onStopCook()
13:32:51 Stopping shared servers
13:32:51 No root job running in onStopCook
13:32:51 Stopping MQ Relay server
13:32:51 MQ Relay stopped
13:32:51 onStopCook()
13:32:51 Stopping shared servers
13:32:51 No root job running in onStopCook
13:32:51 Stopping MQ Relay server
13:32:51 MQ Relay stopped
13:32:51 onStopCook()
13:32:51 Stopping shared servers
13:32:51 No root job running in onStopCook
13:32:51 Stopping MQ Relay server
13:32:51 MQ Relay stopped
13:32:51 OnStartCook()
13:32:51 PDGMQ as separate task: False
13:32:51 Local Working Dir: xxxxxxxx/research/shots/PDG_Tests
Remote Working Dir: xxxxxxxx/research/shots/PDG_Tests
13:32:51 Starting deadline command process
13:32:51 onSchedule: ropgeometry1_ropfetch10_34 - 0
13:32:51 Starting root job with PDGMQ server as monitor program
13:32:51 Task 0 file: xxxxxxxx/research/shots/PDG_Tests/pdgtemp/4480/job_31362ab69de2428690f00d54a32aca4d/task_0.txt
13:32:51 Not copying plugins folder because it exists!
13:32:51 Setting job directory: xxxxxxxx/research/shots/PDG_Tests/pdgtemp/4480/job_31362ab69de2428690f00d54a32aca4d
13:32:51 Job file=xxxxxxxx/research/shots/PDG_Tests/pdgtemp/4480/job_31362ab69de2428690f00d54a32aca4d/pdg_dl_job.txt
Plugin file=xxxxxxxx/research/shots/PDG_Tests/pdgtemp/4480/job_31362ab69de2428690f00d54a32aca4d/pdg_dl_plugin.txt
13:32:53 Root job: 5e1cb7d5f160b13294bea07c
13:33:13 Got PDGMQ server info: PDG_MQ XXX.XXX.XXX.XXX 57502 57503
## Message Queue Server Running
13:33:13 Starting MQ Relay server
Traceback (most recent call last):
  File "C:/PROGRA~1/SIDEEF~1/HOUDIN~1.391/houdini/pdg/types\schedulers\tbdeadline.py", line 786, in onSchedule
    self._waitStartRelayServer(self.rootjob_id, local_conn_file)
  File "C:/PROGRA~1/SIDEEF~1/HOUDIN~1.391/houdini/pdg/types\schedulers\tbdeadline.py", line 1725, in _waitStartRelayServer
    raise CookError(fail_msg)
CookError: Failed to connect to MQ server at XXX.XXX.XXX.XXX with error: "Failed to connect to PDGMQ: Timed out"!
Enable "Use IP Address for PDGMQ" in case of DNS issues.
13:33:17 onStopCook()
13:33:17 Cancelling tick timer
13:33:17 Stopping shared servers
13:33:17 Failing root job
13:33:17 Submitting FailJob: 5e1cb7d5f160b13294bea07c
13:33:17 Root job has been stopped
13:33:17 Setting timeout for MQ server monitor program
13:33:17 Stopping MQ Relay server
13:33:34 MQ Relay stopped
13:33:34 Stopping deadline command process


Anyone have any ideas?
Member
571 posts
Joined: May 2017
The error says that the local machine is failing to connect to the MQ server running on the Deadline worker machine at port 57503. This is usually caused by a firewall between the two machines, so if you are using one, try opening an available port for TCP communication. Then, on the TOP Deadline scheduler node, enable the Relay Port option under the Message Queue parm section and set its value to the port you opened.
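If you want to confirm the firewall diagnosis before changing scheduler settings, one quick check is to open a plain TCP connection from the submitting machine to the worker. The sketch below is a hypothetical helper (`can_reach_tcp` is not part of PDG or Deadline) and assumes Python 3 run from an ordinary shell; substitute the IP and ports reported in the "Got PDGMQ server info" log line. A timeout typically means traffic is being silently dropped (often by a firewall), while "connection refused" means the host answered but nothing is listening on that port.

```python
import socket


def can_reach_tcp(host: str, port: int, timeout: float = 5.0) -> bool:
    """Hypothetical diagnostic: try a TCP connection to host:port.

    Returns True if the connection is established within `timeout` seconds,
    False otherwise. The printed message distinguishes a timeout (packets
    dropped, e.g. by a firewall) from other failures such as a refusal.
    """
    try:
        # create_connection handles name resolution and the connect timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except socket.timeout:
        print("%s:%d timed out (possibly blocked by a firewall)" % (host, port))
        return False
    except OSError as exc:
        print("%s:%d failed: %s" % (host, port, exc))
        return False
```

For example, call `can_reach_tcp("<worker IP>", 57503)` with the worker's real address. Run it while the Deadline job is still active: judging by the log above, the PDGMQ server runs as the root job's monitor program and is stopped once the root job is failed.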

There might also be a DNS issue, in which case you can try enabling the 'Use IP Address for PDGMQ' option under the Deadline parms section on the TOP Deadline node.
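To check the DNS side, a similar hypothetical helper can resolve the worker's hostname on the submitting machine. It assumes Python 3; `resolve_host` and whatever hostname you pass in are illustrative and not part of PDG or Deadline. If the lookup fails, or returns an address different from the worker's actual IP, name resolution is a likely culprit and the 'Use IP Address for PDGMQ' option is worth trying.

```python
import socket


def resolve_host(hostname: str) -> list:
    """Hypothetical diagnostic: return the sorted IPv4 addresses for hostname.

    An empty list means the lookup failed on this machine.
    """
    try:
        infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror as exc:
        print("DNS lookup for %r failed: %s" % (hostname, exc))
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

Comparing `resolve_host("<worker hostname>")` on the submitting machine with the IP the worker reports for itself shows quickly whether the two agree.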
Edited by seelan - Jan. 14, 2020 09:20:24
Member
7 posts
Joined: Nov. 2010
Hey Seelan,

Thank you for your quick and helpful response. It was indeed a Windows Firewall issue blocking the server communication. Thanks again for pinpointing the problem.

~d