Deadline "socket.gaierror: [Errno 11001] getaddrinfo failed"

Member
123 posts
Joined: May 2015
Either Errno 11001 or Errno 11004, actually.

Most of the latest additions and corrections to PDG seem to have taken care of the submission errors we were getting, and they also simplify the whole process. No need to point to the repository anymore, and HFS is picked up directly from within Houdini. It's way better.

However, I'm now getting a weird error: after submitting to Deadline via PDG, the job somehow fails to report that it has actually finished.
The output of a simple ROP is written to disk correctly, but the job then loops on an error, and Houdini in turn can't mark the tasks as complete because it is still waiting for the status report from Deadline.

From what I understand of that error, Deadline can't resolve a server address but, as a lowly newbie, I'm not sure which server it is actually trying to reach here. I'm pretty sure this is an issue with our network, but understanding the error could help us find a fix.
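
For anyone hitting the same thing: in the traceback further down, the failing call is `reportResultData(..., server_addr=args.server)`, which goes through `xmlrpclib` and ends in `getaddrinfo(host, port, 0, SOCK_STREAM)`. So the name being looked up is whatever host was passed as the server address to the job. A minimal sketch for checking, from the render node, whether a given name resolves at all is below. The helper name `can_resolve` is made up for illustration and is not part of PDG or Deadline.

```python
import socket


def can_resolve(host, port=0):
    """Return the sorted list of addresses `host` resolves to, or [] on failure.

    Uses the same kind of lookup that fails in the traceback:
    getaddrinfo(host, port, 0, SOCK_STREAM).
    """
    try:
        infos = socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM)
    except socket.gaierror:
        # Same exception type as in the Deadline log: the name cannot be resolved.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Replace with the hostname found in the job's command line (assumption:
    # it is the name of the submitting workstation).
    print(can_resolve("127.0.0.1"))
```

If this returns an empty list on the render node for the submitting machine's hostname, but works with its IP address, the problem is name resolution on the network rather than anything in the ROP itself.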

Maybe it's just another variable to pass through from Houdini?

Extract from the Deadline log:

2019-04-18 16:46:29:  0: STDOUT: Loading .hip file VPATH/PDG_deadline_test/PDG_deadline_test.hip.
2019-04-18 16:46:31:  0: STDOUT: Traceback (most recent call last):
2019-04-18 16:46:31:  0: STDOUT: PDG_RESULT: ropgeometry1_ropfetch1_3;-1;'__PDG_DIR__/geo/PDG_deadline_test.ropgeometry1.3.bgeo.sc';;0
2019-04-18 16:46:31:  0: STDOUT:   File "V:/PATH/PDG_deadline_test/pdgtemp/40756/scripts/rop.py", line 502, in <module>
2019-04-18 16:46:31:  0: STDOUT:     cooker.cookSingleFrame(args)
2019-04-18 16:46:31:  0: STDOUT:   File "V:/PATH/PDG_deadline_test/pdgtemp/40756/scripts/rop.py", line 203, in cookSingleFrame
2019-04-18 16:46:31:  0: STDOUT:     reportResultData(parm.evalAtFrame(args.start), server_addr=args.server)
2019-04-18 16:46:31:  0: STDOUT:   File "V:\PATH\PDG_deadline_test\pdgtemp\40756\scripts\pdgcmd.py", line 222, in reportResultData
2019-04-18 16:46:31:  0: STDOUT:     result_data_tag, hash_code, jobid)
2019-04-18 16:46:31:  0: STDOUT:   File "\\config.int.mtc.fr\config\_MTC_C~1\_CONFI~1\HOUDIN~1.229\python27\lib\xmlrpclib.py", line 1243, in __call__
2019-04-18 16:46:31:  0: STDOUT:     return self.__send(self.__name, args)
2019-04-18 16:46:31:  0: STDOUT:   File "\\config.int.mtc.fr\config\_MTC_C~1\_CONFI~1\HOUDIN~1.229\python27\lib\xmlrpclib.py", line 1602, in __request
2019-04-18 16:46:31:  0: STDOUT:     verbose=self.__verbose
2019-04-18 16:46:31:  0: STDOUT:   File "\\config.int.mtc.fr\config\_MTC_C~1\_CONFI~1\HOUDIN~1.229\python27\lib\xmlrpclib.py", line 1283, in request
2019-04-18 16:46:31:  0: STDOUT:     return self.single_request(host, handler, request_body, verbose)
2019-04-18 16:46:31:  0: STDOUT:   File "\\config.int.mtc.fr\config\_MTC_C~1\_CONFI~1\HOUDIN~1.229\python27\lib\xmlrpclib.py", line 1311, in single_request
2019-04-18 16:46:31:  0: STDOUT:     self.send_content(h, request_body)
2019-04-18 16:46:31:  0: STDOUT:   File "\\config.int.mtc.fr\config\_MTC_C~1\_CONFI~1\HOUDIN~1.229\python27\lib\xmlrpclib.py", line 1459, in send_content
2019-04-18 16:46:31:  0: STDOUT:     connection.endheaders(request_body)
2019-04-18 16:46:31:  0: STDOUT:   File "\\config.int.mtc.fr\config\_MTC_C~1\_CONFI~1\HOUDIN~1.229\python27\lib\httplib.py", line 1038, in endheaders
2019-04-18 16:46:31:  0: STDOUT:     self._send_output(message_body)
2019-04-18 16:46:31:  0: STDOUT:   File "\\config.int.mtc.fr\config\_MTC_C~1\_CONFI~1\HOUDIN~1.229\python27\lib\httplib.py", line 882, in _send_output
2019-04-18 16:46:31:  0: STDOUT:     self.send(msg)
2019-04-18 16:46:31:  0: STDOUT:   File "\\config.int.mtc.fr\config\_MTC_C~1\_CONFI~1\HOUDIN~1.229\python27\lib\httplib.py", line 844, in send
2019-04-18 16:46:31:  0: STDOUT:     self.connect()
2019-04-18 16:46:31:  0: STDOUT:   File "\\config.int.mtc.fr\config\_MTC_C~1\_CONFI~1\HOUDIN~1.229\python27\lib\httplib.py", line 821, in connect
2019-04-18 16:46:31:  0: STDOUT:     self.timeout, self.source_address)
2019-04-18 16:46:31:  0: STDOUT:   File "\\config.int.mtc.fr\config\_MTC_C~1\_CONFI~1\HOUDIN~1.229\python27\lib\socket.py", line 557, in create_connection
2019-04-18 16:46:31:  0: STDOUT:     for res in getaddrinfo(host, port, 0, SOCK_STREAM):
2019-04-18 16:46:31:  0: STDOUT: socket.gaierror: [Errno 11001] getaddrinfo failed
2019-04-18 16:46:31:  0: INFO: Process exit code: 1
Edited by Wolrajh - April 18, 2019 11:07:33
Member
571 posts
Joined: May 2017
Looks like the callback server isn't being addressed properly. A recent fix changed the callback server address from an IP address to a hostname, but that inadvertently broke the callback server when you are using DHCP. We will fix this soon. In the meantime, a workaround would be to use a static IP. Please try that if you are able to (at least to check whether that is indeed the issue, and as a short-term solution).
Member
123 posts
Joined: May 2015
Ah yes, I see. I can't go static IP, especially these days with a massive revamp of our network, but modifying the 's' argument directly in the CommandLine Settings of the Deadline job made it work. I just replaced my hostname with my actual IP on the network (kept the port) and it went smoothly.

Obviously it can't be final, but at least it worked!
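
In case it helps someone doing the same swap by hand, here is a rough sketch of it in Python. It assumes the 's' argument has the form `hostname:port` (the thread only says the port was kept, so the exact format is a guess), and it has to run on a machine where the hostname does resolve, typically the submitting workstation. The function name `server_arg_with_ip` is hypothetical, not something PDG or Deadline provides.

```python
import socket


def server_arg_with_ip(server_arg):
    """Turn 'hostname:port' into 'ip:port' by resolving the hostname locally.

    Assumption: the argument is 'host:port'. The port is kept unchanged.
    """
    host, sep, port = server_arg.rpartition(":")
    if not sep or not host:
        raise ValueError("expected 'host:port', got %r" % (server_arg,))
    # gethostbyname returns a single IPv4 address as a string and raises
    # socket.gaierror if the name cannot be resolved.
    ip = socket.gethostbyname(host)
    return "%s:%s" % (ip, port)


if __name__ == "__main__":
    # An address that is already numeric passes through unchanged.
    print(server_arg_with_ip("127.0.0.1:8080"))
```

With DHCP the IP can change between sessions, so the value would need to be recomputed at submission time rather than stored in the job settings.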