HQueue client won't run any simulation jobs

Member
4 posts
Joined: Feb. 2017
Hello illusionists all over the world!

I have been struggling for days to set up my local HQueue system.

My setup is as follows:

1. I have 2 PCs: one has the HQueue server and Houdini Indie installed; the other has only the HQueue client installed.
2. The HQueue server is installed and works fine.
3. When I use the hq_sim node to submit a simulation through HQueue, the job runs fine on the server PC.
4. BUT when I submit a job to the HQueue client PC, the simulation fails, even though server-client communication and the upgrade job work.
5. The only failure message in the log is “setting status to failed”.

Could somebody help me get this working?

Thank you in advance.

I've attached screenshots of my HQueue web interface.

Attachments:
hq01.PNG (94.4 KB)
hq02.PNG (33.4 KB)
hq03.PNG (45.7 KB)

Staff
1024 posts
Joined: July 2005
Hello,

Can you post the output log and diagnostics files for the failed job?

Cheers,
Rob
Member
4 posts
Joined: Feb. 2017
Hello Rob, thank you for your reply.
Here are my logs (also attached).

Job Diagnostic Information

Diagnostic Information for Job 104:
===================================
Job Name: Simulate -> HIP: 0208_scene.hiplc ROP: output (No Slices)
Submitted By: VOM
Job ID: 104
Parent Job ID(s): 103
Number of Clients Assigned: 1
Job Status: failed
Report Generated On: February 10, 2018 09:11:28 AM

Job Properties:
===============
Description: None
Tries Left: 0
Priority: 5
Minimum Number of Hosts: 1
Maximum Number of Hosts: 1
Tags: single
Queue Time: February 08, 2018 06:55:59 PM
Runnable Time: February 08, 2018 06:55:59 PM
Command Start Time: February 08, 2018 06:57:13 PM
Command End Time:
Start Time: February 08, 2018 06:57:13 PM
End Time: February 08, 2018 06:57:20 PM
Time to Complete: 7s
Time in Queue: 1m 13s

Job Environment Variables:
==========================
HQCOMMANDS:
{
"hythonCommandsLinux": "export HOUDINI_PYTHON_VERSION=2.7 && export HFS=\"C:/PROGRA~1/SIDEEF~1/HOUDIN~1.378\" && cd $HFS && source ./houdini_setup && hython -u",
"pythonCommandsMacosx": "export HFS=\"C:/PROGRA~1/SIDEEF~1/HOUDIN~1.378\" && $HFS/Frameworks/Python.framework/Versions/2.7/bin/python",
"pythonCommandsLinux": "export HFS=\"C:/PROGRA~1/SIDEEF~1/HOUDIN~1.378\" && $HFS/python/bin/python2.7",
"pythonCommandsWindows": "(set HFS=C:\\PROGRA~1\\SIDEEF~1\\HOUDIN~1.378) && echo Accessing \"!HFS!\\python27\\python2.7.exe\" ... && \"!HFS!\\python27\\python2.7.exe\"",
"mantraCommandsLinux": "export HFS=\"C:/PROGRA~1/SIDEEF~1/HOUDIN~1.378\" && cd $HFS && source ./houdini_setup && $HFS/python/bin/python2.7 $HFS/houdini/scripts/hqueue/hq_mantra.py",
"mantraCommandsMacosx": "export HFS=\"C:/PROGRA~1/SIDEEF~1/HOUDIN~1.378\" && cd $HFS && source ./houdini_setup && $HFS/Frameworks/Python.framework/Versions/2.7/bin/python $HFS/houdini/scripts/hqueue/hq_mantra.py",
"hythonCommandsMacosx": "export HOUDINI_PYTHON_VERSION=2.7 && export HFS=\"C:/PROGRA~1/SIDEEF~1/HOUDIN~1.378\" && cd $HFS && source ./houdini_setup && hython -u",
"hythonCommandsWindows": "(set HOUDINI_PYTHON_VERSION=2.7) && (set HFS=C:\\PROGRA~1\\SIDEEF~1\\HOUDIN~1.378) && (set PATH=C:\\PROGRA~1\\SIDEEF~1\\HOUDIN~1.378\\bin;!PATH!) && echo Accessing \"!HFS!\\bin\\hython\" ... && \"!HFS!\\bin\\hython\" -u",
"mantraCommandsWindows": "(set HFS=C:\\PROGRA~1\\SIDEEF~1\\HOUDIN~1.378) && echo Accessing \"!HFS!\\python27\\python2.7.exe\" ... && \"!HFS!\\python27\\python2.7.exe\" \"!HFS!\\houdini\\scripts\\hqueue\\hq_mantra.py\""
}

JOB:
///192.168.0.9/hqueue/projects

HQPARMS:
{
"dirs_to_create": [],
"hip_file": "H:/VOM/1802_exh/0208_scene.hiplc",
"output_driver": "/obj/AutoDopNetwork/output",
"dependency_order": "frame_by_frame",
"enable_perf_mon": 1
}

HQHOSTS:
RYZEN

HQ_PRESERVE_ENV_VARS:
JOB

Job Conditions and Requirements:
================================
hostname any RYZEN

Client Job Commands:
=============================
Windows Command:
(set HOUDINI_PYTHON_VERSION=2.7) && (set HFS=C:\PROGRA~1\SIDEEF~1\HOUDIN~1.378) && (set PATH=C:\PROGRA~1\SIDEEF~1\HOUDIN~1.378\bin;!PATH!) && echo Accessing "!HFS!\bin\hython" ... && "!HFS!\bin\hython" -u "!HFS!\houdini\scripts\hqueue\hq_run_sim_without_slices.py"

onSuccess Commands:
=============================
No commands were found.

onCancel Commands:
=============================
No commands were found.

onError Commands:
=============================
No commands were found.

onChildError Commands:
=============================
No commands were found.

Client Machine Specification (RYZEN):
=====================================
DNS Name: RYZEN
Client ID: 2
Operating System: windows
Architecture: x86_64
Number of CPUs: 16
CPU Speed: 3622.0
Memory: 67026964

Client Machine Configuration File Contents (RYZEN):
===================================================
[main]
server = 192.168.0.9
port = 5000
sharedNetwork.mount = \\192.168.0.9\hqueue
[job_environment]


HQueue Server Configuration File Contents:
==========================================
#
# hqserver - Pylons configuration
#
# The %(here)s variable will be replaced with the parent directory of this file
#
[DEFAULT]
email_to = you@yourdomain.com
smtp_server = localhost
error_email_from = paste@localhost

[server:main]
use = egg:Paste#http
host = 192.168.0.9
port = 5000

[app:main]

# The shared network.
hqserver.sharedNetwork.host = 192.168.0.9
hqserver.sharedNetwork.path.windows = hqueue
hqserver.sharedNetwork.mount.windows = J:
# hqserver.sharedNetwork.path.linux = %(here)s/shared
# hqserver.sharedNetwork.path.windows = \\T7910-VOM\hqueue
# hqserver.sharedNetwork.path.macosx = %(here)s/HQShared
# hqserver.sharedNetwork.mount.linux = /mnt/hqueue
# hqserver.sharedNetwork.mount.windows = H:
# hqserver.sharedNetwork.mount.macosx = /Volumes/HQShared

# Server port number.
hqserver.port = 5000

# Algorithm to use when assigning clients to jobs.
# Available options:
# sharecpu - Assign clients so that the number of CPUs is distributed evenly
# across the jobs.
# fifo - Assign clients to the job that was submitted first. When the job is
# finished then assign clients to the job that was submitted next.
hqserver.schedulingAlgorithm = sharecpu

# Where to save job output
job_logs_dir = %(here)s/job_logs

# Specify the database for SQLAlchemy to use
sqlalchemy.default.url = sqlite:///%(here)s/db/hqserver.db

# This is required if using mysql
sqlalchemy.default.pool_recycle = 3600

# This will force a thread to reuse connections.
sqlalchemy.default.strategy = threadlocal

#########################################################################
# Uncomment these configuration values if you are using a MySQL database.
#########################################################################
# The maximum number of database connections available in the
# connection pool. If you see "QueuePool limit of size" messages
# in the errors.log, then you should increase the value of pool_size.
# This is typically done for farms with a large number of client machines.
#sqlalchemy.default.pool_size = 30
#sqlalchemy.default.max_overflow = 20

# Where to publish myself in avahi
# hqnode will use this to connect
publish_url = http://hostname.domain.com:5000

# How many minutes before a client is considered inactive
hqserver.activeTimeout = 3

# How many days before jobs are deleted
hqserver.expireJobsDays = 10

# The maximum number of jobs (under the same root parent job) that can fail on
# a single client before a condition is dynamically added to that root parent
# job (and recursively all its children) that excludes the client from ever
# running this job/these jobs again. This value should be a postive integer
# greater than zero. To disable this feature, set this value to zero.
hqserver.maxFailsAllowed = 5

# The priority that the 'upgrade' job gets.
hqserver.upgradePriority = 100

use = egg:hqserver
full_stack = True
cache_dir = %(here)s/data
beaker.session.key = hqserver
beaker.session.secret = somesecret
app_instance_uuid = {a0dc502b-9764-4fa2-b85a-a573dae34adc}

# The maximum allowed body length (in bytes) of an XMLRPC message sent to the
# server.
hqserver.maxXMLRPCBodyLength = 4194304

# Logging Setup
[loggers]
keys = root

[handlers]
keys = console

[formatters]
keys = generic

[logger_root]
# Change to "level = DEBUG" to see debug messages in the log.
level = INFO
handlers = console

# This handler backs up the log when it reaches 10Mb
# and keeps at most 5 backup copies.
[handler_console]
class = handlers.RotatingFileHandler
args = ("hqserver.log", "a", 10485760, 5)
level = NOTSET
formatter = generic

[formatter_generic]
format = %(asctime)s %(levelname)-5.5s [%(name)s] %(message)s
datefmt = %B %d, %Y %H:%M:%S

Job Status Log:
===============
February 08, 2018 06:55:59 PM: Assigned to RYZEN (master)
February 08, 2018 06:57:13 PM: setting status to running
February 08, 2018 06:57:20 PM: setting status to failed

Output Log


Could not find output file for job '104'.
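The Client Job Commands section above shows that RYZEN is expected to start hython from `C:\PROGRA~1\SIDEEF~1\HOUDIN~1.378\bin`, and HQPARMS points at the scene `H:/VOM/1802_exh/0208_scene.hiplc`. Since RYZEN has only the HQueue client installed, one thing worth checking is whether those paths exist on RYZEN at all. The sketch below parses the two JSON values copied from the diagnostics and tests each path. It assumes a Python 3 interpreter on RYZEN and that the Windows executable is named `hython.exe`; `hfs_from_command` and `check_job_paths` are hypothetical helpers, not part of HQueue.

```python
import json
import ntpath
import os
import re

# Trimmed copies of the HQCOMMANDS and HQPARMS values from the job 104 diagnostics.
HQCOMMANDS = r'''{
"hythonCommandsWindows": "(set HOUDINI_PYTHON_VERSION=2.7) && (set HFS=C:\\PROGRA~1\\SIDEEF~1\\HOUDIN~1.378) && (set PATH=C:\\PROGRA~1\\SIDEEF~1\\HOUDIN~1.378\\bin;!PATH!) && echo Accessing \"!HFS!\\bin\\hython\" ... && \"!HFS!\\bin\\hython\" -u"
}'''

HQPARMS = r'''{
"hip_file": "H:/VOM/1802_exh/0208_scene.hiplc",
"output_driver": "/obj/AutoDopNetwork/output"
}'''


def hfs_from_command(cmd):
    """Return the directory assigned by '(set HFS=...)' in a Windows command string."""
    match = re.search(r"\(set HFS=([^)]+)\)", cmd)
    return match.group(1) if match else None


def check_job_paths(hqcommands_json, hqparms_json, exists=os.path.exists):
    """Map each path the job depends on to (path, exists_on_this_machine)."""
    commands = json.loads(hqcommands_json)
    parms = json.loads(hqparms_json)
    hfs = hfs_from_command(commands["hythonCommandsWindows"])
    # ntpath builds Windows-style paths regardless of the OS running the script.
    paths = {
        "HFS directory": hfs,
        "hython executable": ntpath.join(hfs, "bin", "hython.exe"),
        "hip file": parms["hip_file"],
    }
    return {label: (path, exists(path)) for label, path in paths.items()}


if __name__ == "__main__":
    # Run this on RYZEN itself; "MISSING" suggests the job cannot use that path there.
    for label, (path, ok) in check_job_paths(HQCOMMANDS, HQPARMS).items():
        print("%-18s %-8s %s" % (label, "OK" if ok else "MISSING", path))
```

Passing `exists` as a parameter keeps the path logic testable on any machine; on RYZEN the default `os.path.exists` is what matters.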



Thank you!!!

Attachments:
unknown_job_output.txt (41 bytes)
job_104_diagnostic_information.txt (8.0 KB)

Staff
1024 posts
Joined: July 2005
Interesting that HQueue couldn't find an output file for the failed job. It's almost as if the client machine is unable to send progress information back to the server.

Could you also post these files from RYZEN?
  • C:/HQueueClient/hqnode.log
  • C:/HQueueClient/errors.log (if it exists)
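When gathering hqnode.log, a quick filter can pull out the lines that look like failures so they are easier to read or post. The exact wording of hqnode.log isn't shown in this thread, so the keyword list below is an assumption to adjust after a first look at the real file; `suspicious_lines` is a hypothetical helper.

```python
import re

# Assumed keywords; hqnode.log's real wording may differ, so adjust after a first look.
PATTERN = re.compile(
    r"error|exception|traceback|refused|timed out|unreachable", re.IGNORECASE
)


def suspicious_lines(lines):
    """Return (line_number, text) for every line matching PATTERN."""
    return [
        (number, line.rstrip("\r\n"))
        for number, line in enumerate(lines, start=1)
        if PATTERN.search(line)
    ]


if __name__ == "__main__":
    import os

    log_path = r"C:\HQueueClient\hqnode.log"  # the path Rob asks for above
    if os.path.exists(log_path):
        with open(log_path, errors="replace") as log:
            for number, text in suspicious_lines(log):
                print("%6d: %s" % (number, text))
```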

Cheers,
Rob
Member
4 posts
Joined: Feb. 2017
Hello, Rob.

Here is RYZEN's hqnode.log:

[main]
server = 192.168.0.9
port = 5000
sharedNetwork.mount = \\192.168.0.9\hqueue
[job_environment]

The errors.log file does not exist.
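The settings pasted above are in INI format, so they can be read with Python 3's standard `configparser` to confirm what the client will connect to and whether the shared-network mount is reachable from RYZEN. A minimal sketch; `read_client_config` is a hypothetical helper, and reading the real file from `C:\HQueueClient\hqnode.ini` assumes it sits next to hqnode.log.

```python
import configparser
import os

# The client settings pasted above.
SAMPLE_INI = r"""[main]
server = 192.168.0.9
port = 5000
sharedNetwork.mount = \\192.168.0.9\hqueue
[job_environment]
"""


def read_client_config(text):
    """Parse hqnode.ini-style text into the values the client connects with."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key case, e.g. "sharedNetwork.mount"
    parser.read_string(text)
    main = parser["main"]
    return {
        "server": main.get("server"),
        "port": main.getint("port"),
        "mount": main.get("sharedNetwork.mount"),
        "job_environment": dict(parser["job_environment"]),
    }


if __name__ == "__main__":
    config = read_client_config(SAMPLE_INI)
    print(config)
    # On RYZEN, False suggests the share is not reachable from the account running this.
    print("mount reachable:", os.path.isdir(config["mount"]))
```

Setting `optionxform = str` matters here because `configparser` lowercases keys by default, which would turn `sharedNetwork.mount` into `sharednetwork.mount`.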


Thank you for your reply.

Best regards,
BeomHee
Staff
1024 posts
Joined: July 2005
That looks like it is coming from the hqnode.ini file.

Can you post the contents of hqnode.log instead?

Cheers,
Rob
Member
4 posts
Joined: Feb. 2017
Ah-ha, I misunderstood.

I've attached the hqnode.log file because it is too long to paste here.

Thank you so much!

Best regards,
BeomHee.

Attachments:
hqnode.log (39.7 KB)

Staff
1024 posts
Joined: July 2005
Hmm. I can see recurring errors in hqnode.log that point to the client machine failing to connect to the server. The failure doesn't happen all the time, just intermittently.

It's almost as if the network becomes temporarily unreachable.
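Because the failures are intermittent, one way to catch them is to poll the server port from RYZEN for a while and note when connections fail. A minimal standard-library sketch; the address and port come from the hqnode.ini values in this thread, and `probe`/`watch` are hypothetical helpers. A successful TCP connect only shows that port 5000 accepted a connection, not that hqserver answered correctly.

```python
import socket
import time
from datetime import datetime

SERVER = "192.168.0.9"  # server address from hqnode.ini in this thread
PORT = 5000             # hqserver port from the same file


def probe(host, port, timeout=2.0):
    """Return None if a TCP connection succeeds, otherwise the error text."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return None
    except OSError as exc:  # covers refused connections and timeouts
        return str(exc) or exc.__class__.__name__


def watch(host, port, attempts=10, interval=5.0, sleep=time.sleep):
    """Probe repeatedly, print a timestamped line per attempt, return the failures."""
    failures = []
    for attempt in range(attempts):
        error = probe(host, port)
        stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        print(stamp, "ok" if error is None else "FAILED: " + error)
        if error is not None:
            failures.append((stamp, error))
        if attempt < attempts - 1:
            sleep(interval)
    return failures
```

For example, `watch(SERVER, PORT, attempts=120, interval=30)` samples for about an hour; failures whose timestamps line up with the errors in hqnode.log would support the idea that the network is intermittently unreachable.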

I have a couple of questions:
- Is the client machine reporting a heartbeat? That is, check the Last Heartbeat column in the Clients table on the HQueue web interface.
- Is the client machine consistently failing jobs? And is HQueue consistently unable to find the job output file?
- Is the HQueue Client service running on the client machine? Or are you manually starting the client by running C:/HQueueClient/hqnode.bat? And if the service is running, what logon account is used to run the service?
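For the heartbeat question: the server config posted earlier sets `hqserver.activeTimeout = 3`, described there as the number of minutes before a client is considered inactive. A minimal sketch of that check, assuming the Last Heartbeat column uses the same timestamp format as the Job Status Log and that the server simply compares the heartbeat age to this timeout; `is_inactive` is a hypothetical helper, not HQueue code.

```python
from datetime import datetime, timedelta

ACTIVE_TIMEOUT_MINUTES = 3  # hqserver.activeTimeout from the server config in this thread
# Assumed to match the web UI, e.g. "February 08, 2018 06:57:13 PM" in the Job Status Log.
TIMESTAMP_FORMAT = "%B %d, %Y %I:%M:%S %p"


def is_inactive(last_heartbeat, now, timeout_minutes=ACTIVE_TIMEOUT_MINUTES):
    """True if the last heartbeat is older than the configured timeout."""
    last = datetime.strptime(last_heartbeat, TIMESTAMP_FORMAT)
    return now - last > timedelta(minutes=timeout_minutes)


if __name__ == "__main__":
    now = datetime(2018, 2, 8, 18, 59, 0)
    print(is_inactive("February 08, 2018 06:57:13 PM", now))
```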

Cheers,
Rob