VERSION 12.0.871 |
|
8 Jobs - In DetailThis section describes the internal structure and properties of an HQueue job. You can skip this section if you are just interested in how to submit jobs from Houdini and how to monitor them. This part of the documentation is targeted for people who plan to create their own jobs and submit them from outside of Houdini (i.e. from a Python script). However, knowing a little about the HQueue job design does not hurt as it would help clarify some of the terminology used throughout the web interface. 8.1 A Simple ExampleIn its most basic form, an HQueue job is merely a set of shell commands. When it is assigned to a client machine by the scheduler, the set of commands is executed. If the commands pass, then the job is said to have succeeded, otherwise, the job is said to have failed. For example, suppose we want to create a job which simply prints "Hello World!" to the console. In that case, the command set would just be:
echo "Hello World!"
When it is submitted to HQueue, the job passes through several states before it completes. Here is a depiction of the status changes for our Hello World example:
Our job is initially set with the waiting for machine status. At this time, the scheduler searches for an available client machine on the farm. When one is found, the scheduler assigns it to our job and the machine is notified about the assignment. The client responds to the notification and picks up the job from the HQueue server. The job's status is changed to running and its command set is executed on the client. Once execution is complete, the client contacts the HQueue server and the job is finally set to succeeded. Simple enough? 8.2 The Job SpecificationEvery HQueue job is defined by a specification -- a simple structure containing the job properties. Specifically, the specification is a Python dictionary where the keys are the property names and the values are the property values. For our Hello World example, the job specification would look like the following:
{
"name": "Print Hello World", "shell": "bash", "command": "echo 'Hello World!'" } In this example, we can see that there are 3 job properties -- name, shell and command. The properties indicate that the job is titled "Print Hello World", that the terminal shell to use when executing the commands is "bash" (as oppose to csh, or tcsh, etc.), and that a single command is to be executed on the client machine, specifically, "echo Hello World!". For a complete list of the job properties that can be included in the job specification, see 8.4 Job Properties . Submitting the Job SpecificationNow that our Hello World job is defined, we can use the specification to submit the job to HQueue. To do this, we can write a simple Python script:
import xmlrpclib
# Connect to the HQueue server. hq_server = xmlrpclib.ServerProxy("http://hq_server_hostname:5000") # Define a job which prints "Hello World!". job_spec = { "name": "Print Hello World", "shell": "bash", "command": "echo 'Hello World!'" } # Submit the job to the server. # newjob() returns a list of job ids (in case multiple jobs are passed in at once). job_ids = hq_server.newjob(job_spec) In the above script, we first make a connection to the HQueue server using the xmlrpclib Python module. Once a connection is established, we store a reference to it in the hq_server variable. Next we define our Hello World job and assign the specification to the job_spec variable. Finally, we submit the job to HQueue by calling the newjob() method from the HQueue server's API (see 12 Python API for a complete list of functions). Note that we pass the specification as a parameter to newjob(). Once the job is submitted, HQueue generates a unique id for it and returns the id to the caller. In the example above, we store the id into job_ids. Commandless JobsIt should be noted that the only required property in the job specification is name. Not even the command property is required. Jobs with no commands can still run on HQueue, but when assigned to a client machine, they do not perform any work. Sometimes commandless jobs can be useful. For example, you may want to test calling newjob() on the HQueue server without burdening the farm with any real tasks. Or you may want to create a commandless "container" job with dependencies on other jobs so that the other jobs appear to be grouped together when viewed on the web interface. Job dependencies are explained in detail in the next section. 8.3 Parent-Child RelationshipsSometimes a job, say Job A, cannot execute until another job, say Job B, has completed. In other words, Job A has a dependency on Job B. In HQueue terminology, when such a dependency exists, Job A is said to be the parent of Job B and Job B is said to be the child of Job A. A Simple Parent-Child ExampleSuppose we want to create an AVI video from a few frames, say 3, in a scene. Assuming that we have already generated IFD files for these frames, this task can be done in 2 steps:
For the first step, we define an HQueue job for each frame that would take the IFD and generate the image using Mantra. The job for rendering frame 1 would look like:
{
"name": "Render Frame 1", "shell": "bash", "command": "cd $HQROOT/houdini_distros/hfs; source houdini_setup; mantra < $HQROOT/path/to/ifds/frame0001.ifd" } The job specifications for rendering the second and third frames would be similar. Let us suppose that Mantra generates the images to a path on the shared network drive, say $HQROOT/path/to/output/frame*.png. Then for the encoding step, we can pass in these images as input to our utility encoder. So the HQueue job for the encoding step would look like:
{
"name": "Encode Video", "shell": "bash", "command": "someEncoder --input=$HQROOT/path/to/output/frame*.png --output=$HQROOT/path/to/output/myVideo.avi" } Now we can create a dependency from the encoding job to the 3 render jobs so that the encoding does not start until the rendering is complete. Once the render jobs are finished, then the encoding job is assigned to a client machine and is executed.
To create the dependency, we can use the children property. The children property accepts a list of child job specifications. So the final specification for our encoding job would look like:
{
"name": "Encode Video", "shell": "bash", "command": "someEncoder --input=$HQROOT/path/to/output/frame*.png --output=$HQROOT/path/to/output/myVideo.avi", "children": [ { "name": "Render Frame 1", "shell": "bash", "command": "cd $HQROOT/houdini_distros/hfs; source houdini_setup; mantra < $HQROOT/path/to/ifds/frame0001.ifd" }, { "name": "Render Frame 2", "shell": "bash", "command": "cd $HQROOT/houdini_distros/hfs; source houdini_setup; mantra < $HQROOT/path/to/ifds/frame0002.ifd" }, { "name": "Render Frame 3", "shell": "bash", "command": "cd $HQROOT/houdini_distros/hfs; source houdini_setup; mantra < $HQROOT/path/to/ifds/frame0003.ifd" }, ] } Status ChangesWith child dependencies, the parent job moves through a slightly different flow of status changes:
The parent job is initially has the waiting for children status. At this time, the scheduler assigns client machines to the child jobs and the child jobs begin are executed. When the children finish, if at least one of the children have failed, the parent job's status is set to failed and its command is not run. Otherwise, the parent job's status changes to waiting for machine. The scheduler locates an available client machine and assigns it to the parent job. Once the client picks up the job from the HQueue server, the parent's status changes to running. Finally, when the parent completes, it changes to succeeded. Submitting Child Jobs from Within The ParentIt is possible to submit new child jobs from within a running job. You may want to do this if for example you have several tasks that need processing but you do not know how many of those tasks you need until runtime. In that case, you can add commands to your parent job that calculate how many children it requires and submit specifications for the children to the HQueue server. To submit new child jobs, you can use the newjob() API function. To assign the new job as a child, you can pass the current job's id as a parameter. In a running job, the id is stored in the $JOBID environment variable. Below is an example of a Python script which creates a new job and assigns it as a child to the currently running job:
import os
import xmlrpclib # Connect to the HQueue server. hq_server = xmlrpclib.ServerProxy("http://hq_server_hostname:5000") # Define the child job. child_job_spec = { "name": "The Child Job", "shell": "bash", "command": "echo 'Hello World!'" } # Get the id of the current job. # It should be defined in the environment. current_job_id = os.environ["JOBID"] # Submit the job to the server. # newjob() returns a list of job ids (in case multiple jobs are passed in at once). job_ids = hq_server.newjob(job_spec, parentId=current_job_id) Now if we save this script into a file, say createChild.py, we can add it to the command property of the parent job. So the parent job's specification would look like:
{
"name": "The Parent Job", "shell": "bash", "command": "python $HQROOT/path/to/scripts/createChild.py" } Note that the status changes for a job that submits its own children is slightly different:
The parent job initially has no children and so its status is set to waiting for machine. When a client machine is assigned and executes it, the job is changed to running. During execution, the job submits new children to the HQueue server. While the parent continues to run, its children are assigned to client machines and they run as well. When the parent finishes, it needs to wait for its children before it can declare that it succeeded and so its status is changed to waiting for children. When all the children finish, then the parent job is finally set to succeeded. 8.4 Job PropertiesBelow is a list of the job properties that can exist in a job specification.
You may also add arbitrary properities to a job spec. These have no special meaning to the HQueue server but they might be useful to specific or custom jobs. The recommended way to do this is to use a seperate entry for the class of job that has all the custom properties it needs in a dictionary. For example, the HQueue Render ROP submits jobs that have an HQPARM property which is used by the scripts that are invoked by these job submissions. Job Properties ExampleBelow is an example of a job specification which demonstrates the use of some of the properties:
{
"name": "The Main Job", "shell": "bash", "environment": { "SHOW_MSG": "1", "MSG": "Hello World!" }, "command": "if [ $SHOW_MSG = 1 ]; then echo $MSG; fi", "tags": [ "single" ], "maxHosts": 1, "minHosts": 1, "priority": 0, "children": [ { "name": "The Child Job", "shell": "bash", "command": "echo 'Hello World!'" } ] } The example above defines a job named "The Main Job" which has a priority level of 0. It uses the 'bash' shell to execute its command set and it defines two variables, "SHOW_MSG" and "MSG", in the environment. Its command set directly references these two variables. The job requires one dedicated machine as defined by the "single" tag, and the "maxHosts" and "minHosts" properties. Finally, it has a single child job which prints out "Hello World". 8.5 Job ConditionsYou can attach conditions on a job which inform the HQueue scheduler to assign the job to a restricted set of client machines. A job condition is defined by a type, name, operator and value. Together they form a comparison test that the scheduler can use to determine whether a machine is acceptable to run the job. If a client machine passes ALL of the assigned conditions, then it can run the job. Below is a description of each of the condition components:
Job Condition ExamplesBelow is an example which demonstrates how to attach a condition to a job specification:
{
"name": "A Job with Conditions", "shell": "bash", "command": "echo 'I should be running on either machine1 or machine2!'", "conditions": [ { "type" : "client", "name":"hostname", "op":"any", "value":"machine1,machine2" }, ] } The example above defines a job which can only be assigned to a client machine named either "machine1" or "machine2". Note that the conditions property is a list of Python-like dictionaries where each dictionary defines a single condition and its 4 components. The next example shows how to set a condition where the job can only be assigned to client machines that are members of the "Simulation" group:
{
"name": "A Job for the Simulation Group", "shell": "bash", "command": "echo 'I should be running on a machine that is a member of the Simulation group!'", "conditions": [ { "type" : "client", "name":"group", "op":"==", "value":"Simulation" }, ] } The example above defines a job which can only be assigned to a client machine named either "machine1" or "machine2". Note that the conditions property is a list of Python-like dictionaries where each dictionary defines a single condition and its 4 components. 8.6 Job TagsTags can be used to describe whether the job requires a dedicated machine or whether it can share a machine with other running jobs. If no tags are specified, then by default, the job is configured to share the machine it is running on as long as the machine has enough CPUs. To declare that your job needs a dedicated machine, add the "single" tag to the tags property. This guarantees that no other jobs will be assigned to the machine while your job is running. You can also create custom single tags to control which sets of running jobs can share machines and which sets cannot. The sharing rule works as follows -- the scheduler will never assign running jobs to the same machine if they have matching tags. To create a custom single tag, simply prefix the tag name with "single". For example, suppose we create a custom single tag named "single:1" and assign it to two jobs, say A and B. And suppose we create another custom single tag named "single:2" and assign to two other jobs, say C and D. This entails that jobs A and B cannot execute on the same machine concurrently and the same goes for jobs C and D. However, job A or job B can run concurrently on the same machine that is running job C or job D and vice versa. 8.7 Job StatusesBelow is a list of all the job statuses and their descriptions.
8.8 Job VariablesBelow is a list of the built-in variables available in the command environment of a running job.
|