Usage

The jobrunner package is a Python API providing an abstraction layer to run jobs on HPC clusters using Grid Engine, SLURM, Torque, or on the local computer.

When using jobrunner, you do not need to execute qsub or sbatch explicitly; the package automatically builds and executes the correct submission command for you.

To submit a job for execution on Grid Engine:

from jobrunner import JobRunner
runner = JobRunner("grid")  # This also works with "slurm" or "torque"
command_line = "echo Hello World"
job_id = runner.run(command_line, "JobName", "logfile.log")

To run a job locally on your computer, just change “grid” to “local”. Everything else stays the same:

from jobrunner import JobRunner
runner = JobRunner("local")
command_line = "echo Hello World"
job_id = runner.run(command_line, "JobName", "logfile.log")

For an array-job, first create a file with array job parameters, one line per task:

$ echo A B C > arrayfile
$ echo 1 2 3 >> arrayfile
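The same parameter file can also be produced from Python, which is convenient when the task list is computed by a script. The following is a minimal sketch, not part of jobrunner: write_array_file is a hypothetical helper, and it assumes that no parameter value contains whitespace, because parameters in the file are whitespace-separated.

```python
import os
import tempfile


def write_array_file(path, rows):
    """Write one whitespace-separated line per task and return the task count.

    Hypothetical helper for illustration; values must not contain whitespace.
    """
    with open(path, "w") as handle:
        for row in rows:
            handle.write(" ".join(str(value) for value in row) + "\n")
    return len(rows)


# Same content as the two echo commands above, written to a temporary directory
array_path = os.path.join(tempfile.mkdtemp(), "arrayfile")
task_count = write_array_file(array_path, [("A", "B", "C"), (1, 2, 3)])
```

The resulting path can then be passed as the array_file argument of run_array.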

To submit an array-job for execution on Grid Engine:

from jobrunner import JobRunner
runner = JobRunner("grid")  # This also works locally, just change "grid" to "local"
command_line = "echo Hello {1}{2}{3}"
job_id = runner.run_array(command_line, "JobName", "logfile.log", "arrayfile")

Array job parameter substitution

As you can see from the examples above, jobrunner has a very simple language for extracting parameters from a file and substituting the parameters into a command line.

Parameters in the array job parameter file are whitespace-separated. The parameters can have any meaning you want – numbers, strings, file names, directory names, etc.

The substitution language is just a number inside curly braces.

{1} is the first parameter found in the array job parameter file.

{2} is the second parameter found in the array job parameter file.

{3} is the third parameter found in the array job parameter file.

{9} is the 9th parameter found in the array job parameter file.

Currently, array jobs running locally have a limit of 9 parameters. Array jobs running on the HPC have no limit to the number of parameters per line in the array job parameter file.
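To make the substitution rule concrete, the sketch below mimics the documented behavior in plain Python. It is an illustration only and not the code jobrunner uses internally. The function name substitute is hypothetical, and raising ValueError for a placeholder with no matching parameter is a choice made for this sketch, since the behavior of jobrunner in that case is not described here.

```python
import re


def substitute(command_line, parameter_line):
    """Replace {1}, {2}, ... with the whitespace-separated fields of one line."""
    fields = parameter_line.split()

    def lookup(match):
        index = int(match.group(1))
        # Assumption of this sketch: fail loudly on a placeholder with no parameter
        if index < 1 or index > len(fields):
            raise ValueError("no parameter for placeholder {%d}" % index)
        return fields[index - 1]

    return re.sub(r"\{(\d+)\}", lookup, command_line)


# One command per line of the array job parameter file
commands = [substitute("echo Hello {1}{2}{3}", line) for line in ["A B C", "1 2 3"]]
# commands == ["echo Hello ABC", "echo Hello 123"]
```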

jobrunner.jobrunner module

This module provides an abstraction layer to run jobs on high performance computers using Grid Engine, SLURM, Torque, or locally with xargs.

class jobrunner.jobrunner.JobRunner(hpc_type, strip_job_array_suffix=True, qsub_extra_params=None, exception_handler=None, verbose=False)[source]

Bases: object

__init__(hpc_type, strip_job_array_suffix=True, qsub_extra_params=None, exception_handler=None, verbose=False)[source]

Initialize an hpc job runner object.

Parameters:
  • hpc_type (str) – Type of job runner. Possible values are “grid”, “slurm”, “torque”, and “local”.
  • strip_job_array_suffix (bool, optional defaults to True) – When true, the dot and array suffix in the job id are removed before returning the job id.
  • qsub_extra_params (str, optional defaults to None) – Extra command line options passed to qsub or sbatch every time a job is submitted.
  • exception_handler (function, optional defaults to None) – Function to be called in local mode only when an exception occurs while attempting to run an external process. The function will be called with the arguments (exc_type, exc_value, exc_traceback).
  • verbose (bool, optional defaults to False) – When true, the job command lines are logged.
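An exception handler only needs to accept the three arguments listed above. The sketch below shows one possible handler that records failures instead of letting them propagate. The names record_failure and failures are hypothetical, and the handler is exercised here by raising CalledProcessError by hand, so the example runs without jobrunner or a scheduler.

```python
import subprocess
import sys

failures = []


def record_failure(exc_type, exc_value, exc_traceback):
    """Handler taking the documented (exc_type, exc_value, exc_traceback) arguments."""
    failures.append((exc_type.__name__, str(exc_value)))


# Intended wiring, following the constructor signature above (not executed here):
#   runner = JobRunner("local", exception_handler=record_failure)

# Simulate a failing external command so the handler can be tried in isolation
try:
    raise subprocess.CalledProcessError(100, "exit 100")
except subprocess.CalledProcessError:
    record_failure(*sys.exc_info())
```

After this runs, failures holds one entry naming CalledProcessError together with the message of the exception.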

Examples

>>> runner = JobRunner("foobar")
Traceback (most recent call last):
ValueError: hpc_type must be one of: "grid", "slurm", "torque", "local"

run(command_line, job_name, log_file, wait_for=[], wait_for_array=[], threads=1, parallel_environment=None, exclusive=False, wall_clock_limit=None, quiet=False)[source]

Run a non-array job. Stderr is redirected (joined) to stdout.

Parameters:
  • command_line (str) – Command with all arguments to be executed.
  • job_name (str) – Job name that will appear in the job scheduler queue.
  • log_file (str) – Path to the combined stdout / stderr log file.
  • wait_for (str or list of str, optional defaults to empty list) – Single job id or list of job ids to wait for before beginning execution. Ignored when running locally.
  • wait_for_array (str or list of str, optional defaults to empty list) – Single array job id or list of array job ids to wait for before beginning execution. Ignored when running locally.
  • threads (int, optional defaults to 1) – Number of CPU threads consumed by the job. Unused when running locally.
  • parallel_environment (str, optional defaults to None) – Name of the grid engine parallel execution environment. This must be specified when consuming more than one thread on grid engine. Unused for any other job scheduler.
  • exclusive (bool, optional, defaults to False) – Requests exclusive access to compute nodes to prevent other jobs from sharing the node resources. Enforced only on SLURM; silently ignored for all other schedulers.
  • wall_clock_limit (str, optional, defaults to None) – Maximum run-time; string of the form HH:MM:SS. Ignored when running locally.
  • quiet (bool, optional, defaults to False) – Controls whether the job stderr and stdout are written to stdout in addition to the log file. By default, the job stderr and stdout are written to both stdout and the log file. When True, the job stderr and stdout are written to the log file only.
Returns:

job_id – Job id assigned by the scheduler (Grid Engine, SLURM, or Torque). Returns ‘0’ in local mode.

Return type:

str

Raises:
  • CalledProcessError – In local mode, a non-zero exit code raises CalledProcessError. The exception is routed to the exception handler installed during JobRunner initialization, if any; if no exception handler was specified, the exception is re-raised.
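The job id returned by run can be passed as wait_for to a later call, and wall_clock_limit expects an HH:MM:SS string. The sketch below shows both ideas under stated assumptions. RecordingRunner is a hypothetical stand-in that only records calls, so the example runs without jobrunner or a scheduler, and format_wall_clock and submit_pipeline are illustrative helpers rather than part of the package. A real JobRunner instance would be passed to submit_pipeline in the same way.

```python
from datetime import timedelta


def format_wall_clock(limit):
    """Convert a timedelta to the HH:MM:SS string form used by wall_clock_limit."""
    total_seconds = int(limit.total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return "{:02d}:{:02d}:{:02d}".format(hours, minutes, seconds)


class RecordingRunner(object):
    """Hypothetical stand-in for JobRunner: records calls instead of submitting."""

    def __init__(self):
        self.calls = []

    def run(self, command_line, job_name, log_file, wait_for=(), wall_clock_limit=None):
        job_id = str(len(self.calls) + 1)  # fake, sequential job ids
        self.calls.append((job_name, list(wait_for), wall_clock_limit))
        return job_id


def submit_pipeline(runner):
    """Submit two jobs where the second waits for the first to finish."""
    limit = format_wall_clock(timedelta(hours=1, minutes=30))
    first = runner.run("echo step one", "StepOne", "step1.log",
                       wall_clock_limit=limit)
    second = runner.run("echo step two", "StepTwo", "step2.log",
                        wait_for=[first], wall_clock_limit=limit)
    return first, second


runner = RecordingRunner()
first_id, second_id = submit_pipeline(runner)
```

Keep in mind that wait_for is ignored when running locally, so the ordering is only enforced by an HPC scheduler.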

Examples

>>> # Normal case - verify job id is '0', stdout and stderr written to log file
>>> import os
>>> from tempfile import NamedTemporaryFile
>>> fout = NamedTemporaryFile(delete=False, mode='w'); fout.close()
>>> runner = JobRunner("local")
>>> # Parentheses are needed when the command line contains multiple commands separated by semicolon
>>> job_id = runner.run("(echo text to stdout; echo text to stderr 1>&2)", "JobName", fout.name)
>>> type(job_id) == type("this is a string")
True
>>> job_id
'0'
>>> f = open(fout.name); out = f.read(); f.close(); os.unlink(fout.name)
>>> print(out.strip())
text to stdout
text to stderr
>>> # Error case, external program returns non-zero.
>>> # Need to ignore exception details to work with both python2 and python3.
>>> job_id = runner.run("exit 100", "JobName", "") # doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
CalledProcessError: Command 'set -o pipefail; exit 100 2>&1 | tee ' returned non-zero exit status 100

run_array(command_line, job_name, log_file, array_file, num_tasks=None, max_processes=None, wait_for=[], wait_for_array=[], slot_dependency=False, threads=1, parallel_environment=None, array_subshell=True, exclusive=False, wall_clock_limit=None, quiet=False)[source]

Run an array of sub-tasks with the work of each task defined by a single line in the specified array_file.

Parameters:
  • command_line (str) – Command to be executed with parameter placeholders of the form {1}, {2}, {3} …
  • job_name (str) – Job name that will appear in the job scheduler queue.
  • log_file (str) – Path to the combined stdout / stderr log file. The sub-task number will be automatically appended.
  • array_file (str) – Name of the file containing the arguments for each sub-task with one line per sub-task. The arguments for each sub-task are found at the line number corresponding to the sub-task number. The line is parsed and substituted into the command, replacing the parameter placeholders with the actual arguments.
  • num_tasks (int, optional defaults to None) – Defines the number of subtasks in the job array. If not specified, the array_file must exist and the number of tasks will be equal to the number of lines in the file. Use this option when the array_file does not pre-exist and is created by a process that has not run yet.
  • max_processes (int, optional defaults to None) – If None, the number of concurrent processes is limited by the CPUs available on the HPC, or by the number of CPU cores when run locally. If not None, it sets the maximum number of concurrent processes for the array job. This works locally with xargs, and with grid and torque.
  • wait_for (str or list of str, optional defaults to empty list) – Single job id or list of job ids to wait for before beginning execution. Ignored when running locally.
  • wait_for_array (str or list of str, optional defaults to empty list) – Single array job id or list of array job ids to wait for before beginning execution. Ignored when running locally.
  • slot_dependency (bool, optional defaults to False) – Ignored for all schedulers but grid engine. If true, the sub-tasks of the array job being submitted will be dependent on the completion of the corresponding sub-tasks of the jobs in the wait_for_array. Has no effect on the dependencies of non-array jobs.
  • threads (int, optional defaults to 1) – Number of CPU threads consumed by each sub-task of the job. Unused when running locally.
  • parallel_environment (str, optional defaults to None) – Name of the grid engine parallel execution environment. Unused for any other job scheduler.
  • array_subshell (bool, optional defaults to True) – When true, HPC array job command lines are quoted and executed in a subshell. When running locally, this parameter is ignored – commands are not quoted and always run in a subshell.
  • exclusive (bool, optional, defaults to False) – Requests exclusive access to compute nodes to prevent other jobs from sharing the node resources. Enforced only on SLURM; silently ignored for all other schedulers.
  • wall_clock_limit (str, optional, defaults to None) – Maximum run-time; string of the form HH:MM:SS. Ignored when running locally.
  • quiet (bool, optional, defaults to False) – Controls whether the job stderr and stdout are written to stdout in addition to the log file. By default, the job stderr and stdout are written to both stdout and the log file. When True, the job stderr and stdout are written to the log file only.
Returns:

job_id – Job id assigned by the scheduler (Grid Engine, SLURM, or Torque). Returns ‘0’ in local mode.

Return type:

str

Raises:
  • JobRunnerException – If the array_file is missing or empty, and num_tasks is not specified.
  • CalledProcessError – In local mode, a non-zero exit code raises CalledProcessError. The exception is routed to the exception handler installed during JobRunner initialization, if any; if no exception handler was specified, the exception is re-raised.
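Because a missing or empty array_file is an error unless num_tasks is given, a caller may want to check the file before submitting. The sketch below is one way to do that. count_tasks is a hypothetical helper, not part of jobrunner, and it assumes that every line in the file counts as one sub-task; whether jobrunner treats blank lines the same way is not stated here.

```python
import os
import tempfile


def count_tasks(array_file):
    """Return the number of lines in array_file, one per sub-task.

    Raises ValueError when the file is missing or empty.
    """
    if not os.path.isfile(array_file):
        raise ValueError("array file not found: " + array_file)
    with open(array_file) as handle:
        task_count = sum(1 for _ in handle)
    if task_count == 0:
        raise ValueError("array file is empty: " + array_file)
    return task_count


# Demonstration with a two-task parameter file in a temporary directory
demo_dir = tempfile.mkdtemp()
demo_file = os.path.join(demo_dir, "arrayfile")
with open(demo_file, "w") as handle:
    handle.write("A B C\n1 2 3\n")
num_tasks = count_tasks(demo_file)
```

When the array file will only be created by an earlier job, this check does not apply and num_tasks has to be supplied to run_array instead.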
exception jobrunner.jobrunner.JobRunnerException[source]

Bases: exceptions.Exception

Raised for fatal JobRunner errors