jobrunner


An abstraction layer to run jobs on HPC clusters using Grid Engine, SLURM, Torque, or locally.

The jobrunner package was developed by the United States Food and Drug Administration, Center for Food Safety and Applied Nutrition.

Features

  • Python API for job submission
  • Consistent interface to run jobs on Grid Engine, SLURM, Torque, or locally
  • Dependencies between jobs
  • Array jobs and normal non-array jobs
  • Array job parameter substitution
  • Array job slot-dependency
  • Limit the CPU resources consumed by array jobs
  • Separate log files for each array job task

Citing jobrunner

To cite jobrunner, please reference the jobrunner GitHub repository:

https://github.com/CFSAN-Biostatistics/jobrunner

License

See the LICENSE file included in the jobrunner distribution.

Installation

At the command line:

$ pip install --user jobrunner

Or, if you have virtualenvwrapper installed:

$ mkvirtualenv jobrunner
$ pip install jobrunner

Upgrading jobrunner

If you previously installed with pip, you can upgrade to the newest version from the command line:

$ pip install --user --upgrade jobrunner

Uninstalling jobrunner

If you installed with pip, you can uninstall from the command line:

$ pip uninstall jobrunner

Usage

The jobrunner package is a Python API providing an abstraction layer to run jobs on HPC clusters using Grid Engine, SLURM, Torque, or on the local computer.

When using jobrunner, you do not need to execute qsub or sbatch explicitly; the jobrunner package automatically builds and executes the correct submission command for you.
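As a rough illustration of what happens under the hood, a Grid Engine submission might be assembled like the sketch below. This is an assumption-based sketch of the kind of qsub command jobrunner generates, not jobrunner's actual internals; the specific flag choices are illustrative.

```python
# Sketch only: illustrates the kind of qsub command jobrunner builds for you.
# The flag choices here are assumptions, not jobrunner's actual internals.
def build_qsub_command(command_line, job_name, log_file, wait_for=None):
    """Assemble a Grid Engine qsub command for a simple (non-array) job."""
    cmd = [
        "qsub",
        "-N", job_name,   # job name shown in the scheduler queue
        "-j", "y",        # join stderr into stdout
        "-o", log_file,   # combined stdout/stderr log file
        "-b", "y",        # treat the command as a binary, not a script file
    ]
    if wait_for:          # hold until the listed job ids complete
        cmd += ["-hold_jid", ",".join(wait_for)]
    cmd.append(command_line)
    return cmd

print(build_qsub_command("echo Hello World", "JobName", "logfile.log"))
```

jobrunner hides these scheduler-specific details, so the same Python code works whether the backend is qsub, sbatch, or a local process.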

To submit a job for execution on Grid Engine:

from jobrunner import JobRunner
runner = JobRunner("grid")  # This also works with "slurm" or "torque"
command_line = "echo Hello World"
job_id = runner.run(command_line, "JobName", "logfile.log")

To run a job locally on your computer, just change “grid” to “local”. Everything else stays the same:

from jobrunner import JobRunner
runner = JobRunner("local")
command_line = "echo Hello World"
job_id = runner.run(command_line, "JobName", "logfile.log")

For an array-job, first create a file with array job parameters, one line per task:

$ echo A B C > arrayfile
$ echo 1 2 3 >> arrayfile

To submit an array-job for execution on Grid Engine:

from jobrunner import JobRunner
runner = JobRunner("grid")  # This also works locally, just change "grid" to "local"
command_line = "echo Hello {1}{2}{3}"
job_id = runner.run_array(command_line, "JobName", "logfile.log", "arrayfile")

Array job parameter substitution

As you can see from the examples above, jobrunner has a very simple language for extracting parameters from a file and substituting the parameters into a command line.

Parameters in the array job parameter file are whitespace-separated. The parameters can have any meaning you want – numbers, strings, file names, directory names, etc.

The substitution language is just a number inside curly braces: {1} is the first whitespace-separated parameter found on the task's line in the array job parameter file, {2} is the second, {3} is the third, and so on.

Currently, array jobs running locally have a limit of 9 parameters. Array jobs running on the HPC have no limit to the number of parameters per line in the array job parameter file.
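The substitution behavior described above can be sketched in a few lines of Python. This is an illustration of the documented behavior, not the library's actual code:

```python
import re

# Illustrative sketch of the placeholder substitution jobrunner performs for
# each array job task (not the library's actual implementation).
def substitute_parameters(command_template, parameter_line):
    """Replace {1}, {2}, ... with whitespace-separated parameters from one line."""
    params = parameter_line.split()
    # {1} maps to params[0], {2} to params[1], and so on
    return re.sub(r"\{(\d+)\}", lambda m: params[int(m.group(1)) - 1], command_template)

# Given the arrayfile created above, task 1 reads the line "A B C":
print(substitute_parameters("echo Hello {1}{2}{3}", "A B C"))  # echo Hello ABC
```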

jobrunner.jobrunner module

This module provides an abstraction layer to run jobs on high performance computers using grid engine, SLURM, torque, or locally with xargs.

class jobrunner.jobrunner.JobRunner(hpc_type, strip_job_array_suffix=True, qsub_extra_params=None, exception_handler=None, verbose=False)[source]

Bases: object

__init__(hpc_type, strip_job_array_suffix=True, qsub_extra_params=None, exception_handler=None, verbose=False)[source]

Initialize an hpc job runner object.

Parameters:
  • hpc_type (str) – Type of job runner. Possible values are “grid”, “slurm”, “torque”, and “local”.
  • strip_job_array_suffix (bool, optional defaults to True) – When true, the dot and array suffix in the job id are removed before returning the job id.
  • qsub_extra_params (str, optional defaults to None) – Extra command line options passed to qsub or sbatch every time a job is submitted.
  • exception_handler (function, optional defaults to None) – Function to be called in local mode only when an exception occurs while attempting to run an external process. The function will be called with the arguments (exc_type, exc_value, exc_traceback).
  • verbose (bool, optional defaults to False) – When true, the job command lines are logged.

Examples

>>> runner = JobRunner("foobar")
Traceback (most recent call last):
ValueError: hpc_type must be one of: "grid", "slurm", "torque", "local"

run(command_line, job_name, log_file, wait_for=[], wait_for_array=[], threads=1, parallel_environment=None, exclusive=False, wall_clock_limit=None, quiet=False)[source]

Run a non-array job. Stderr is redirected (joined) to stdout.

Parameters:
  • command_line (str) – Command with all arguments to be executed.
  • job_name (str) – Job name that will appear in the job scheduler queue.
  • log_file (str) – Path to the combined stdout / stderr log file.
  • wait_for (str or list of str, optional defaults to empty list) – Single job id or list of job ids to wait for before beginning execution. Ignored when running locally.
  • wait_for_array (str or list of str, optional defaults to empty list) – Single array job id or list of array job ids to wait for before beginning execution. Ignored when running locally.
  • threads (int, optional defaults to 1) – Number of CPU threads consumed by the job, unused when running locally.
  • parallel_environment (str, optional defaults to None) – Name of the grid engine parallel execution environment. This must be specified when consuming more than one thread on grid engine. Unused for any other job scheduler.
  • exclusive (bool, optional, defaults to False) – Requests exclusive access to compute nodes to prevent other jobs from sharing the node resources. Enforced only on SLURM, silently ignored for all other schedulers.
  • wall_clock_limit (str, optional, defaults to None) – Maximum run-time; string of the form HH:MM:SS. Ignored when running locally.
  • quiet (bool, optional, defaults to False) – Controls whether the job stderr and stdout are written to stdout in addition to the log file. By default, the job stderr and stdout are written to both stdout and the log file. When True, the job stderr and stdout are written to the log file only.
Returns:

job_id – Grid, SLURM, or torque job id. Returns ‘0’ in local mode.

Return type:

str

Raises:
  • CalledProcessError
  • In local mode, non-zero exit codes will raise CalledProcessError and the exception will be routed to the exception handler installed during JobRunner initialization, if any. If no exception handler was specified, the exception is re-raised.

Examples

>>> # Normal case - verify job id is '0', stdout and stderr written to log file
>>> import os
>>> from tempfile import NamedTemporaryFile
>>> fout = NamedTemporaryFile(delete=False, mode='w'); fout.close()
>>> runner = JobRunner("local")
>>> # Parentheses are needed when the command line contains multiple commands separated by semicolons
>>> job_id = runner.run("(echo text to stdout; echo text to stderr 1>&2)", "JobName", fout.name)
>>> type(job_id) == type("this is a string")
True
>>> job_id
'0'
>>> f = open(fout.name); out = f.read(); f.close(); os.unlink(fout.name)
>>> print(out.strip())
text to stdout
text to stderr
>>> # Error case, external program returns non-zero.
>>> # Need to ignore exception details to work with both python2 and python3.
>>> job_id = runner.run("exit 100", "JobName", "") # doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
CalledProcessError: Command 'set -o pipefail; exit 100 2>&1 | tee ' returned non-zero exit status 100
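The exception_handler contract described earlier can be illustrated without jobrunner itself. The sketch below is standalone; my_handler is a hypothetical example function, not part of the jobrunner API:

```python
import subprocess
import sys

# Standalone sketch of the exception_handler contract: in local mode a failing
# external command raises CalledProcessError, and the handler is called with
# (exc_type, exc_value, exc_traceback).  my_handler is a hypothetical example,
# not part of the jobrunner API.
def my_handler(exc_type, exc_value, exc_traceback):
    return "handled %s with exit status %d" % (exc_type.__name__, exc_value.returncode)

try:
    subprocess.check_call("exit 100", shell=True)
except subprocess.CalledProcessError:
    print(my_handler(*sys.exc_info()))  # handled CalledProcessError with exit status 100
```

In a real run you would not write the try/except yourself; passing exception_handler=my_handler to JobRunner("local", ...) installs this routing for you.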

run_array(command_line, job_name, log_file, array_file, num_tasks=None, max_processes=None, wait_for=[], wait_for_array=[], slot_dependency=False, threads=1, parallel_environment=None, array_subshell=True, exclusive=False, wall_clock_limit=None, quiet=False)[source]

Run an array of sub-tasks with the work of each task defined by a single line in the specified array_file.

Parameters:
  • command_line (str) – Command to be executed with parameter placeholders of the form {1}, {2}, {3} …
  • job_name (str) – Job name that will appear in the job scheduler queue.
  • log_file (str) – Path to the combined stdout / stderr log file. The sub-task number will be automatically appended.
  • array_file (str) – Name of the file containing the arguments for each sub-task with one line per sub-task. The arguments for each sub-task are found at the line number corresponding to the sub-task number. The line is parsed and substituted into the command, replacing the parameter placeholders with the actual arguments.
  • num_tasks (int, optional defaults to None) – Defines the number of subtasks in the job array. If not specified, the array_file must exist and the number of tasks will be equal to the number of lines in the file. Use this option when the array_file does not pre-exist and is created by a process that has not run yet.
  • max_processes (int, optional defaults to None) – If None, the number of concurrent processes is limited to available CPU on an HPC and limited to the number of CPU cores when run locally. If not None, it sets the maximum number of concurrent processes for the array job. This works locally with xargs, and with grid and torque.
  • wait_for (str or list of str, optional defaults to empty list) – Single job id or list of job ids to wait for before beginning execution. Ignored when running locally.
  • wait_for_array (str or list of str, optional defaults to empty list) – Single array job id or list of array job ids to wait for before beginning execution. Ignored when running locally.
  • slot_dependency (bool, optional defaults to False) – Ignored for all schedulers but grid engine. If true, the sub-tasks of the array job being submitted will be dependent on the completion of the corresponding sub-tasks of the jobs in the wait_for_array. Has no effect on the dependencies of non-array jobs.
  • threads (int, optional defaults to 1) – Number of CPU threads consumed by each sub-task of the job, unused when running locally.
  • parallel_environment (str, optional defaults to None) – Name of the grid engine parallel execution environment. Unused for any other job scheduler.
  • array_subshell (bool, optional defaults to True) – When true, HPC array job command lines are quoted and executed in a subshell. When running locally, this parameter is ignored – commands are not quoted and always run in a subshell.
  • exclusive (bool, optional, defaults to False) – Requests exclusive access to compute nodes to prevent other jobs from sharing the node resources. Enforced only on SLURM, silently ignored for all other schedulers.
  • wall_clock_limit (str, optional, defaults to None) – Maximum run-time; string of the form HH:MM:SS. Ignored when running locally.
  • quiet (bool, optional, defaults to False) – Controls whether the job stderr and stdout are written to stdout in addition to the log file. By default, the job stderr and stdout are written to both stdout and the log file. When True, the job stderr and stdout are written to the log file only.
Returns:

job_id – Grid, SLURM, or torque job id. Returns ‘0’ in local mode.

Return type:

str

Raises:
  • JobRunnerException
  • If the array_file is missing or empty, and num_tasks is not specified, JobRunnerException is raised.
  • In local mode, non-zero exit codes will raise CalledProcessError and the exception will be routed to the exception handler installed during JobRunner initialization, if any. If no exception handler was specified, the exception is re-raised.
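The relationship between array_file and num_tasks stated above can be sketched as follows. This is an illustration of the documented rule under stated assumptions, not jobrunner's actual code:

```python
import os
import tempfile

# Sketch of the documented rule: when num_tasks is None, the array file must
# already exist, and the task count defaults to the number of lines in it.
# Illustration only, not jobrunner's actual code.
def default_num_tasks(array_file):
    """Return the number of sub-tasks implied by an existing array file."""
    if not os.path.isfile(array_file):
        raise ValueError("array_file is missing and num_tasks was not specified")
    with open(array_file) as f:
        lines = f.readlines()
    if not lines:
        raise ValueError("array_file is empty and num_tasks was not specified")
    return len(lines)

tmp = tempfile.NamedTemporaryFile(delete=False, mode="w")
tmp.write("A B C\n1 2 3\n")
tmp.close()
print(default_num_tasks(tmp.name))  # 2
os.unlink(tmp.name)
```

Passing num_tasks explicitly bypasses this count, which is what makes it possible to submit an array job whose array_file will be created by an earlier job in the pipeline.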

exception jobrunner.jobrunner.JobRunnerException[source]

Bases: exceptions.Exception

Raised for fatal JobRunner errors

Contributing

Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.

You can contribute in many ways:

Types of Contributions

Report Bugs

Report bugs at https://github.com/CFSAN-Biostatistics/jobrunner/issues.

If you are reporting a bug, please include:

  • Your operating system name and version.
  • Any details about your local setup that might be helpful in troubleshooting.
  • Detailed steps to reproduce the bug.

Fix Bugs

Look through the GitHub issues for bugs. Anything tagged with “bug” is open to whoever wants to implement it.

Implement Features

Look through the GitHub issues for features. Anything tagged with “feature” is open to whoever wants to implement it.

Write Documentation

jobrunner could always use more documentation, whether as part of the official jobrunner docs, in docstrings, or even on the web in blog posts, articles, and such.

Submit Feedback

The best way to send feedback is to file an issue at https://github.com/CFSAN-Biostatistics/jobrunner/issues.

If you are proposing a feature:

  • Explain in detail how it would work.
  • Keep the scope as narrow as possible, to make it easier to implement.
  • Remember that this is a volunteer-driven project, and that contributions are welcome :)

Get Started!

Ready to contribute? Here’s how to set up jobrunner for local development.

  1. Fork the jobrunner repo on GitHub.

  2. Clone your fork locally:

    $ git clone git@github.com:your_name_here/jobrunner.git
    
  3. Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, this is how you set up your fork for local development:

    $ mkvirtualenv jobrunner
    $ cd jobrunner/
    $ pip install sphinx_rtd_theme    # the documentation uses the ReadTheDocs theme
    $ pip install pytest
    $ python setup.py develop
    
  4. Create a branch for local development:

    $ git checkout -b name-of-your-bugfix-or-feature
    

    Now you can make your changes locally.

  5. When you’re done making changes, check that your changes pass flake8 and the tests, including testing other Python versions with tox:

    $ flake8 jobrunner tests
    $ pytest -v
    $ tox
    

    To get flake8 and tox, just pip install them into your virtualenv.

  6. Update the documentation and review the changes locally with sphinx:

    $ make docs
    
  7. Commit your changes and push your branch to GitHub:

    $ git add .
    $ git commit -m "Your detailed description of your changes."
    $ git push origin name-of-your-bugfix-or-feature
    
  8. Submit a pull request through the GitHub website.

Pull Request Guidelines

Before you submit a pull request, check that it meets these guidelines:

  1. The pull request should include tests.
  2. If the pull request adds functionality, the docs should be updated. Put your new functionality into a function with a docstring, and add the feature to the list in README.rst.
  3. The pull request should work for Python 2.7, 3.4, 3.5, 3.6, and for PyPy.

Tips

To run a subset of tests:

$ pytest -v tests/test_jobrunner.py

Credits

Development Lead

  • Steve Davis

CFSAN Bioinformatics Team

  • Steve Davis

External Contributors

None yet. Why not be the first?

History

1.4.0 (2020-08-21)

  • Add support for wall clock time limits.

1.3.1 (2020-08-12)

  • Allow array tasks in local mode to process only a portion of the lines in the array file by setting num_tasks to a value less than the number of lines in the array file.

1.3.0 (2020-04-12)

  • Add support for the SLURM job scheduler.
  • Add capability to request exclusive access to compute nodes when running on SLURM.

1.2.0 (2019-10-11)

  • Add the capability to run in quiet mode when running locally on a workstation so the job stdout and stderr are written to log files only.

1.1.0 (2019-06-07)

  • HPC array job command lines are quoted and executed in a subshell by default with better support for complex command lines.

1.0.0 (2018-12-03)

  • First public release.
