Slurm Job Arrays

Job arrays offer a mechanism for submitting and managing collections of similar but independent jobs quickly and easily. Job arrays are supported only for batch jobs, and the array index values are specified with the --array or -a option in an #SBATCH directive of a SLURM job command file.

    #SBATCH --array=s-e

s: Start index (minimum is 0)

e: End index (maximum is 1000)

A job array can also be specified at the command line with

    $ sbatch --array=s-e job.cmd

Examples

A job array will be created with 20 independent jobs, one for each task ID in the defined range: 1, 2, 3, …, 20.

    $ sbatch --array=1-20 job.cmd

A comma-separated list of task numbers, rather than a range, can also be provided.

    $ sbatch --array=1,2,4,8 job.cmd

A step size can be given after a colon. The following creates a job array with array tasks numbered 1, 4 and 7 (every third index from 1 to 9).

    #SBATCH --array=1-9:3
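To check which task IDs a specification produces before submitting it, the list and range notation can be expanded locally. The sketch below uses a hypothetical helper, expand_array_spec, which is not part of SLURM; it assumes only comma-separated values and start-end[:step] ranges, and it simply drops any %N throttle suffix.

```shell
#!/bin/bash
# expand_array_spec is a hypothetical helper, not a SLURM command.
# It expands "a,b,c" lists and "start-end[:step]" ranges into task IDs.
expand_array_spec() {
    local spec=${1%%\%*}          # drop an optional %N throttle suffix
    local part range step
    local -a parts=() ids=()

    IFS=',' read -r -a parts <<< "$spec"
    for part in "${parts[@]}"; do
        if [[ $part == *-* ]]; then
            range=${part%%:*}     # "start-end" without the step
            step=1
            if [[ $part == *:* ]]; then
                step=${part##*:}  # text after the colon
            fi
            # seq FIRST INCREMENT LAST prints one index per line
            ids+=($(seq "${range%-*}" "$step" "${range#*-}"))
        else
            ids+=("$part")
        fi
    done
    echo "${ids[*]}"
}

expand_array_spec "1-9:3"      # prints: 1 4 7
expand_array_spec "1,2,4,8"    # prints: 1 2 4 8
```

The helper only previews indices on the login node; SLURM itself still decides which specifications are valid.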

Naming output and error files

SLURM uses the %A and %a replacement strings for the master job id and task id, respectively.

Example:

    #SBATCH --output=myjob.%A_%a.out
    #SBATCH --error=myjob.%A_%a.err

Limiting the number of active job array tasks

To limit the number of tasks of a job array that run at the same time, append the suffix %N to the index specification, where N is the maximum number of simultaneously active tasks. The example below submits a 50-task job array with at most 10 tasks active at a time.

    #SBATCH -a 1-50%10

If you want to change the number of concurrent tasks of an active job after submission, you may run:

    $ scontrol update ArrayTaskThrottle=<count> JobId=<jobID>

For example:

    $ scontrol update ArrayTaskThrottle=20 JobId=2021

Example job command file with job array

    #!/bin/bash
    #SBATCH --job-name=array_job
    #SBATCH --output=array_job_%A_%a.out
    #SBATCH --error=array_job_%A_%a.err
    #SBATCH --array=1-6
    #SBATCH --time=01:00:00
    #SBATCH --partition=shortq
    #SBATCH --ntasks=1
    #SBATCH --mem=6G
    echo "SLURM_ARRAY_TASK_ID: " $SLURM_ARRAY_TASK_ID
    ml gatk
    gatk HaplotypeCaller -I sample1.bam -L $SLURM_ARRAY_TASK_ID ...

SLURM assigns the environment variable $SLURM_ARRAY_TASK_ID to each array task. It can be referenced inside the job script to select program parameters, input files and output files for that task.
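As an illustration of using the task ID to select inputs and outputs, the sketch below maps each task to its own pair of file names. The sample_N.bam and sample_N.vcf naming scheme is an assumption made for this example, and the fallback value of 1 is there only so that the snippet can be tried outside a SLURM job.

```shell
#!/bin/bash
# Fall back to task 1 when run outside SLURM (for testing only).
task_id=${SLURM_ARRAY_TASK_ID:-1}

# Hypothetical naming scheme: one input and one output file per array task.
input="sample_${task_id}.bam"
output="sample_${task_id}.vcf"

echo "Task ${task_id}: ${input} -> ${output}"
```

Because every task derives distinct file names from its own ID, tasks running at the same time do not overwrite each other's output.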

Alternatively, if you have a file "individual.txt" in which each line contains one item to be processed, you may use the task ID to pick the item for each task:

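    # Read the items into a Bash array (Bash array indices start at 0)
    individual_list=($(<individual.txt))
    # With --array=1-6 the task IDs start at 1, so subtract 1 to get the array index
    individual=${individual_list[$((SLURM_ARRAY_TASK_ID - 1))]}
    gatk HaplotypeCaller -I "${individual}" ...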
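When the array starts at 1, the task ID can also be used directly as a line number. The sketch below reads line N with sed instead of loading the whole file into an array. It assumes one item per line; the nth_line helper and the sample names are hypothetical, and a small temporary file stands in for individual.txt so that the snippet can be tried outside SLURM.

```shell
#!/bin/bash
# Demonstration input; in practice individual.txt already exists.
list_file=$(mktemp)
printf '%s\n' sampleA.bam sampleB.bam sampleC.bam > "$list_file"

# nth_line N FILE: print line N (1-based) of FILE. Hypothetical helper.
nth_line() {
    sed -n "${1}p" "$2"
}

# Default to task 2 when run outside SLURM (for testing only).
task_id=${SLURM_ARRAY_TASK_ID:-2}
individual=$(nth_line "$task_id" "$list_file")
echo "Task ${task_id} processes ${individual}"
```

With this approach the array range should match the number of lines in the file, for example --array=1-3 for a three-line list.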

Deleting job arrays and tasks

To delete all of the tasks of an array job, use scancel with the job ID:

    $ scancel 2021

To delete a single task, add the task ID:

    $ scancel 2021_7