Slurm Job Arrays
Job arrays offer a mechanism for submitting and managing collections of similar but independent jobs quickly and easily. Job arrays are only supported for batch jobs, and the array index values are specified with the --array (or -a) option, for example in an #SBATCH directive in a SLURM job command file:
#SBATCH --array=s-e
s: start index (minimum 0)
e: end index (maximum 1000, a limit derived from the cluster's MaxArraySize setting)
A job array can also be specified on the command line:
$ sbatch --array=s-e job.cmd
Examples
The following creates a job array of 20 independent tasks with task IDs 1, 2, 3, …, 20:
$ sbatch --array=1-20 job.cmd
Instead of a range, a comma-separated list of task IDs can be given:
$ sbatch --array=1,2,4,8 job.cmd
A step size can be added with a colon suffix. The following creates array tasks numbered 1, 4 and 7 (1 to 9 in steps of 3):
#SBATCH --array=1-9:3
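To make the index syntax concrete, the sketch below expands an --array value into the task IDs it describes. expand_array_spec is a hypothetical helper written for illustration, not a Slurm command. It covers ranges, comma-separated lists (Slurm also accepts mixtures such as 0,6,16-32) and :step suffixes, and it ignores a trailing %N throttle suffix, which limits how many tasks run at once rather than which IDs exist.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of Slurm: expand an --array spec into the
# task IDs it describes, e.g. "1-9:3" -> "1 4 7".
# Supports comma-separated lists, ranges "s-e" and steps "s-e:k"; a trailing
# "%N" throttle is dropped because it limits concurrency, not the task IDs.
expand_array_spec() {
  local spec=${1%%\%*}            # strip an optional "%N" suffix
  local -a items ids=()
  local item range step start end i
  IFS=',' read -ra items <<< "$spec"
  for item in "${items[@]}"; do
    if [[ $item == *-* ]]; then
      range=${item%%:*}           # the "s-e" part
      step=1
      if [[ $item == *:* ]]; then
        step=${item##*:}          # the ":k" step size
      fi
      start=${range%-*}
      end=${range#*-}
      for ((i = start; i <= end; i += step)); do
        ids+=("$i")
      done
    else
      ids+=("$item")              # a single task ID
    fi
  done
  echo "${ids[*]}"                # space-separated list
}

expand_array_spec "1-9:3"       # prints: 1 4 7
expand_array_spec "1,2,4,8"     # prints: 1 2 4 8
expand_array_spec "0,6,16-18"   # prints: 0 6 16 17 18
```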
Naming output and error files
In --output and --error file names, SLURM replaces %A with the master job ID of the array and %a with the array task ID.
Example:
#SBATCH --output=myjob.%A_%a.out
#SBATCH --error=myjob.%A_%a.err
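Inside each array task the same values are available as environment variables: SLURM_ARRAY_JOB_ID corresponds to %A and SLURM_ARRAY_TASK_ID to %a. The sketch below uses them to build per-task file names that follow the same pattern; the fallback values 2021 and 7 are made up so the snippet also runs outside Slurm.

```bash
#!/usr/bin/env bash
# Sketch: rebuild the names "myjob.%A_%a.out" / "myjob.%A_%a.err" from the
# variables Slurm sets in each array task (%A = SLURM_ARRAY_JOB_ID,
# %a = SLURM_ARRAY_TASK_ID). Fallbacks 2021 and 7 are hypothetical.
array_job_id=${SLURM_ARRAY_JOB_ID:-2021}
array_task_id=${SLURM_ARRAY_TASK_ID:-7}

out_file="myjob.${array_job_id}_${array_task_id}.out"
err_file="myjob.${array_job_id}_${array_task_id}.err"

echo "$out_file"   # myjob.2021_7.out when run outside Slurm
echo "$err_file"   # myjob.2021_7.err when run outside Slurm
```

Building other per-task files (results, logs) from the same two variables keeps them aligned with the output and error files Slurm writes.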
Limiting the number of active job array tasks¶
To limit how many tasks of a job array run at the same time, append a % separator followed by N, the maximum number of simultaneously running tasks. The example below submits a 50-task job array with at most 10 tasks running at a time:
#SBATCH -a 1-50%10
To change the maximum number of concurrently running tasks of a job array after submission, run:
$ scontrol update ArrayTaskThrottle=<count> JobId=<jobid>
For example, to allow up to 20 tasks of job 2021 to run at once:
$ scontrol update ArrayTaskThrottle=20 JobId=2021
Example job command file with a job array
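#!/bin/bash
#SBATCH --job-name=array_job
# %A = array master job ID, %a = array task ID
#SBATCH --output=array_job_%A_%a.out
#SBATCH --error=array_job_%A_%a.err
# Six array tasks with IDs 1 to 6
#SBATCH --array=1-6
# The limits below apply to each array task individually
#SBATCH --time=01:00:00
#SBATCH --partition=shortq
#SBATCH --ntasks=1
#SBATCH --mem=6G

echo "SLURM_ARRAY_TASK_ID: ${SLURM_ARRAY_TASK_ID}"

# "ml" is the Lmod shorthand for "module load"
ml gatk
# Each task passes its array index as the -L interval
# (assumes the reference has contigs named 1 to 6)
gatk HaplotypeCaller -I sample1.bam -L "${SLURM_ARRAY_TASK_ID}" ...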
SLURM sets the environment variable $SLURM_ARRAY_TASK_ID to the index of each array task. A job script can reference it to select program parameters, input files and output files.
Alternatively, if you have a file individual.txt in which each line contains an item to be processed, you can use the task ID to pick one line per task:
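As a minimal sketch, assuming six hypothetical parameter values for the six tasks of the example script above (--array=1-6), a script can use the task ID as an index into a Bash array of parameters and derive per-task file names from it. The parameter name alpha and the input_/result_ file names are illustrative only.

```bash
#!/usr/bin/env bash
# Sketch: choose per-task parameters from SLURM_ARRAY_TASK_ID.
# The values, file names and the default ID of 1 are hypothetical.
task_id=${SLURM_ARRAY_TASK_ID:-1}

# One entry per array task; index 0 is a placeholder so task 1 maps to params[1]
params=("" 0.01 0.05 0.10 0.20 0.50 1.00)

# Fail early if the task ID has no matching parameter
if (( task_id < 1 || task_id >= ${#params[@]} )); then
  echo "No parameter defined for task ${task_id}" >&2
  exit 1
fi

alpha=${params[task_id]}
input="input_${task_id}.dat"
output="result_${task_id}.txt"
echo "Task ${task_id}: alpha=${alpha}, input=${input}, output=${output}"
```

Keeping the parameter table in the script makes each task reproducible from its ID alone.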
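# Read one item per line into a Bash array (mapfile requires Bash 4 or later)
mapfile -t individual_list < individual.txt
# Bash arrays are zero-indexed, so subtract 1 when the job array starts at 1 (e.g. --array=1-6)
individual=${individual_list[$((SLURM_ARRAY_TASK_ID - 1))]}
gatk HaplotypeCaller -I "${individual}" ...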
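Another option, sketched below, reads line N of the list with sed -n "Np". Because sed counts lines from 1, task IDs from --array=1-N map directly to lines 1 to N without an offset. The line count can also size the array at submission time, since options given to sbatch on the command line override #SBATCH directives. The sample list, written to a temporary file, and the default task ID of 2 are made up for the demonstration.

```bash
#!/usr/bin/env bash
# Sketch: 1-based line selection with sed, so task N reads line N.
# The sample entries and the default task ID are hypothetical.
list_file=$(mktemp)
printf '%s\n' sampleA.bam sampleB.bam sampleC.bam > "$list_file"

# Number of tasks needed, e.g. for: sbatch --array=1-"$n_lines" job.cmd
# ($(( ... )) strips any padding some wc implementations add)
n_lines=$(( $(wc -l < "$list_file") ))

task_id=${SLURM_ARRAY_TASK_ID:-2}
individual=$(sed -n "${task_id}p" "$list_file")
echo "Task ${task_id} of ${n_lines}: ${individual}"   # Task 2 of 3: sampleB.bam outside Slurm

rm -f "$list_file"
```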
Deleting job arrays and tasks
To delete all of the tasks of an array job, use scancel with the job ID:
$ scancel 2021
To delete a single task, append an underscore and the task ID to the job ID:
$ scancel 2021_7