SLURM Job Management¶
After a job is submitted to SLURM, users may check its status with the squeue command as described below.
Show all running/pending jobs¶
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
23598 shortq dphpcap_ p210006c R 13-23:48:34 1 compute26
23707 shortq dphpcap_ p210006c R 11-15:37:20 1 compute27
23708 shortq dphpcap_ p210006c R 11-13:07:41 1 compute17
23716 shortq ts1-opt p170072c R 11-08:32:55 1 compute06
23752 shortq dphpcap_ p210006c R 5-02:59:36 1 compute23
23872 shortq Ge-m3-bs p180028c R 7-16:37:59 1 compute12
23949 mediumq nfkappa8 p140114n R 5-07:31:59 3 compute[01-02,23]
23953 shortq as-origi p200019c R 6-02:59:31 1 compute05
23970 gpuq Sol-2 p180089p R 3-15:38:49 1 compute32
23976 mediumq Au8-O2 p180089p R 5-13:23:58 2 compute[05,25]
24004 shortq mix raghuc R 3-06:19:34 1 compute29
24020 gpuq e8r_fexo tomskari R 3-03:13:48 1 compute33
24023 gpuq 58RPE p210139b R 3-01:11:50 1 compute33
24038 shortq mix2 raghuc R 2-21:38:58 1 compute28
24039 shortq mix1 raghuc R 2-21:38:41 1 compute30
24055 mediumq mix3 raghuc R 1-13:28:53 2 compute[01-02]
24056 shortq Cu8_o21 p180089p R 1-13:28:53 1 compute03
24079 mediumq md2 p200028p R 1-20:38:10 2 compute[26-27]
24086 mediumq Ni_ads p220106p R 1-17:05:33 2 compute[17,24]
24119 shortq NEB1 p200028p R 14:49:51 1 compute31
24124 mediumq nfkappa8 p140114n R 17:44:16 2 compute[19,25]
24125 longq nfkappa8 p140114n R 17:39:01 5 compute[03,05,11,19-20]
24127 shortq elabqa p170030c R 5:43:32 1 compute20
24130 shortq zlabqa p170030c R 5:39:36 1 compute24
24131 shortq zlabqa p170030c R 5:39:11 1 compute11
24133 shortq bcc30 p190146m R 4:50:52 1 compute12
24146 shortq NEB2_2 p200028p R 35:49 1 compute09
24147 shortq visc8p2v p140114n R 1:27 1 compute13
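To list only your own jobs, squeue also accepts a user filter; <Username> below is a placeholder for your cluster username:
$ squeue -u <Username>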
Show a specific job¶
squeue -j <JobID>
$ squeue -j 123456
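For the full record of a single job (allocated nodes, time limit, working directory, and so on), scontrol can be used as well:
$ scontrol show job <JobID>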
Show jobs in a specific partition¶
squeue -p <partition>
$ squeue -p shortq
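Filters can be combined; for example, to show only the pending jobs in shortq:
$ squeue -p shortq -t PD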
Show running jobs¶
$ squeue -t R
Show pending jobs¶
$ squeue -t PD
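For pending jobs, squeue can also report the scheduler's estimated start times:
$ squeue -t PD --start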
Field | Description
---|---
JOBID | Job ID
PARTITION | Partition the job is assigned to
NAME | Job name given at submission
ST | Job status (see the status codes below)
TIME | Running time
NODELIST(REASON) | For running jobs, the list of nodes in use; for pending jobs, the reason explaining the current job status
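The columns above are squeue's defaults; custom columns can be selected with the -o option. As a sketch using standard format codes (%i job ID, %u user, %t status, %l time limit):
$ squeue -o "%.10i %.10u %.4t %.12l"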
Status | Description
---|---
R | Running
PD | Pending (queued)
CD | Completed (exit code 0, no error)
F | Failed (non-zero exit code)
DL | Terminated on reaching its deadline
Reason | Description
---|---
Priority | The job is waiting for higher-priority job(s) to complete
Dependency | The job is waiting for a job it depends on to complete
Resources | The job is waiting for resources to become available
MaxJobsPerUser | The user has reached the maximum number of simultaneously running jobs
Delete / cancel a job¶
$ scancel <JobID>
Delete / cancel all jobs for a user¶
$ scancel -u <Username>
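scancel can also filter on job state; for example, to cancel only a user's pending jobs while leaving running ones untouched:
$ scancel -u <Username> -t PENDING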
Update attributes of submitted jobs¶
Update the walltime request of a queued job (a pending job that has not yet started to run) to 1 hour.
$ scontrol update jobid=<JobID> TimeLimit=01:00:00
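The updated limit can be confirmed with squeue's long output, which includes a TIME_LIMIT column:
$ squeue -j <JobID> -l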
Check Partition/Node Usage¶
Users can check the status of partitions and nodes with the sinfo command
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
testq up 1:00:00 13 mix compute[01-03,06,11-12,15,19-20,23-25,27]
testq up 1:00:00 18 alloc compute[04-05,07-10,13-14,16-18,21-22,26,28-31]
shortq up infinite 13 mix compute[01-03,06,11-12,15,19-20,23-25,27]
shortq up infinite 18 alloc compute[04-05,07-10,13-14,16-18,21-22,26,28-31]
mediumq up infinite 13 mix compute[01-03,06,11-12,15,19-20,23-25,27]
mediumq up infinite 18 alloc compute[04-05,07-10,13-14,16-18,21-22,26,28-31]
longq up infinite 13 mix compute[01-03,06,11-12,15,19-20,23-25,27]
longq up infinite 18 alloc compute[04-05,07-10,13-14,16-18,21-22,26,28-31]
testgpuq up 1:00:00 1 mix compute32
testgpuq up 1:00:00 1 alloc compute33
gpuq up infinite 1 mix compute32
gpuq up infinite 1 alloc compute33
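For a per-node rather than per-partition view, sinfo also offers a node-oriented long listing:
$ sinfo -N -l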