HOWTO check up on jobs

From HPC
Revision as of 10:57, 26 March 2014 by Cwmoller

Examining the queue

MAUI is the scheduler that decides which resources your job will run on. You can look at the queue with either the TORQUE qstat command or the MAUI showq command. qstat lists jobs ordered by job ID, whereas showq groups jobs by state (the ACTIVE, IDLE, and BLOCKED sections in the example below) and then orders them by priority within each group.

[username@launch ~]$ showq
ACTIVE JOBS--------------------
JOBNAME            USERNAME      STATE  PROC   REMAINING            STARTTIME

33                 username    Running     1    22:07:22  Tue Jun 18 07:58:46
34                 username    Running     1    22:07:22  Tue Jun 18 07:58:46
35                 username    Running     1    22:07:22  Tue Jun 18 07:58:46
36                 username    Running     1    22:07:22  Tue Jun 18 07:58:46
37                 username    Running     1    22:07:22  Tue Jun 18 07:58:46
38                 username    Running     1    22:07:22  Tue Jun 18 07:58:46
39                 username    Running     1    22:07:22  Tue Jun 18 07:58:46

     7 Active Jobs       7 of    8 Processors Active (87.50%)
                         2 of    2 Nodes Active      (100.00%)

IDLE JOBS----------------------
JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME


0 Idle Jobs

BLOCKED JOBS----------------
JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME


Total Jobs: 7   Active Jobs: 7   Idle Jobs: 0   Blocked Jobs: 0
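If you only want a quick count of your own jobs, the showq output can be filtered with awk. The sketch below assumes the column layout shown above (JOBNAME, USERNAME and STATE as the first three columns). The function name jobs_by_state is hypothetical and is not a MAUI command.

```bash
# Hypothetical helper: count one user's jobs per state in showq output.
# Assumes the layout shown above: JOBNAME USERNAME STATE PROC ...
# Usage: showq | jobs_by_state "$USER"
jobs_by_state() {
    awk -v u="$1" '
        # Job rows carry the user name in column 2; header and summary rows do not.
        $2 == u { count[$3]++ }
        END { for (s in count) printf "%s %d\n", s, count[s] }
    ' | sort
}
```

With the example queue above, `showq | jobs_by_state username` would print `Running 7`.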

Checking a specific job

To see the details of a specific job, run checkjob with its job ID:

[username@launch ~]$ checkjob 40


checking job 40

State: Running
Creds:  user:username  group:users  class:batch  qos:DEFAULT
WallTime: 00:09:18 of 1:00:00:00
SubmitTime: Tue Jun 18 09:54:34
  (Time Queued  Total: 00:00:01  Eligible: 00:00:01)

StartTime: Tue Jun 18 09:54:35
Total Tasks: 1

Req[0]  TaskCount: 1  Partition: DEFAULT
Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
NodeCount: 1
Allocated Nodes:
[test001:1]


IWD: [NONE]  Executable:  [NONE]
Bypass: 0  StartCount: 1
PartitionMask: [ALL]
Flags:       RESTARTABLE

Reservation '40' (-00:09:10 -> 23:50:50  Duration: 1:00:00:00)
PE:  1.00  StartPriority:  1
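Because checkjob prints one labelled field per line, individual values are easy to pull out in a script. This is a minimal sketch that assumes the "State:" and "WallTime:" lines look as they do in the example above; job_state and job_walltime are hypothetical helper names.

```bash
# Hypothetical helpers for scripting around checkjob output.
# Assumes lines of the form "State: Running" and
# "WallTime: 00:09:18 of 1:00:00:00", as in the example above.

# Print the job state.   Usage: checkjob 40 | job_state
job_state() {
    awk '$1 == "State:" { print $2 }'
}

# Print elapsed and requested wall time.   Usage: checkjob 40 | job_walltime
job_walltime() {
    awk '$1 == "WallTime:" { print $2, "of", $4 }'
}
```

For job 40 above, these would print `Running` and `00:09:18 of 1:00:00:00` respectively.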

If you want to look at the output of your job while it's still running, use the qpeek command.

[username@launch ~]$ qpeek 40
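If you would rather wait until a job has finished before reading its output files, one option is to poll showq until the job ID no longer appears. This is a sketch under the assumption, true of the showq example above, that each job is listed by its numeric ID in the first column. The names job_in_queue and wait_for_job are hypothetical, and the 60-second default interval is an arbitrary choice to avoid querying the scheduler too often.

```bash
# Hypothetical helper: succeed (exit 0) if JOBID appears in the JOBNAME
# column of showq output read from stdin.
job_in_queue() {
    awk -v j="$1" '$1 == j { found = 1 } END { exit !found }'
}

# Hypothetical helper: block until JOBID has left the showq listing.
# Usage: wait_for_job 40 [INTERVAL_SECONDS]
wait_for_job() {
    local id="$1" interval="${2:-60}"
    while showq | job_in_queue "$id"; do
        sleep "$interval"
    done
}
```

A typical use would be `wait_for_job 40 && less` on the job's output file once it has been copied back.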

Deleting a job you no longer want

If you want to delete a job (whether it's already running or not), use the qdel command:

[username@launch ~]$ qdel 41

qdel prints nothing when a job is deleted successfully. Keep in mind that a running job is killed: files in its scratch space will not be synced back to your home directory, and the scratch space will not be cleaned up. If you delete a running job that uses scratch space, please ask the administrator to check for leftover scratch directories.
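qdel accepts more than one job ID, so several jobs can be removed in a single call. The sketch below deletes all of one user's jobs in a given showq state, for example every job still waiting in the queue. It assumes the showq column layout shown earlier and that waiting jobs are reported with the state Idle; job_ids and delete_jobs_in_state are hypothetical names. Because deleting running jobs has the scratch-space side effects described above, check the list with job_ids before deleting.

```bash
# Hypothetical helpers: pick job IDs out of showq output and delete them.
# Assumes the showq layout JOBNAME USERNAME STATE ... shown earlier.

# Print the IDs of USER's jobs in STATE, one per line (showq output on stdin).
# Usage: showq | job_ids "$USER" Idle
job_ids() {
    awk -v u="$1" -v s="$2" '$2 == u && $3 == s { print $1 }'
}

# Delete all of USER's jobs in STATE with a single qdel call.
# Usage: delete_jobs_in_state "$USER" Idle
delete_jobs_in_state() {
    local ids
    ids=$(showq | job_ids "$1" "$2")
    if [ -z "$ids" ]; then
        echo "no $2 jobs for $1" >&2
        return 0
    fi
    # $ids is left unquoted on purpose so each ID becomes a separate argument.
    qdel $ids
}
```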

Asking MAUI when your job will probably start and finish

If you want to see a time estimate for when your job will start, use the showstart command:

[username@launch ~]$ showstart 41
job 41 requires 1 proc for 1:00:00:00
Earliest start in         22:03:48 on Wed Jun 19 07:58:46
Earliest completion in  1:22:03:48 on Thu Jun 20 07:58:46
Best Partition: DEFAULT
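The estimate can also be extracted in a script, for instance to log when a batch of jobs is expected to begin. A minimal sketch, assuming the "Earliest start ... on DATE" and "Earliest completion ... on DATE" lines look as in the example above; showstart_times is a hypothetical helper name.

```bash
# Hypothetical helper: print MAUI's estimated start and completion dates
# from showstart output on stdin.
# Usage: showstart 41 | showstart_times
showstart_times() {
    awk '/^Earliest (start|completion)/ {
        when = $0
        sub(/.* on /, "", when)    # keep only the date after " on "
        print $2 ": " when
    }'
}
```

For job 41 above, this would print `start: Wed Jun 19 07:58:46` followed by `completion: Thu Jun 20 07:58:46`.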

Overview of cluster usage

pestat gives a per-node overview of the cluster: for each node it shows the state, load and memory figures, followed by the IDs of the jobs running there and the users who own them.

[username@launch ~]$ pestat
Node                state  load    pmem ncpu   mem   resi usrs tasks NetMbit jobids/users
comp001              excl   3.8    1877   4   5907    398  4/1    4      0!   34 username 36 username 38 username 40 username
comp002              excl   3.8    1877   4   5907    409  4/1    4      0    33 username 35 username 37 username 39 username
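To find where your own jobs are running, the pestat listing can be filtered per user. This sketch assumes the layout shown above, with ten fixed columns followed by alternating job ID and user name pairs; user_jobs_by_node is a hypothetical name, and other pestat versions may print different columns.

```bash
# Hypothetical helper: for each node in pestat output on stdin, print the
# node name and the IDs of USER's jobs on it (nodes without such jobs are skipped).
# Assumes ten fixed columns, then "jobid user" pairs from column 11 onward.
# Usage: pestat | user_jobs_by_node "$USER"
user_jobs_by_node() {
    awk -v u="$1" 'NR > 1 {
        ids = ""
        for (i = 11; i < NF; i += 2)
            if ($(i + 1) == u) ids = ids " " $i
        if (ids != "") print $1 ":" ids
    }'
}
```

On the example above, `pestat | user_jobs_by_node username` would print `comp001: 34 36 38 40` and `comp002: 33 35 37 39`.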