HOWTO check up on jobs

From HPC
Revision as of 10:48, 26 March 2014 by Cwmoller

Examining the queue

MAUI is the scheduler that decides which resources your job will run on. You can look at the queue either with the TORQUE qstat command or with the MAUI showq command. qstat lists jobs ordered by job ID, whereas showq groups jobs by state (the ACTIVE, IDLE, and BLOCKED sections of its output) and then orders them by priority.

[username@launch ~]$ showq
ACTIVE JOBS--------------------
JOBNAME            USERNAME      STATE  PROC   REMAINING            STARTTIME

33                 username    Running     1    22:07:22  Tue Jun 18 07:58:46
34                 username    Running     1    22:07:22  Tue Jun 18 07:58:46
35                 username    Running     1    22:07:22  Tue Jun 18 07:58:46
36                 username    Running     1    22:07:22  Tue Jun 18 07:58:46
37                 username    Running     1    22:07:22  Tue Jun 18 07:58:46
38                 username    Running     1    22:07:22  Tue Jun 18 07:58:46
39                 username    Running     1    22:07:22  Tue Jun 18 07:58:46

     7 Active Jobs       7 of    8 Processors Active (87.50%)
                         2 of    2 Nodes Active      (100.00%)

IDLE JOBS----------------------
JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME


0 Idle Jobs

BLOCKED JOBS----------------
JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME


Total Jobs: 7   Active Jobs: 7   Idle Jobs: 0   Blocked Jobs: 0
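If you only want a count of jobs in each state rather than the full listing, you can filter showq's output through awk. This is a minimal sketch, assuming a POSIX shell with awk and that showq prints job lines as in the sample above (numeric job ID first, state in the third column); showq_count_by_state is a hypothetical helper name, not a MAUI command.

```shell
#!/bin/sh
# Hypothetical helper: tally showq job lines by their STATE column.
# Assumption: job lines start with a numeric job ID and carry the state
# as the third whitespace-separated field, as in the sample output.
# Summary lines such as "7 Active Jobs ..." and "2 of 2 Nodes ..."
# are skipped because their third field is "Jobs" or a number.
showq_count_by_state() {
  awk '
    $1 ~ /^[0-9]/ && $3 ~ /^[A-Za-z]+$/ && $3 != "Jobs" { n[$3]++ }
    END { for (s in n) print s, n[s] }
  '
}

# Usage (on the launch node):
#   showq | showq_count_by_state
```

For the sample output above this prints Running 7.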

Checking a specific job

To see the details of a specific job, run the MAUI checkjob command with the job ID:

[username@launch ~]$ checkjob 40


checking job 40

State: Running
Creds:  user:username  group:users  class:batch  qos:DEFAULT
WallTime: 00:09:18 of 1:00:00:00
SubmitTime: Tue Jun 18 09:54:34
  (Time Queued  Total: 00:00:01  Eligible: 00:00:01)

StartTime: Tue Jun 18 09:54:35
Total Tasks: 1

Req[0]  TaskCount: 1  Partition: DEFAULT
Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
NodeCount: 1
Allocated Nodes:
[test001:1]


IWD: [NONE]  Executable:  [NONE]
Bypass: 0  StartCount: 1
PartitionMask: [ALL]
Flags:       RESTARTABLE

Reservation '40' (-00:09:10 -> 23:50:50  Duration: 1:00:00:00)
PE:  1.00  StartPriority:  1
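The WallTime line shows how much of the requested wall-clock limit the job has used so far. As a sketch, assuming the State: and WallTime: lines are formatted as above with times written as [[DD:]HH:]MM:SS, the hypothetical checkjob_progress function below converts both times to seconds and prints the state with the percentage used. For job 40, 00:09:18 is 558 seconds out of 1:00:00:00 (86,400 seconds), or about 0.6%.

```shell
#!/bin/sh
# Hypothetical helper: print a job's state and the share of its
# requested wall time already used, parsing checkjob output on stdin.
# Assumption: lines look like "State: Running" and
# "WallTime: 00:09:18 of 1:00:00:00".
checkjob_progress() {
  awk '
    # Convert [[DD:]HH:]MM:SS to seconds.
    function secs(t,   a, n) {
      n = split(t, a, ":")
      if (n == 4) return a[1]*86400 + a[2]*3600 + a[3]*60 + a[4]
      if (n == 3) return a[1]*3600 + a[2]*60 + a[3]
      if (n == 2) return a[1]*60 + a[2]
      return a[1]
    }
    /^State:/    { state = $2 }
    /^WallTime:/ { used = secs($2); limit = secs($4) }
    END {
      # Jobs that have not started have no WallTime line to parse.
      if (limit > 0)
        printf "%s %.1f%%\n", state, 100 * used / limit
      else
        print state
    }
  '
}

# Usage: checkjob 40 | checkjob_progress
```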

To look at the output your job has produced so far while it is still running, use the qpeek command with the job ID:

[username@launch ~]$ qpeek 40
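Run as shown, qpeek prints the job's output as it stands at that moment. If you want to keep an eye on a job, one option is simply to re-run it at intervals. Below is a minimal sketch; peek_poll and its arguments (interval in seconds, number of snapshots, lines to show) are hypothetical, and only the plain qpeek JOBID form is taken from this page.

```shell
#!/bin/sh
# Hypothetical helper (not part of TORQUE or qpeek): take COUNT
# snapshots of a job's output, INTERVAL seconds apart, showing only
# the last LINES lines of each. Only "qpeek JOBID" comes from this page.
# The ( ... ) body runs in a subshell so its variables do not leak.
peek_poll() (
  job=$1
  interval=${2:-60}
  count=${3:-10}
  lines=${4:-5}
  i=1
  while [ "$i" -le "$count" ]; do
    echo "=== qpeek $job ($i/$count) ==="
    qpeek "$job" | tail -n "$lines"
    # Do not sleep after the final snapshot.
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
)
```

For example, peek_poll 40 60 10 would show the last five lines of job 40's output once a minute, ten times.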

Asking MAUI when your job will probably start and finish

To see an estimate of when your job will start and finish, use the showstart command with the job ID:

[username@launch ~]$ showstart 41
job 41 requires 1 proc for 1:00:00:00
Earliest start in         22:03:48 on Wed Jun 19 07:58:46
Earliest completion in  1:22:03:48 on Thu Jun 20 07:58:46
Best Partition: DEFAULT
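The two estimates are offsets from the current time, and in the sample the gap between them is exactly the wall time job 41 requested: 1:22:03:48 minus 22:03:48 is 1:00:00:00. A minimal sketch, assuming the output format above with times written as [[D:]HH:]MM:SS, that converts both offsets to seconds; showstart_seconds is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print seconds until the estimated start,
# seconds until the estimated completion, and their difference,
# parsing showstart output on stdin.
# Assumption: lines read "Earliest start in T on DATE" and
# "Earliest completion in T on DATE", so T is the fourth field.
showstart_seconds() {
  awk '
    # Convert [[DD:]HH:]MM:SS to seconds.
    function secs(t,   a, n) {
      n = split(t, a, ":")
      if (n == 4) return a[1]*86400 + a[2]*3600 + a[3]*60 + a[4]
      if (n == 3) return a[1]*3600 + a[2]*60 + a[3]
      if (n == 2) return a[1]*60 + a[2]
      return a[1]
    }
    /^Earliest start in/      { start = secs($4) }
    /^Earliest completion in/ { finish = secs($4) }
    END { print start, finish, finish - start }
  '
}

# Usage: showstart 41 | showstart_seconds
```

For the sample above this prints 79428 165828 86400; the last value, 86,400 seconds, is the requested 1:00:00:00.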

Overview of cluster usage

pestat gives a per-node overview of the cluster: each node's state, load, CPU and memory figures, and which jobs (and the users who own them) are running on it.

[username@launch ~]$ pestat
Node                state  load    pmem ncpu   mem   resi usrs tasks NetMbit jobids/users
comp001              excl   3.8    1877   4   5907    398  4/1    4      0!   34 username 36 username 38 username 40 username
comp002              excl   3.8    1877   4   5907    409  4/1    4      0    33 username 35 username 37 username 39 username
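To see at a glance which nodes still have unused processors, you can subtract the tasks column from the ncpu column. This is a sketch, assuming the column layout shown above (ncpu in column 5, tasks in column 9); other pestat versions may lay out their columns differently, and pestat_free_cpus is a hypothetical helper name. In the sample, both nodes have 4 CPUs and 4 tasks, so each would show 0 free.

```shell
#!/bin/sh
# Hypothetical helper: list each node with its unused CPUs
# (ncpu minus tasks), parsing pestat output on stdin.
# Assumption: ncpu is field 5 and tasks is field 9, as in the
# header "Node state load pmem ncpu mem resi usrs tasks ...".
pestat_free_cpus() {
  awk '
    $1 == "Node" { next }          # skip the header line
    NF >= 9 { print $1, $5 - $9 }  # node name and free CPU count
  '
}

# Usage: pestat | pestat_free_cpus
```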