HOWTO check up on jobs

From HPC

Revision as of 11:18, 30 January 2014

Examining the queue

MAUI is the scheduler that decides which resources your job will run on. You can look at the queue with either the TORQUE qstat command or the MAUI showq command. qstat lists jobs ordered by JobID, whereas showq groups jobs by state (active, idle, or blocked) and then orders them by priority.

[username@launch ~]$ showq
ACTIVE JOBS--------------------
JOBNAME            USERNAME      STATE  PROC   REMAINING            STARTTIME

33                 username    Running     1    22:07:22  Tue Jun 18 07:58:46
34                 username    Running     1    22:07:22  Tue Jun 18 07:58:46
35                 username    Running     1    22:07:22  Tue Jun 18 07:58:46
36                 username    Running     1    22:07:22  Tue Jun 18 07:58:46
37                 username    Running     1    22:07:22  Tue Jun 18 07:58:46
38                 username    Running     1    22:07:22  Tue Jun 18 07:58:46
39                 username    Running     1    22:07:22  Tue Jun 18 07:58:46

     7 Active Jobs       7 of    8 Processors Active (87.50%)
                         2 of    2 Nodes Active      (100.00%)

IDLE JOBS----------------------
JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME


0 Idle Jobs

BLOCKED JOBS----------------
JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME


Total Jobs: 7   Active Jobs: 7   Idle Jobs: 0   Blocked Jobs: 0
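
On a busy cluster the showq listing gets long, so it can help to condense it. The following is a minimal sketch, not part of MAUI itself: the function name running_per_user is hypothetical, and it assumes the column layout shown above (job ID, username, then state).

```shell
# Count Running jobs per user from showq output read on stdin.
# Assumption: column 2 is the username and column 3 is the state,
# as in the sample listing above.
running_per_user() {
  awk '$3 == "Running" { count[$2]++ }
       END { for (u in count) print u, count[u] }' | sort
}

# Typical use on the cluster:
#   showq | running_per_user
```

Header and summary lines are skipped automatically because their third column is never the word "Running".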

Checking a specific job

If you want to see the details of a specific job, run checkjob with its job ID:

[username@launch ~]$ checkjob 40


checking job 40

State: Running
Creds:  user:username  group:users  class:batch  qos:DEFAULT
WallTime: 00:09:18 of 1:00:00:00
SubmitTime: Tue Jun 18 09:54:34
  (Time Queued  Total: 00:00:01  Eligible: 00:00:01)

StartTime: Tue Jun 18 09:54:35
Total Tasks: 1

Req[0]  TaskCount: 1  Partition: DEFAULT
Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
NodeCount: 1
Allocated Nodes:
[test001:1]


IWD: [NONE]  Executable:  [NONE]
Bypass: 0  StartCount: 1
PartitionMask: [ALL]
Flags:       RESTARTABLE

Reservation '40' (-00:09:10 -> 23:50:50  Duration: 1:00:00:00)
PE:  1.00  StartPriority:  1
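
If you only need one value from this report, for example in a script that waits for a job to start, you can filter the output. This is a rough sketch under the assumption that the report uses "Label: value" lines as shown above; checkjob_field is a hypothetical helper name, and it only handles single-word labels such as State or WallTime.

```shell
# Print the value of a single-word "Label: value" line from checkjob
# output read on stdin. Only the first matching line is used.
checkjob_field() {
  awk -v label="$1:" '
    $1 == label {
      sub(/^[^:]*:[ \t]*/, "")   # drop the label and following spaces
      print
      exit
    }'
}

# Typical use on the cluster:
#   checkjob 40 | checkjob_field State
#   checkjob 40 | checkjob_field WallTime
```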

Asking MAUI when your job will probably start and finish

If you want to see an estimate of when your job will start and finish, use the showstart command:

[username@launch ~]$ showstart 41
job 41 requires 1 proc for 1:00:00:00
Earliest start in         22:03:48 on Wed Jun 19 07:58:46
Earliest completion in  1:22:03:48 on Thu Jun 20 07:58:46
Best Partition: DEFAULT
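
The durations in this output appear to use a days:hours:minutes:seconds layout, with the days field omitted when it is zero (the completion estimate above is the start estimate plus the one-day run time). As a small sketch of that arithmetic, the hypothetical helper below converts such a duration to seconds so it can be compared or passed to sleep.

```shell
# Convert a [[D:]HH:]MM:SS duration, as printed by showstart, to seconds.
# Assumption: fields are days, hours, minutes, seconds, counted from the right.
duration_to_seconds() {
  printf '%s\n' "$1" | awk -F: '{
    total = 0
    for (i = 1; i <= NF; i++) {
      pos = NF - i                 # 0 = seconds, 1 = minutes, 2 = hours, 3 = days
      if (pos == 0) w = 1
      else if (pos == 1) w = 60
      else if (pos == 2) w = 3600
      else w = 86400
      total += $i * w
    }
    print total
  }'
}

# Example:
#   duration_to_seconds 22:03:48     # 22*3600 + 3*60 + 48
```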

Overview of cluster usage

pestat prints one line per node, showing its state, load, and memory figures along with the job IDs and users currently running on it.

[username@launch ~]$ pestat
Node                state  load    pmem ncpu   mem   resi usrs tasks NetMbit jobids/users
comp001              excl   3.8    1877   4   5907    398  4/1    4      0!   34 username 36 username 38 username 40 username
comp002              excl   3.8    1877   4   5907    409  4/1    4      0    33 username 35 username 37 username 39 username
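
To find out where your own jobs are running, you can filter this listing by username. The sketch below assumes the exact column layout shown above, where the "jobid user" pairs start at the eleventh whitespace-separated field; the function name jobs_by_node is hypothetical, and other pestat versions may lay the columns out differently.

```shell
# List, per node, the job IDs that belong to one user.
# Reads pestat output on stdin; the username is the first argument.
jobs_by_node() {
  awk -v user="$1" '
    $1 == "Node" { next }            # skip the header row
    {
      ids = ""
      # Fields 11 onward are "jobid user" pairs in the layout above.
      for (i = 11; i < NF; i += 2)
        if ($(i + 1) == user) ids = ids " " $i
      if (ids != "") print $1 ":" ids
    }'
}

# Typical use on the cluster:
#   pestat | jobs_by_node username
```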