HOWTO check up on jobs
From HPC
Revision as of 11:18, 30 January 2014
Examining the queue
MAUI is the scheduler: the software that actually decides which resources your job will run on. You can look at the queue with either the TORQUE qstat command or the MAUI showq command. qstat displays the queue ordered by JobID, whereas showq displays jobs grouped by state (active, idle, or blocked) and then ordered by priority.
[username@launch ~]$ showq
ACTIVE JOBS--------------------
JOBNAME            USERNAME      STATE  PROC   REMAINING            STARTTIME
33                 username    Running     1    22:07:22  Tue Jun 18 07:58:46
34                 username    Running     1    22:07:22  Tue Jun 18 07:58:46
35                 username    Running     1    22:07:22  Tue Jun 18 07:58:46
36                 username    Running     1    22:07:22  Tue Jun 18 07:58:46
37                 username    Running     1    22:07:22  Tue Jun 18 07:58:46
38                 username    Running     1    22:07:22  Tue Jun 18 07:58:46
39                 username    Running     1    22:07:22  Tue Jun 18 07:58:46
     7 Active Jobs       7 of    8 Processors Active (87.50%)
                         2 of    2 Nodes Active      (100.00%)
IDLE JOBS----------------------
JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME
0 Idle Jobs
BLOCKED JOBS----------------
JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME
Total Jobs: 7   Active Jobs: 7   Idle Jobs: 0   Blocked Jobs: 0
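If you want to summarise the queue from a script rather than by eye, the showq output above can be parsed line by line. The following is only a sketch: the names SHOWQ_SAMPLE, JOB_LINE and parse_showq are made up for this example, the sample text is copied from the listing above, and the regular expression assumes job lines always look like the ones shown here (job ID, user, state, processor count, a colon-separated time, and a timestamp).

```python
import re
from collections import Counter

# Sample text copied from the showq listing on this page.
SHOWQ_SAMPLE = """\
ACTIVE JOBS--------------------
JOBNAME            USERNAME      STATE  PROC   REMAINING            STARTTIME
33                 username    Running     1    22:07:22  Tue Jun 18 07:58:46
34                 username    Running     1    22:07:22  Tue Jun 18 07:58:46
35                 username    Running     1    22:07:22  Tue Jun 18 07:58:46
36                 username    Running     1    22:07:22  Tue Jun 18 07:58:46
37                 username    Running     1    22:07:22  Tue Jun 18 07:58:46
38                 username    Running     1    22:07:22  Tue Jun 18 07:58:46
39                 username    Running     1    22:07:22  Tue Jun 18 07:58:46
     7 Active Jobs       7 of    8 Processors Active (87.50%)
                         2 of    2 Nodes Active      (100.00%)
IDLE JOBS----------------------
JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME
0 Idle Jobs
BLOCKED JOBS----------------
JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME
Total Jobs: 7   Active Jobs: 7   Idle Jobs: 0   Blocked Jobs: 0
"""

# Assumed shape of a job line; headers and summary lines do not match it.
JOB_LINE = re.compile(
    r"^\s*(?P<job>\S+)\s+(?P<user>\S+)\s+(?P<state>\S+)\s+(?P<procs>\d+)\s+"
    r"(?P<time>\d+(?::\d+)+)\s+"
    r"(?P<stamp>\w{3} \w{3}\s+\d+ \d{2}:\d{2}:\d{2})\s*$"
)


def parse_showq(text):
    """Return one dict per job line found in showq-style text."""
    jobs = []
    for line in text.splitlines():
        match = JOB_LINE.match(line)
        if match:
            job = match.groupdict()
            job["procs"] = int(job["procs"])
            jobs.append(job)
    return jobs


jobs = parse_showq(SHOWQ_SAMPLE)
state_counts = Counter(job["state"] for job in jobs)
procs_in_use = sum(job["procs"] for job in jobs if job["state"] == "Running")
print(state_counts, procs_in_use)
```

On the cluster you would feed it live output instead of the sample, for example with subprocess.run(["showq"], capture_output=True, text=True).stdout.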
Checking a specific job
To see the details of a specific job, run checkjob with its job ID:
[username@launch ~]$ checkjob 40
checking job 40
State: Running
Creds:  user:username  group:users  class:batch  qos:DEFAULT
WallTime: 00:09:18 of 1:00:00:00
SubmitTime: Tue Jun 18 09:54:34
  (Time Queued  Total: 00:00:01  Eligible: 00:00:01)
StartTime: Tue Jun 18 09:54:35
Total Tasks: 1
Req[0]  TaskCount: 1  Partition: DEFAULT
Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
NodeCount: 1
Allocated Nodes:
[test001:1]
IWD: [NONE]  Executable:  [NONE]
Bypass: 0  StartCount: 1
PartitionMask: [ALL]
Flags:       RESTARTABLE
Reservation '40' (-00:09:10 -> 23:50:50  Duration: 1:00:00:00)
PE:  1.00  StartPriority:  1
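Most of the checkjob report is rarely needed; usually you care about the state, how much walltime has been used, and which nodes the job landed on. A minimal sketch of pulling out just those fields is given below. The names CHECKJOB_SAMPLE and parse_checkjob are hypothetical, the sample is an abbreviated copy of the report above, and the patterns assume the "State:", "WallTime:" and "Allocated Nodes:" lines appear as they do in that report.

```python
import re

# Abbreviated copy of the checkjob report shown on this page.
CHECKJOB_SAMPLE = """\
checking job 40
State: Running
Creds:  user:username  group:users  class:batch  qos:DEFAULT
WallTime: 00:09:18 of 1:00:00:00
NodeCount: 1
Allocated Nodes:
[test001:1]
IWD: [NONE]  Executable:  [NONE]
"""


def parse_checkjob(text):
    """Extract state, walltime and allocated nodes from checkjob-style text."""
    info = {"nodes": []}

    state = re.search(r"^State:\s+(\S+)", text, re.MULTILINE)
    if state:
        info["state"] = state.group(1)

    wall = re.search(r"^WallTime:\s+(\S+)\s+of\s+(\S+)", text, re.MULTILINE)
    if wall:
        info["walltime_used"], info["walltime_limit"] = wall.groups()

    # Collect the run of "[node:tasks]" entries that follows the label.
    alloc = re.search(r"Allocated Nodes:\s*((?:\[[^\]]+\]\s*)+)", text)
    if alloc:
        info["nodes"] = [
            (node, int(tasks))
            for node, tasks in re.findall(r"\[([^:\]]+):(\d+)\]", alloc.group(1))
        ]
    return info


info = parse_checkjob(CHECKJOB_SAMPLE)
print(info)
```

Fields that are missing from the text are simply left out of the result, so a caller should use info.get("state") rather than assume every key is present.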
Asking MAUI when your job will probably start and finish
If you want an estimate of when your job will start and finish, use the showstart command:
[username@launch ~]$ showstart 41
job 41 requires 1 proc for 1:00:00:00
Earliest start in 22:03:48 on Wed Jun 19 07:58:46
Earliest completion in 1:22:03:48 on Thu Jun 20 07:58:46
Best Partition: DEFAULT
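The times in this output are written as [[days:]hours:]minutes:seconds, so 1:22:03:48 means 1 day, 22 hours, 3 minutes and 48 seconds. The sketch below converts them to seconds, which makes it easy to check that the estimated completion is simply the estimated start plus the requested walltime. The helper names duration_to_seconds and parse_showstart are invented for this example, and the code assumes non-negative offsets in the format shown above.

```python
import re

# Sample text copied from the showstart output on this page.
SHOWSTART_SAMPLE = """\
job 41 requires 1 proc for 1:00:00:00
Earliest start in 22:03:48 on Wed Jun 19 07:58:46
Earliest completion in 1:22:03:48 on Thu Jun 20 07:58:46
Best Partition: DEFAULT
"""


def duration_to_seconds(text):
    """Convert '[[DD:]HH:]MM:SS' into a number of seconds."""
    units = (1, 60, 3600, 86400)  # seconds, minutes, hours, days
    parts = [int(part) for part in text.split(":")]
    if len(parts) > len(units):
        raise ValueError("unexpected duration format: " + text)
    # Pair the rightmost field with seconds, the next with minutes, and so on.
    return sum(value * unit for value, unit in zip(reversed(parts), units))


def parse_showstart(text):
    """Return requested walltime and start/completion offsets in seconds."""
    result = {}
    requested = re.search(r"requires \d+ procs? for (\S+)", text)
    if requested:
        result["requested"] = duration_to_seconds(requested.group(1))
    for key in ("start", "completion"):
        match = re.search(rf"Earliest {key} in\s+(\S+) on", text)
        if match:
            result[key] = duration_to_seconds(match.group(1))
    return result


estimate = parse_showstart(SHOWSTART_SAMPLE)
print(estimate)
# {'requested': 86400, 'start': 79428, 'completion': 165828}
```

For job 41 the difference between completion and start is 86400 seconds, which is exactly the one day of walltime the job requested.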
Overview of cluster usage
pestat gives a nice overview of which nodes are busy with which jobs for which users.
[username@launch ~]$ pestat
Node     state  load  pmem  ncpu   mem  resi  usrs  tasks  NetMbit  jobids/users
comp001  excl    3.8  1877     4  5907   398   4/1      4       0!  34 username 36 username 38 username 40 username
comp002  excl    3.8  1877     4  5907   409   4/1      4       0   33 username 35 username 37 username 39 username
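The last column of the pestat output is a run of "jobid user" pairs, which is handy if you want to find out which node a particular job is on. As a rough sketch, the function below turns the table into a dictionary keyed by node name. The names PESTAT_SAMPLE and parse_pestat are made up here, the sample is copied from the output above, and the code assumes every column before jobids/users is filled in on every node line, as it is in that output.

```python
# Sample text copied from the pestat output on this page.
PESTAT_SAMPLE = """\
Node     state  load  pmem  ncpu   mem  resi  usrs  tasks  NetMbit  jobids/users
comp001  excl    3.8  1877     4  5907   398   4/1      4       0!  34 username 36 username 38 username 40 username
comp002  excl    3.8  1877     4  5907   409   4/1      4       0   33 username 35 username 37 username 39 username
"""


def parse_pestat(text):
    """Map each node name to its state and its list of (jobid, user) pairs."""
    lines = [line for line in text.splitlines() if line.strip()]
    # Locate the header row; anything before it (such as a prompt) is skipped.
    header_index = next(
        i for i, line in enumerate(lines) if line.split()[0] == "Node"
    )
    # Every header column except the last one holds a single value per node.
    n_fixed = len(lines[header_index].split()) - 1

    nodes = {}
    for line in lines[header_index + 1:]:
        tokens = line.split()
        trailing = tokens[n_fixed:]
        nodes[tokens[0]] = {
            "state": tokens[1],
            "jobs": list(zip(trailing[0::2], trailing[1::2])),
        }
    return nodes


nodes = parse_pestat(PESTAT_SAMPLE)
for name, details in nodes.items():
    print(name, details["state"], [job for job, _ in details["jobs"]])
```

Counting the fixed columns from the header row, rather than hard-coding ten, is meant to keep the sketch working if your pestat build lacks the NetMbit column.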