Difference between revisions of "HOWTO check up on jobs"

From HPC

Revision as of 14:09, 12 January 2016

Examining the queue

To look at the queue, use the qstat command, which lists jobs ordered by JobID. In the S (state) column, R means the job is running.

[username@launch ~]$ qstat
Job id            Name             User              Time Use S Queue
----------------  ---------------- ----------------  -------- - -----
32.pbsserver      JobName          username          351:04:3 R long
33.pbsserver      JobName          username          351:06:1 R day
34.pbsserver      JobName          username          390:30:2 R week
40.pbsserver      JobName          username          496:38:2 R month
46.pbsserver      JobName          username          506:13:5 R long
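
On a busy cluster the full listing can get long, so it may help to narrow it down. The following is a minimal sketch, not a site-provided tool: jobs_in_state is a hypothetical helper name, and it assumes the default six-column qstat layout shown above with no spaces inside job names.

```shell
#!/usr/bin/env bash
# Sketch: filter the default qstat listing (read from stdin) by user and state.
# jobs_in_state is a hypothetical helper name, not a TORQUE command.
# Assumed columns: Job id, Name, User, Time Use, S, Queue (as in the listing above).
jobs_in_state() {
    local user="$1" state="$2"
    # Job lines start with a numeric job ID; header and separator lines do not.
    awk -v u="$user" -v s="$state" \
        '$1 ~ /^[0-9]/ && $3 == u && $5 == s { print $1 }'
}

# Usage on the cluster (prints the IDs of your running jobs):
#   qstat | jobs_in_state "$USER" R
```

Because the helper only reads standard input, it can also be tried on a saved copy of the listing before pointing it at the live queue.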

Checking a specific job

If you want to see the details of a specific job, use qstat -f <JobID> on it:

[username@launch ~]$ qstat -f 40
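
The full listing is long, so a script usually wants just one attribute. The sketch below is an illustration under assumptions: job_field is a hypothetical helper name, and it assumes that qstat -f prints each attribute on its own line in the form "name = value" (for example job_state). Values that wrap onto continuation lines would be cut off at the first line.

```shell
#!/usr/bin/env bash
# Sketch: pull one attribute out of `qstat -f` output read from stdin.
# job_field is a hypothetical helper name, not a TORQUE command.
# Assumption: attributes appear one per line as "    name = value".
job_field() {
    local key="$1"
    awk -v key="$key" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)            # drop leading indentation
            pos = index(line, " = ")            # first " = " separates name and value
            if (pos > 0 && substr(line, 1, pos - 1) == key) {
                print substr(line, pos + 3)     # text after " = "
                exit                            # only the first match is reported
            }
        }'
}

# Usage on the cluster:
#   qstat -f 40 | job_field job_state
```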

If you want to look at the output of your job while it's still running, use the qpeek command.

[username@launch ~]$ qpeek 40

Deleting a job you no longer want

If you want to delete a job (whether it's already running or not), use the qdel command:

[username@launch ~]$ qdel 41

There's no output on a successful job deletion. Keep in mind that when a running job is killed, files in scratch space will not sync back to your home directory, and scratch space will not be cleaned. If you delete running jobs that use scratch space, please let the administrator know to check for dirty scratch spaces.
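
Since qdel itself prints nothing, it is easy to forget the scratch-space caveat. One possible approach, sketched here under assumptions, is a small wrapper that checks the job's state first and prints a reminder when the deleted job was running. careful_qdel is a hypothetical name, and the wrapper assumes that qstat with a job ID prints the same column layout as the queue listing.

```shell
#!/usr/bin/env bash
# Sketch: delete a job, but flag running jobs so scratch space can be checked.
# careful_qdel is a hypothetical wrapper name; qstat and qdel are the
# commands described in this HOWTO.
careful_qdel() {
    local id="$1" state
    # Assumption: column 5 of the qstat listing is the state (S) column.
    state=$(qstat "$id" 2>/dev/null | awk '$1 ~ /^[0-9]/ { print $5; exit }')
    if [ -z "$state" ]; then
        echo "job $id not found in the queue" >&2
        return 1
    fi
    qdel "$id" || return 1
    if [ "$state" = "R" ]; then
        # Running jobs are killed without syncing or cleaning scratch space.
        echo "job $id was running: ask the administrator to check scratch space" >&2
    fi
}

# Usage on the cluster:
#   careful_qdel 41
```

The wrapper does not clean anything itself; it only makes the reminder visible at the moment the job is deleted.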