Difference between revisions of "HOWTO check up on jobs"

Latest revision as of 10:39, 7 March 2017

Examining the queue

You can look at the queue by using the qstat command. qstat will display the queue ordered by JobID.

[username@hpc1 ~]$ qstat
Job id            Name             User              Time Use S Queue
----------------  ---------------- ----------------  -------- - -----
32.hpc1           JobName          username          351:04:3 R long
33.hpc1           JobName          username          351:06:1 R day
34.hpc1           JobName          username          390:30:2 R week
40.hpc1           JobName          username          496:38:2 R month
46.hpc1           JobName          username          506:13:5 R long
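
On a busy cluster the full listing gets long, so a per-queue count can be easier to read. The following is a minimal sketch, assuming the default qstat layout shown above (two header lines, queue name in the last column). The helper name summarize_queue is hypothetical and is not part of the scheduler.

```shell
# Hypothetical helper: count jobs per queue from default `qstat` output on stdin.
summarize_queue() {
    # Skip the two header lines; the queue name is the last field of each job row.
    awk 'NR > 2 && NF >= 6 { count[$NF]++ }
         END { for (q in count) printf "%s %d\n", q, count[q] }' | sort
}
```

It would be used as `qstat | summarize_queue`. The column positions are an assumption, so check them against the output on your own system first.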

Checking a specific job

If you want to see the details of a specific job, use qstat -fx <JobID> on it:

[username@hpc1 ~]$ qstat -fx 40
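
The full listing is verbose, and often only one attribute is of interest. The following is a minimal sketch, assuming the output uses the usual "name = value" layout of qstat -f, one attribute per line. The helper name job_field is hypothetical, and values that wrap onto continuation lines are not rejoined.

```shell
# Hypothetical helper: print the value of one attribute from
# `qstat -f` style output read on stdin.
job_field() {
    # $1 is the attribute name, for example job_state.
    awk -v key="$1" '
        # Strip leading indentation before matching the attribute name.
        { line = $0; sub(/^[ \t]+/, "", line) }
        # Print everything after "key = " for the first matching line only.
        index(line, key " = ") == 1 { print substr(line, length(key) + 4); exit }
    '
}
```

It would be used as `qstat -fx 40 | job_field job_state`. If the attribute is missing, or the output is not in this layout on your system, nothing is printed.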

If you want to look at the output of your job while it's still running, use the qpeek command.

[username@hpc1 ~]$ qpeek 40

Deleting a job you no longer want

If you want to delete a job (whether it's already running or not), use the qdel command:

[username@hpc1 ~]$ qdel 41

There's no output on a successful job deletion. Keep in mind that when running jobs are killed, files in scratch space will not sync back to your home directory. Orphaned scratch space will be moved to /orphans.

Overview of cluster usage

pestat gives a nice overview of which nodes are busy with which jobs for which users.

[username@hpc1 ~]$ pestat
Queues:  short day week month long
Node            state    cpu        memory   jobids/users
----                   tot used    tot used
comp001.hpc     busy     8    8    15G  51%  34
comp002.hpc     free    64   60   126G  12%  35 36 37 38

[username@hpc1 ~]$ pestat -u username
Queues:  short day week month long
Node            state    cpu        memory   jobids/users
----                   tot used    tot used
comp001.hpc     busy     8    8    15G  51%  34
comp002.hpc     free    64   60   126G  12%  38

[username@hpc1 ~]$ pestat -a -u username
Queues:  short day week month long
Node            state    cpu        memory   jobids/users
----                   tot used    tot used
comp001.hpc     busy     8    8    15G  51%  34 username
comp002.hpc     free    64   60   126G  12%  38 username
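
To spot nodes that still have idle cores, the cpu columns can be subtracted. The following is a minimal sketch, assuming the pestat layout shown above (third column total cores, fourth column used cores). The helper name free_cores is hypothetical.

```shell
# Hypothetical helper: list nodes with unused cores from `pestat` output on stdin.
free_cores() {
    # Header lines are skipped because their 3rd and 4th fields are not integers.
    awk '$3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ && ($3 + 0) > ($4 + 0) {
             print $1, $3 - $4
         }'
}
```

It would be used as `pestat | free_cores`, which prints each node name followed by its number of unused cores.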

@@ Line 1: / Line 1: @@
 == Examining the queue ==
-MAUI is the software application that actually decides what resources your job will run on. You can look at the queue by either using the TORQUE <code>qstat</code> command, or by using the MAUI <code>showq</code> command. <code>qstat</code> will display the queue ordered by JobID, whereas <code>showq</code> will display jobs grouped by their state ("running," "idle," or "hold") then ordered by priority.
+You can look at the queue by using the <code>qstat</code> command. <code>qstat</code> will display the queue ordered by JobID.
 <pre>
-[username@launch ~]$ showq
+[username@hpc1 ~]$ qstat
-ACTIVE JOBS--------------------
+Job id            Name             User              Time Use S Queue
-JOBNAME            USERNAME      STATE  PROC   REMAINING            STARTTIME
+----------------  ---------------- ----------------  -------- - -----
+32.hpc1           JobName          username          351:04:3 R long
-                 username    Running     1    22:07:22  Tue Jun 18 07:58:46
+33.hpc1           JobName          username          351:06:1 R day
-                 username    Running     1    22:07:22  Tue Jun 18 07:58:46
+34.hpc1           JobName          username          390:30:2 R week
-                 username    Running     1    22:07:22  Tue Jun 18 07:58:46
+40.hpc1           JobName          username          496:38:2 R month
-                 username    Running     1    22:07:22  Tue Jun 18 07:58:46
+46.hpc1           JobName          username          506:13:5 R long
-                 username    Running     1    22:07:22  Tue Jun 18 07:58:46
-                 username    Running     1    22:07:22  Tue Jun 18 07:58:46
-                 username    Running     1    22:07:22  Tue Jun 18 07:58:46
-Active Jobs       7 of    8 Processors Active (87.50%)
-of    2 Nodes Active      (100.00%)
-IDLE JOBS----------------------
-JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME
-Idle Jobs
-BLOCKED JOBS----------------
-JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME
-Total Jobs: 7   Active Jobs: 7   Idle Jobs: 0   Blocked Jobs: 0
 </pre>
 == Checking a specific job ==
-If you want to see the details of a specific job, use <code>checkjob</code> on it:
+If you want to see the details of a specific job, use <code>qstat -fx <JobID></code> on it:
 <pre>
-[username@launch ~]$ checkjob 40
+[username@hpc1 ~]$ qstat -fx 40
-checking job 40
-State: Running
-Creds:  user:username  group:users  class:batch  qos:DEFAULT
-WallTime: 00:09:18 of 1:00:00:00
-SubmitTime: Tue Jun 18 09:54:34
-  (Time Queued  Total: 00:00:01  Eligible: 00:00:01)
-StartTime: Tue Jun 18 09:54:35
-Total Tasks: 1
-Req[0]  TaskCount: 1  Partition: DEFAULT
-Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
-Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
-NodeCount: 1
-Allocated Nodes:
-[test001:1]
-IWD: [NONE]  Executable:  [NONE]
-Bypass: 0  StartCount: 1
-PartitionMask: [ALL]
-Flags:       RESTARTABLE
-Reservation '40' (-00:09:10 -> 23:50:50  Duration: 1:00:00:00)
-PE:  1.00  StartPriority:  1
 </pre>
@@ Line 71: / Line 25: @@
 <pre>
-[username@launch ~]$ qpeek 40
+[username@hpc1 ~]$ qpeek 40
 </pre>
-== Asking MAUI when your job will probably start and finish ==
+== Deleting a job you no longer want ==
-If you want to see a time estimate for when you job will start, use the <code>showstart</code> command:
+If you want to delete a job (whether it's already running or not), use the <code>qdel</code> command:
 <pre>
-[username@launch ~]$ showstart 41
+[username@hpc1 ~]$ qdel 41
-job 41 requires 1 proc for 1:00:00:00
-Earliest start in         22:03:48 on Wed Jun 19 07:58:46
-Earliest completion in  1:22:03:48 on Thu Jun 20 07:58:46
-Best Partition: DEFAULT
 </pre>
+There's no output on a successful job deletion. Keep in mind that when running jobs are killed, '''files in scratch space will not sync back to your home directory'''. Orphaned scratch space will be moved to /orphans.
 == Overview of cluster usage ==
@@ Line 90: / Line 42: @@
 <pre>
-[username@launch ~]$ pestat
+[username@hpc1 ~]$ pestat
-Node                state  load    pmem ncpu   mem   resi usrs tasks NetMbit jobids/users
+Queues:  short day week month long
-comp001              excl   3.8    1877   4   5907    398  4/1    4      0!   34 username 36 username 38 username 40 username
+Node            state    cpu        memory   jobids/users
-comp002              excl   3.8    1877   4   5907    409  4/1    4      0    33 username 35 username 37 username 39 username
+----                   tot used    tot used
+comp001.hpc     busy     8    8    15G  51%  34
+comp002.hpc     free    64   60   126G  12%  35 36 37 38
+</pre>
+<pre>
+[username@hpc1 ~]$ pestat -u username
+Queues:  short day week month long
+Node            state    cpu        memory   jobids/users
+----                   tot used    tot used
+comp001.hpc     busy     8    8    15G  51%  34
+comp002.hpc     free    64   60   126G  12%  38
+</pre>
+<pre>
+[username@hpc1 ~]$ pestat -a -u username
+Queues:  short day week month long
+Node            state    cpu        memory   jobids/users
+----                   tot used    tot used
+comp001.hpc     busy     8    8    15G  51%  34 username
+comp002.hpc     free    64   60   126G  12%  38 username
 </pre>
