HOWTO check up on jobs

From HPC

Revision as of 14:53, 19 January 2017

Examining the queue

Use the qstat command to look at the queue. It lists jobs ordered by job ID.

[username@hpc1 ~]$ qstat
Job id            Name             User              Time Use S Queue
----------------  ---------------- ----------------  -------- - -----
32.hpc1           JobName          username          351:04:3 R long
33.hpc1           JobName          username          351:06:1 R day
34.hpc1           JobName          username          390:30:2 R week
40.hpc1           JobName          username          496:38:2 R month
46.hpc1           JobName          username          506:13:5 R long
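
When the queue is long, it can help to filter the listing. The following is a minimal sketch, assuming qstat prints the six-column layout shown above with the queue name in the last column. The function name jobs_in_queue is hypothetical and not part of PBS.

```shell
# Print the job IDs from qstat-style output (read on stdin) whose queue
# matches the first argument.
# Assumption: job lines start with a numeric job ID and end with the queue name.
jobs_in_queue() {
    awk -v queue="$1" '
        # Skip the header and separator lines; keep lines for the wanted queue.
        $1 ~ /^[0-9]/ && $NF == queue { print $1 }
    '
}
```

For example, qstat | jobs_in_queue long would print 32.hpc1 and 46.hpc1 for the listing above.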

Checking a specific job

To see the details of a specific job, run qstat -fx <JobID>:

[username@hpc1 ~]$ qstat -fx 40

To look at the output of a job while it is still running, use the qpeek command:

[username@hpc1 ~]$ qpeek 40
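
The full job listing is long, so pulling out a single attribute can be handy. The sketch below assumes the listing consists of indented "key = value" lines (for example "job_state = R"). If your qstat prints a different format, such as XML, it does not apply. The function name job_field is hypothetical.

```shell
# Print the value of one attribute from a full job listing read on stdin.
# Assumption: attributes appear as lines of the form "    key = value".
job_field() {
    awk -v key="$1" '
        $1 == key && $2 == "=" {
            sub(/^[^=]*= /, "")   # drop everything up to and including "= "
            print
            exit                  # report only the first match
        }
    '
}
```

A possible use would be qstat -f 40 | job_field job_state.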

Deleting a job you no longer want

If you want to delete a job (whether it's already running or not), use the qdel command:

[username@hpc1 ~]$ qdel 41

A successful deletion prints no output. Keep in mind that when a running job is killed, its files in scratch space are not synced back to your home directory. Orphaned scratch space is moved to /scratch2.
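
To delete several jobs at once, one cautious approach is to generate the qdel commands first and review them before running anything. This is a minimal sketch, assuming the qstat layout shown earlier with the owner in the third column. The function name qdel_commands_for_user is hypothetical, and it only prints commands rather than deleting jobs.

```shell
# Print one "qdel <id>" line for each job in qstat-style output (stdin)
# that belongs to the user given as the first argument.
# Nothing is deleted here: the commands are only printed for review.
qdel_commands_for_user() {
    awk -v user="$1" '
        $1 ~ /^[0-9]/ && $3 == user {
            split($1, id, ".")    # "41.hpc1" -> "41"
            print "qdel " id[1]
        }
    '
}
```

For example, qstat | qdel_commands_for_user username lists the commands. Once they look right, the same pipeline can be piped into sh to run them.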

Overview of cluster usage

pestat gives an overview of which nodes are busy, which jobs they are running, and for which users.

[username@hpc1 ~]$ pestat
Queues:  short day week month long
Node            state    cpu        memory   jobids/users
----                   tot used    tot used
comp001.hpc     busy     8    8    15G  51%  34
comp002.hpc     free    64   60   126G  12%  35 36 37 38

With -u username, the job list is limited to that user's jobs, as in this sample:

[username@hpc1 ~]$ pestat -u username
Queues:  short day week month long
Node            state    cpu        memory   jobids/users
----                   tot used    tot used
comp001.hpc     busy     8    8    15G  51%  34
comp002.hpc     free    64   60   126G  12%  38

Adding -a also prints the username next to each job ID:

[username@hpc1 ~]$ pestat -a -u username
Queues:  short day week month long
Node            state    cpu        memory   jobids/users
----                   tot used    tot used
comp001.hpc     busy     8    8    15G  51%  34 username
comp002.hpc     free    64   60   126G  12%  38 username
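
For a one-line summary of how full the cluster is, the per-node CPU columns can be added up. The sketch below assumes the pestat layout shown above, with total CPUs in the third column and used CPUs in the fourth. The function name cpu_summary is hypothetical.

```shell
# Sum the "cpu tot" and "cpu used" columns of pestat-style output (stdin).
# Assumption: node lines have numeric values in columns 3 and 4, and the
# header lines do not.
cpu_summary() {
    awk '
        $3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ {
            total += $3
            used  += $4
        }
        END { printf "%d of %d CPUs in use\n", used, total }
    '
}
```

Running pestat | cpu_summary on the sample output above would report 68 of 72 CPUs in use.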