HOWTO check up on jobs
Latest revision as of 10:39, 7 March 2017
Examining the queue
You can look at the queue with the <code>qstat</code> command, which displays the queue ordered by JobID.
<pre>
[username@hpc1 ~]$ qstat
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
32.hpc1          JobName          username         351:04:3 R long
33.hpc1          JobName          username         351:06:1 R day
34.hpc1          JobName          username         390:30:2 R week
40.hpc1          JobName          username         496:38:2 R month
46.hpc1          JobName          username         506:13:5 R long
</pre>
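When the queue is long, it can help to pull out only the jobs in one state. The following is a minimal sketch, not a site-provided tool: the function name <code>jobs_in_state</code> is hypothetical, and it assumes the six-column layout shown above (state in the fifth field, two header lines). The queued job 47 in the sample is made up for illustration.

```shell
# Sketch: print the IDs of jobs in a given state from default `qstat` output
# read on stdin. Assumes the state is field 5 of each job row and that the
# first two lines are the column header and its rule.
jobs_in_state() {
    awk -v state="$1" 'NR > 2 && $5 == state { print $1 }'
}

# A captured sample stands in for live output here; job 47 is hypothetical.
sample='Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
32.hpc1          JobName          username         351:04:3 R long
47.hpc1          JobName          username         0        Q day'

printf '%s\n' "$sample" | jobs_in_state R
```

On the cluster the same function would be fed live output, for example <code>qstat | jobs_in_state R</code>.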
Checking a specific job
If you want to see the details of a specific job, use <code>qstat -fx <JobID></code> on it:
<pre>
[username@hpc1 ~]$ qstat -fx 40
</pre>
If you want to look at the output of your job while it's still running, use the <code>qpeek</code> command:
<pre>
[username@hpc1 ~]$ qpeek 40
</pre>
Deleting a job you no longer want
If you want to delete a job (whether it's already running or not), use the <code>qdel</code> command:
<pre>
[username@hpc1 ~]$ qdel 41
</pre>
A successful job deletion produces no output. Keep in mind that when running jobs are killed, '''files in scratch space will not sync back to your home directory'''. Orphaned scratch space will be moved to /orphans.
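Because a killed job's scratch files are not synced back, it may be worth previewing a deletion before running it. The sketch below is one possible approach, not part of the cluster's tooling: the function name <code>qdel_dry_run</code> and the <code>QDEL_RUN</code> variable are hypothetical. It relies only on <code>qdel</code> accepting several job IDs in one call; job 42 in the sample is made up.

```shell
# Sketch: read job IDs from stdin and print the qdel command that would be
# run. Only when QDEL_RUN=1 is set is qdel actually invoked.
qdel_dry_run() {
    # Word splitting on $(cat) is intentional: one job ID per word.
    set -- $(cat)
    if [ "$#" -eq 0 ]; then
        echo "no job IDs given" >&2
        return 1
    fi
    if [ "${QDEL_RUN:-0}" = 1 ]; then
        qdel "$@"
    else
        echo "qdel $*"
    fi
}

# Preview only: prints the command, deletes nothing.
printf '%s\n' 41 42 | qdel_dry_run
```

Once the printed command looks right, the same pipeline can be repeated with <code>QDEL_RUN=1</code> set in the environment.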
Overview of cluster usage
<code>pestat</code> gives a nice overview of which nodes are busy with which jobs for which users.
<pre>
[username@hpc1 ~]$ pestat
Queues: short day week month long
Node         state   cpu        memory      jobids/users
----                 tot used   tot   used
comp001.hpc  busy      8    8    15G   51%  34
comp002.hpc  free     64   60   126G   12%  35 36 37 38
</pre>
<pre>
[username@hpc1 ~]$ pestat -u username
Queues: short day week month long
Node         state   cpu        memory      jobids/users
----                 tot used   tot   used
comp001.hpc  busy      8    8    15G   51%  34
comp002.hpc  free     64   60   126G   12%  38
</pre>
<pre>
[username@hpc1 ~]$ pestat -a -u username
Queues: short day week month long
Node         state   cpu        memory      jobids/users
----                 tot used   tot   used
comp001.hpc  busy      8    8    15G   51%  34 username
comp002.hpc  free     64   60   126G   12%  38 username
</pre>
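The <code>pestat</code> listing can also be reduced to a quick count of unused cores per node. This is a minimal sketch under the assumption that the output keeps the column order shown above (node, state, total CPUs, used CPUs, then memory); the function name <code>free_cores</code> is hypothetical.

```shell
# Sketch: for every node whose state is "free", print the node name and the
# number of idle cores (total CPUs minus used CPUs) from `pestat` output on
# stdin. Header lines are skipped because their second field is never "free".
free_cores() {
    awk '$2 == "free" { print $1, $3 - $4 }'
}

# A captured sample stands in for live output here.
sample='Queues: short day week month long
Node state cpu memory jobids/users
---- tot used tot used
comp001.hpc busy 8 8 15G 51% 34
comp002.hpc free 64 60 126G 12% 35 36 37 38'

printf '%s\n' "$sample" | free_cores   # prints: comp002.hpc 4
```

On the cluster this would be run as <code>pestat | free_cores</code>.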