HOWTO submit jobs
Submitting jobs
TORQUE comes with very complete man pages. Therefore, for complete documentation of TORQUE commands you are encouraged to type man pbs and go from there. Jobs are submitted using the qsub command. Type man qsub for information on the plethora of options that it offers.
Let's say I have an executable called "myprog". Let me try and submit it to TORQUE:
[username@launch ~]$ qsub myprog
qsub: file must be an ascii script
Oops... That didn't work because qsub expects a shell script. Any shell should work, so use your favorite one. So I write a simple script called "myscript.sh"
#!/bin/bash
cd $PBS_O_WORKDIR
./myprog argument1 argument2
and then I submit it:
[username@launch ~]$ qsub myscript.sh
16.head.hpc
That worked! Note the use of the $PBS_O_WORKDIR environment variable. This is important, since by default TORQUE on our cluster will start executing the commands in your shell script from your home directory. To go to the directory in which you executed qsub, cd to $PBS_O_WORKDIR. There are several other useful TORQUE environment variables that we will encounter later.
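A script that relies on $PBS_O_WORKDIR behaves differently when it is run by hand outside TORQUE, because the variable is then unset. The sketch below shows one possible guard. The helper name job_workdir is hypothetical, and falling back to the current directory is an assumption about what is wanted when testing a script interactively.

```shell
#!/bin/bash
# Hypothetical helper: print the directory the job should run in.
# Under TORQUE this is $PBS_O_WORKDIR; when the variable is unset or
# empty (e.g. outside a job) it falls back to the current directory.
# The fallback is an assumption, not cluster policy.
job_workdir() {
    echo "${PBS_O_WORKDIR:-$PWD}"
}

# Stop rather than run the program from the wrong directory.
cd "$(job_workdir)" || exit 1
echo "Running in $(pwd)"
```

Checking the result of cd means the job stops early if the submission directory is not reachable from the compute node.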
Editing files
Editing files on the cluster can be done through a couple of different methods...
Native Editors
- vim: The visual editor (vi) is the traditional Unix editor. However, it is not necessarily the most intuitive editor. That being the case, if you are unfamiliar with it, there is a vi tutorial, vimtutor.
- pico: While pico is not installed on the system, nano is installed, and is a pico work-alike.
- nano: Nano has a good bit of on-screen help to make it easier to use.
External Editors
You can also use your favourite editor on your local machine and then transfer the files over to the HPC afterwards. One caveat to this is that files created on Windows machines usually contain unprintable characters (carriage returns at the end of each line) which may be misinterpreted by Linux command interpreters (shells). If this happens, there is a utility called dos2unix that you can use to convert the text file from DOS/Windows formatting to Linux formatting.
$ dos2unix script.sub
dos2unix: converting file script.sub to UNIX format ...
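It can be useful to check whether a file actually has Windows line endings before converting or submitting it. The following is a minimal sketch, assuming grep and mktemp are available. The helper name has_crlf and the throwaway demo file are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the file contains carriage returns,
# i.e. it probably still has DOS/Windows line endings.
has_crlf() {
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}

# Demonstration on a throwaway file written with Windows line endings.
demo=$(mktemp)
printf '#!/bin/bash\r\necho hello\r\n' > "$demo"
if has_crlf "$demo"; then
    echo "$demo has DOS line endings; run dos2unix on it before qsub"
fi
rm -f "$demo"
```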
Specifying job parameters
By default, any script you submit will run on a single processor for a maximum of 24 hours. The name of the job will be the name of the script, and it will not email you when it starts, finishes, or is interrupted. stdout and stderr are collected into separate files named after the job number. You can affect the default behavior of TORQUE by passing it parameters. These parameters can be specified on the command line or inside the shell script itself. For example, let's say I want to send stdout and stderr to a file that is different from the default:
[username@launch ~]$ qsub -e myprog.err -o myprog.out myscript.sh
Alternatively, I can actually edit myscript.sh to include these parameters. I can specify any TORQUE command line parameter I want in a line that begins with "#PBS":
#!/bin/bash
#PBS -e myprog.err
#PBS -o myprog.out
cd $PBS_O_WORKDIR
./myprog argument1 argument2
Now I just submit my modified script with no command-line arguments:
[username@launch ~]$ qsub myscript.sh
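Once parameters live inside the script, it is easy to lose track of what a given script asks for. As a rough sketch, the options embedded in a script can be listed before submitting it. The helper name list_pbs_directives is hypothetical, and the demo file is a throwaway copy of the script shown above.

```shell
#!/bin/bash
# Hypothetical helper: print the options embedded in a job script,
# one per line, with the leading "#PBS" removed.
list_pbs_directives() {
    sed -n 's/^#PBS[[:space:]]*//p' "$1"
}

# Demonstration on a throwaway copy of the script from the text.
demo=$(mktemp)
cat > "$demo" <<'EOF'
#!/bin/bash
#PBS -e myprog.err
#PBS -o myprog.out
cd $PBS_O_WORKDIR
./myprog argument1 argument2
EOF
list_pbs_directives "$demo"
rm -f "$demo"
```

For the demo file this prints the two options, -e myprog.err and -o myprog.out, one per line.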
Useful PBS parameters
Here is an example of a more involved script that requests only 1 hour of execution time, renames the job, and sends email when the job begins, ends, or aborts:
#!/bin/bash
# Name of my job:
#PBS -N My-Program
# Run for 1 hour:
#PBS -l walltime=1:00:00
# Where to write stderr:
#PBS -e myprog.err
# Where to write stdout:
#PBS -o myprog.out
# Send me email when my job aborts, begins, or ends
#PBS -m abe
# This command switches to the directory from which the "qsub" command was run:
cd $PBS_O_WORKDIR
# Now run my program
./myprog argument1 argument2
echo Done!
Some more useful PBS parameters:
- -M: Specify your email address.
- -j oe: merge standard output and standard error into standard output file.
- -V: export all your environment variables to the batch job.
- -I: run an interactive job (see below).
Once again, you are encouraged to consult the qsub manpage for more options.
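As an illustration of how the extra parameters combine, here is a sketch of a job script that merges stderr into stdout with -j oe and records the program's exit status in that single log. The helper run_logged and the file name myprog.log are hypothetical, and true stands in for the real program so that the sketch runs anywhere.

```shell
#!/bin/bash
#PBS -N My-Program
#PBS -l walltime=1:00:00
# Merge stderr into stdout and write both to one file:
#PBS -j oe
#PBS -o myprog.log
#PBS -M username@sun.ac.za
#PBS -m ae

# Hypothetical helper: run a command and record its exit status.
run_logged() {
    echo "Starting: $*"
    "$@"
    status=$?
    echo "Finished with exit status ${status}"
    return "$status"
}

# Fall back to the current directory when run outside TORQUE (an assumption).
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1
# Replace 'true' with the real program, e.g. ./myprog argument1 argument2
run_logged true
```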
Special concerns for running OpenMP programs
By default, PBS assigns you 1 core on 1 node. You can, however, run your job on up to 64 cores per node. Therefore, if you want to run an OpenMP program, you must specify the number of processors per node. This is done with the flag -l nodes=1:ppn=<cores>, where <cores> is the number of OpenMP threads you wish to use.
Keep in mind that you still must set the OMP_NUM_THREADS environment variable within your script, e.g.:
#!/bin/bash
#PBS -N My-OpenMP-Script
#PBS -l nodes=1:ppn=8
#PBS -l walltime=1:00:00
cd $PBS_O_WORKDIR
export OMP_NUM_THREADS=8
./MyOpenMPProgram
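Hard-coding the thread count means it has to be changed in two places whenever ppn changes. One possible alternative, sketched below for single-node jobs only, derives it from the number of lines in the node file, the same idiom the example scripts further down use for np. The helper name count_slots and its fallback to 1 outside a job are assumptions.

```shell
#!/bin/bash
#PBS -N My-OpenMP-Script
#PBS -l nodes=1:ppn=8
#PBS -l walltime=1:00:00

# Hypothetical helper: count the slots listed in a node file.
# Prints 1 when no readable file is given, e.g. outside a job.
count_slots() {
    if [ -r "$1" ]; then
        echo $(( $(wc -l < "$1") ))
    else
        echo 1
    fi
}

cd "${PBS_O_WORKDIR:-$PWD}" || exit 1
OMP_NUM_THREADS=$(count_slots "$PBS_NODEFILE")
export OMP_NUM_THREADS
echo "Using ${OMP_NUM_THREADS} OpenMP threads"
# ./MyOpenMPProgram
```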
Jobs with large output files
Instead of a job submission like this:
#!/bin/bash
#PBS -V
#PBS -m a
#PBS -N massiveJob
cd $PBS_O_WORKDIR
myprogram -i /home/me/inputfile -o /home/me/outputfile
change it to something like this:
#!/bin/bash
#PBS -V
#PBS -m a
#PBS -N massiveJob
# make sure I'm the only one that can read my output
umask 0077
# create a temporary directory named after the job ID in /scratch
TMP=/scratch/${PBS_JOBID}
mkdir -p $TMP
echo "Temporary work dir: ${TMP}"
# copy the input files to $TMP
echo "Copying from ${PBS_O_WORKDIR}/ to ${TMP}/"
/usr/bin/rsync -vax "${PBS_O_WORKDIR}/" ${TMP}/
cd $TMP
# write my output to my new temporary directory
myprogram -i inputfile -o outputfile
# job done, copy everything back
echo "Copying from ${TMP}/ to ${PBS_O_WORKDIR}/"
/usr/bin/rsync -vax ${TMP}/ "${PBS_O_WORKDIR}/"
# delete my temporary files
/bin/rm -rf ${TMP}
Any job that has to write massive amounts of data will benefit from the above.
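The copy-in, run, copy-out, clean-up pattern can be wrapped in a function so that the program's exit status is kept rather than replaced by that of the final rm. This is only a sketch: the helper name run_in_scratch and the SCRATCH_ROOT override are hypothetical, and it uses cp -R in place of rsync so that it runs on machines without rsync.

```shell
#!/bin/bash
# Hypothetical helper: copy a work directory to per-job scratch space,
# run a command there, copy everything back, remove the scratch copy,
# and return the command's exit status.
# SCRATCH_ROOT is an assumed override for testing; the text uses /scratch.
run_in_scratch() {
    local src=$1
    shift
    local tmp="${SCRATCH_ROOT:-/scratch}/${PBS_JOBID:-manual.$$}"
    local status
    mkdir -p "$tmp" || return 1
    cp -R "$src/." "$tmp/" || return 1
    # The subshell keeps the caller's working directory unchanged.
    ( cd "$tmp" && "$@" )
    status=$?
    # Copy results back even if the command failed, then clean up.
    cp -R "$tmp/." "$src/"
    rm -rf "$tmp"
    return "$status"
}

# Usage inside a job script:
# run_in_scratch "$PBS_O_WORKDIR" myprogram -i inputfile -o outputfile
```

This does not cover a job that is killed outright, for example when it exceeds its walltime. In that case the scratch directory is left behind.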
Using the PBS_NODEFILE for multi-threaded jobs
Until now, we have only dealt with serial jobs. In a serial job, your PBS script will automatically be executed on the target node assigned by the scheduler. If you asked for more than one node, however, your script will only execute on the first node of the set of nodes allocated to you. To access the remainder of the nodes, you must either use MPI or manually launch threads. But which nodes to run on? PBS gives you a list of nodes in a file at the location pointed to by the PBS_NODEFILE environment variable.
In your shell script, you may thereby ascertain the nodes on which your job can run by looking at the file in the location specified by this variable:
#!/bin/bash
#PBS -l nodes=2:ppn=8
echo The nodefile for this job is stored at `echo $PBS_NODEFILE`
echo `cat $PBS_NODEFILE`
When you run this job, you should then get output similar to:
The nodefile for this job is stored at /var/spool/torque/aux/33.head.hpc
comp001.hpc comp001.hpc comp001.hpc comp001.hpc comp001.hpc comp001.hpc comp001.hpc comp001.hpc comp002.hpc comp002.hpc comp002.hpc comp002.hpc comp002.hpc comp002.hpc comp002.hpc comp002.hpc
If you have an application that manually forks processes onto the nodes of your job, you are responsible for parsing the PBS_NODEFILE to determine which nodes those are. If we ever catch you running on nodes that are not yours, we will provide your name and contact info to the other HPC users whose jobs you have interfered with and let vigilante justice take its course.
MPI jobs also require you to feed the PBS_NODEFILE to mpirun.
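Parsing the node file usually comes down to finding the distinct hosts and how many slots each one provides. A minimal sketch, assuming one host name per line as in the example output above; the helper name slots_per_node is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: summarise a node file as "<host> <slots>" lines.
slots_per_node() {
    sort "$1" | uniq -c | awk '{ print $2, $1 }'
}

# Demonstration on a throwaway node file shaped like the example output.
demo=$(mktemp)
printf 'comp001.hpc\ncomp001.hpc\ncomp002.hpc\n' > "$demo"
slots_per_node "$demo"
rm -f "$demo"
```

For the demo file this prints "comp001.hpc 2" and "comp002.hpc 1". Inside a job the call would be slots_per_node "$PBS_NODEFILE".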
Examples
Fluent
Fluent script requesting 4 cores on 1 node. -m ae sends mail on abort and end, and -M gives the email address to send to.
#!/bin/bash
#
#PBS -l nodes=1:ppn=4
#PBS -m ae
#PBS -M username@sun.ac.za
cd $PBS_O_WORKDIR
module load app/ansys
# Automatically calculate the number of processors
np=$(cat $PBS_NODEFILE | wc -l)
fluent 3d -pdefault -cnf=${PBS_NODEFILE} -mpi=intel -g -t$np -ssh -i xyz.jou
Abaqus
Abaqus script requesting 4 cores on 1 node. -m ae sends mail on abort and end, and -M gives the email address to send to.
#!/bin/bash
#
#PBS -l nodes=1:ppn=4
#PBS -m ae
#PBS -M username@sun.ac.za
cd $PBS_O_WORKDIR
# the input file without the .inp extension
JOBNAME=xyz
# Automatically calculate the number of processors
np=$(cat $PBS_NODEFILE | wc -l)
/apps/Abaqus/Commands/abaqus job=$JOBNAME analysis cpus=$np interactive wait
R
R script requesting 1 node in the 'intel' group. -m abe sends mail on abort, begin and end, and -M gives the email address to send to.
#!/bin/bash
#PBS -l nodes=1:intel
#PBS -M username@sun.ac.za
#PBS -m abe
cd $PBS_O_WORKDIR
/apps/R/3.0.2/bin/R CMD BATCH script.R
CPMD
CPMD script requesting 8 cores on 1 node. -N names the job 'cpmd', -m ae sends mail on abort and end, and -M gives the email address to send to. CPMD runs with MPI, which needs to be told which nodes it may use. The list of nodes it may use is given in $PBS_NODEFILE.
#!/bin/sh
#PBS -N cpmd
#PBS -l nodes=1:ppn=8
#PBS -m ae
#PBS -M username@sun.ac.za
module load compilers/gcc-4.8.2
module load openmpi-x86_64
cd $PBS_O_WORKDIR
# Automatically calculate the number of processors
np=$(cat $PBS_NODEFILE | wc -l)
mpirun -np $np --hostfile $PBS_NODEFILE /apps/CPMD/3.17.1/cpmd.x xyz.inp > xyz.out
Gaussian
Gaussian has massive temporary files (.rwf file). Generally we don't care about this file afterward, so this script doesn't copy it from temporary storage after job completion.
#!/bin/bash
#PBS -N SomeHecticallyChemicalName
#PBS -l nodes=1:ppn=8
#PBS -m abe
#PBS -e output.err
#PBS -o output.out
#PBS -M username@sun.ac.za
INPUT=input.cor
# make sure I'm the only one that can read my output
umask 0077
TMP=/scratch/$PBS_JOBID
TMP2=$HOME/tmp/$PBS_JOBID
mkdir -p $TMP $TMP2
if [ ! -d "$TMP" ]; then
    echo "Cannot create temporary directory. Disk probably full."
    exit 1
fi
if [ ! -d "$TMP2" ]; then
    echo "Cannot create overflow temporary directory. Quota probably full."
    exit 1
fi
export GAUSS_SCRDIR=$TMP
# copy the input files to $TMP
echo "Copying from ${PBS_O_WORKDIR}/ to ${TMP}/"
/usr/bin/rsync -vax "${PBS_O_WORKDIR}"/ ${TMP}/
cd $TMP
# make sure input file has %RWF line for specifying temporary storage
if [ -z "`/bin/grep ^%RWF ${INPUT}`" ]; then
    /bin/sed -i '1s/^/%RWF\n/' $INPUT
fi
# assign 125GB of local temporary storage for every 4 CPUs
MAXTMP=$(( $(/bin/cat $PBS_NODEFILE | /usr/bin/wc -l) * 125 / 4 ))
# update input file to use local temporary storage
/bin/sed -i -E "s|%RWF(.*)|%RWF=${TMP}/,${MAXTMP}GB,${TMP2}/,-1|g" ${TMP}/${INPUT}
. /apps/g09/bsd/g09.profile
/apps/g09/g09 $INPUT > output.log
# job done, copy everything except .rwf back
echo "Copying from ${TMP}/ to ${PBS_O_WORKDIR}/"
/usr/bin/rsync -vax --exclude=*.rwf ${TMP}/ "${PBS_O_WORKDIR}/"
# delete my temporary files
/bin/rm -rf ${TMP} ${TMP2}
This script relies on the input file containing a line starting with %RWF, and adds one at the top of the file if it is missing. This is so that the script can update the input file to specify that only a limited part of the RWF (125GB for every 4 cores requested, so 250GB for the 8 cores in this example) is written to the compute node's local scratch space. Overflow is written to the user's home directory. Unfortunately, RWF files can grow to more than 1TB and can fill the compute node's scratch space, choking out other jobs and killing the job itself.
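The local RWF allowance follows directly from the MAXTMP line in the script. The sketch below repeats that integer arithmetic for a few core counts; the helper name max_rwf_gb is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: local RWF allowance in GB for a given core count,
# using the same integer arithmetic as MAXTMP in the script
# (125GB for every 4 cores, rounded down).
max_rwf_gb() {
    echo $(( $1 * 125 / 4 ))
}

for cores in 4 8 16; do
    echo "${cores} cores: $(max_rwf_gb "$cores")GB"
done
```

This prints 125GB, 250GB and 500GB for 4, 8 and 16 cores respectively.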
pisoFOAM
pisoFOAM generates a lot of output, not all of which is useful. In this example we use iwatch to delete unwanted output while the job runs.
#!/bin/bash
#PBS -l nodes=1:ppn=8:ib
#PBS -m abe
#PBS -M username@sun.ac.za
#PBS -N pisoFoam
# make sure I'm the only one that can read my output
umask 0077
# create a temporary directory in /scratch
TMP=/scratch/$PBS_JOBID
/bin/mkdir $TMP
echo "Temporary work dir: ${TMP}"
if [ ! -d "$TMP" ]; then
    echo "Cannot create temporary directory. Disk probably full."
    exit 1
fi
# copy the input files to $TMP
echo "Copying from ${PBS_O_WORKDIR}/ to ${TMP}/"
/usr/bin/rsync -vax "${PBS_O_WORKDIR}/" ${TMP}/
cd $TMP
# start file watcher
/bin/cp /usr/local/etc/iwatch.xml $TMP/iwatch.xml
/bin/sed -i "s|WATCH|<path type='recursive' alert='off' events='close' exec='/bin/rm -rf %f' filter='((.*)_0\|phi(.*)\|ddt(.*))\\.gz\$'>${TMP}</path>|g" $TMP/iwatch.xml
/usr/local/bin/iwatch -p $TMP/iwatch.pid -d -f $TMP/iwatch.xml
# Automatically calculate the number of processors
np=$(cat $PBS_NODEFILE | wc -l)
module load compilers/gcc-4.8.2
module load openmpi/1.6.5
export MPI_BUFFER_SIZE=200000000
export FOAM_INST_DIR=/apps/OpenFOAM
foamDotFile=${FOAM_INST_DIR}/OpenFOAM-2.2.2/etc/bashrc
[ -f $foamDotFile ] && . $foamDotFile
blockMesh
decomposePar
mpirun -np $np pisoFoam -parallel > ${PBS_O_WORKDIR}/output.log
# kill the watcher
/bin/kill `cat $TMP/iwatch.pid`
/bin/rm $TMP/iwatch.pid
# job done, copy everything back
echo "Copying from ${TMP}/ to ${PBS_O_WORKDIR}/"
/usr/bin/rsync -vax --exclude "*_0.gz" --exclude "phi*.gz" --exclude "ddt*.gz" ${TMP}/ "${PBS_O_WORKDIR}/"
# delete my temporary files
/bin/rm -rf ${TMP}
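Before relying on the exclusions, it may help to preview which file names the three rsync --exclude patterns in the script would drop. The sketch below approximates them with shell case patterns applied to the base name. The helper name is_unwanted and the sample file names are hypothetical, and the iwatch filter itself is a regular expression that this sketch does not reproduce.

```shell
#!/bin/bash
# Hypothetical helper: succeed if a file name matches one of the three
# patterns the script passes to rsync --exclude.
is_unwanted() {
    case "$(basename "$1")" in
        *_0.gz|phi*.gz|ddt*.gz) return 0 ;;
        *) return 1 ;;
    esac
}

# Sample names are made up for illustration.
for f in U_0.gz phi.gz ddtU.gz U.gz p.gz; do
    if is_unwanted "$f"; then
        echo "$f: dropped"
    else
        echo "$f: kept"
    fi
done
```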