Difference between revisions of "HOWTO submit jobs"

From HPC

Revision as of 11:33, 2 October 2015

Submitting jobs

TORQUE comes with very complete man pages. Therefore, for complete documentation of TORQUE commands you are encouraged to type man pbs and go from there. Jobs are submitted using the qsub command. Type man qsub for information on the plethora of options that it offers.

Let's say I have an executable called "myprog". Let me try and submit it to TORQUE:

[username@launch ~]$ qsub myprog
qsub:  file must be an ascii script

Oops... That didn't work because qsub expects a shell script. Any shell should work, so use your favorite one. So I write a simple script called "myscript.sh":

#!/bin/bash
cd $PBS_O_WORKDIR
./myprog argument1 argument2

and then I submit it:

[username@launch ~]$ qsub myscript.sh
16.head.hpc

That worked! Note the use of the $PBS_O_WORKDIR environment variable. This is important, since by default TORQUE on our cluster will start executing the commands in your shell script from your home directory. To go to the directory in which you executed qsub, cd to $PBS_O_WORKDIR. There are several other useful TORQUE environment variables that we will encounter later.
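
If you also want to run the same script by hand for a quick test, one option is to fall back to the current directory when $PBS_O_WORKDIR is not set. This is only a sketch; the fallback is an assumption added for illustration and is not something TORQUE does for you.

```bash
#!/bin/bash
# Under TORQUE, PBS_O_WORKDIR holds the directory where qsub was run.
# Outside TORQUE it is unset, so fall back to the current directory
# (illustrative assumption, not TORQUE behaviour).
WORKDIR="${PBS_O_WORKDIR:-$PWD}"

# Stop early if the directory cannot be entered
cd "$WORKDIR" || exit 1
echo "Running in: $PWD"
```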

Editing files

Editing files on the cluster can be done through a couple of different methods...

Native Editors

  • vim - The visual editor (vi) is the traditional Unix editor. However, it is not necessarily the most intuitive editor. That being the case, if you are unfamiliar with it, there is a vi tutorial, vimtutor.
  • pico - pico itself is not installed on the system, but nano, a pico workalike, is.
  • nano - Nano has a good bit of on-screen help to make it easier to use.

External Editors

You can also use your favourite editor on your local machine and then transfer the files over to the HPC afterwards. One caveat is that files created on Windows machines usually end each line with an extra carriage-return character, which is invisible in most editors but may be misinterpreted by Linux command interpreters (shells). If this happens, there is a utility called dos2unix that you can use to convert the text file from DOS/Windows formatting to Linux formatting.

$ dos2unix script.sub
dos2unix: converting file script.sub to UNIX format ...
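
To check whether a file needs converting before you submit it, you can look for carriage-return characters. The following is a minimal sketch; the function name has_crlf and the throwaway demo file are made up for illustration.

```bash
#!/bin/bash
# has_crlf FILE: succeed if FILE contains at least one carriage return,
# the character that Windows editors add before each newline.
has_crlf() {
    grep -q $'\r' "$1"
}

# Demonstration on a throwaway file with Windows line endings
demo=$(mktemp)
printf '#!/bin/bash\r\necho hello\r\n' > "$demo"

if has_crlf "$demo"; then
    echo "$demo has Windows line endings; run dos2unix on it"
fi
```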

Specifying job parameters

By default, any script you submit will run on a single processor for a maximum of 24 hours. The name of the job will be the name of the script, and it will not email you when it starts, finishes, or is interrupted. stdout and stderr are collected into separate files named after the job number. You can affect the default behavior of TORQUE by passing it parameters. These parameters can be specified on the command line or inside the shell script itself. For example, let's say I want to send stdout and stderr to a file that is different from the default:

[username@launch ~]$ qsub -e myprog.err -o myprog.out myscript.sh

Alternatively, I can actually edit myscript.sh to include these parameters. I can specify any TORQUE command line parameter I want in a line that begins with "#PBS":

#!/bin/bash
#PBS -e myprog.err
#PBS -o myprog.out
cd $PBS_O_WORKDIR
./myprog argument1 argument2

Now I just submit my modified script with no command-line arguments:

[username@launch ~]$ qsub myscript.sh

Useful PBS parameters

Here is an example of a more involved script that requests only 1 hour of execution time, renames the job, and sends email when the job begins, ends, or aborts:

#!/bin/bash
 
# Name of my job:
#PBS -N My-Program
 
# Run for 1 hour:
#PBS -l walltime=1:00:00
 
# Where to write stderr:
#PBS -e myprog.err
 
# Where to write stdout: 
#PBS -o myprog.out
 
# Send me email when my job aborts, begins, or ends
#PBS -m abe
 
# This command switches to the directory from which the "qsub" command was run:
cd $PBS_O_WORKDIR
 
#  Now run my program
./myprog argument1 argument2
 
echo Done!

Some more useful PBS parameters:

  • -M: Specify your email address.
  • -j oe: merge standard output and standard error into standard output file.
  • -V: export all your environment variables to the batch job.
  • -I: run an interactive job (see below).

Once again, you are encouraged to consult the qsub manpage for more options.
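
As noted earlier, the same parameters can be given on the qsub command line instead of in the script. The sketch below collects several of the options listed above in a bash array and prints the resulting command rather than running it, so it is safe to try anywhere. The log file name myprog.log is made up for illustration.

```bash
#!/bin/bash
# Options taken from the lists above:
#   -N  job name            -M  email address
#   -m  ae: mail on abort and end
#   -j  oe: merge stderr into the stdout file
#   -o  stdout file (hypothetical name)
#   -V  export my environment to the job
QSUB_ARGS=(-N My-Program -M username@sun.ac.za -m ae -j oe -o myprog.log -V)

# Print the command instead of running it; drop "echo" to submit for real
echo qsub "${QSUB_ARGS[@]}" myscript.sh
```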

Special concerns for running OpenMP programs

By default, PBS assigns you 1 core on 1 node. You can, however, run your job on up to 64 cores per node. Therefore, if you want to run an OpenMP program, you must specify the number of processors per node. This is done with the flag -l nodes=1:ppn=<cores> where <cores> is the number of OpenMP threads you wish to use. Keep in mind that you still must set the OMP_NUM_THREADS environment variable within your script, e.g.:

#!/bin/bash
#PBS -N My-OpenMP-Script
#PBS -l nodes=1:ppn=8
#PBS -l walltime=1:00:00
 
cd $PBS_O_WORKDIR
export OMP_NUM_THREADS=8
./MyOpenMPProgram
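
To avoid writing the core count in two places, the thread count can be derived from the nodefile that TORQUE provides, in the same way the application examples further down count processors. This is a sketch under the assumption that the job runs on a single node; the function name count_cores is made up for illustration.

```bash
#!/bin/bash
# count_cores FILE: print the number of lines in a nodefile.
# TORQUE writes one line per allocated core, so on a single node
# this equals the ppn value that was requested.
count_cores() {
    wc -l < "$1" | tr -d ' '
}

# Outside a TORQUE job PBS_NODEFILE is unset; default to 1 thread then
if [ -n "$PBS_NODEFILE" ] && [ -r "$PBS_NODEFILE" ]; then
    OMP_NUM_THREADS=$(count_cores "$PBS_NODEFILE")
else
    OMP_NUM_THREADS=1
fi
export OMP_NUM_THREADS
echo "Using $OMP_NUM_THREADS OpenMP thread(s)"
```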

Jobs with large output files

Instead of a job submission like this:

#!/bin/bash
#PBS -V
#PBS -m a
#PBS -N massiveJob

cd $PBS_O_WORKDIR
myprogram -i /home/me/inputfile -o /home/me/outputfile

change it to something like this:

#!/bin/bash
#PBS -l nodes=1:ppn=1:scratch
#PBS -V
#PBS -m a
#PBS -N massiveJob

# make sure I'm the only one that can read my output
umask 0077
# create a temporary directory in /scratch, named after the job ID
TMP=/scratch/${PBS_JOBID}
mkdir -p $TMP
echo "Temporary work dir: ${TMP}"

# copy the input files to $TMP
echo "Copying from ${PBS_O_WORKDIR}/ to ${TMP}/"
/usr/bin/rsync -vax "${PBS_O_WORKDIR}/" ${TMP}/

cd $TMP

# write my output to my new temporary directory
myprogram -i inputfile -o outputfile

# job done, copy everything back
echo "Copying from ${TMP}/ to ${PBS_O_WORKDIR}/"
/usr/bin/rsync -vax ${TMP}/ "${PBS_O_WORKDIR}/"

# delete my temporary files
[ $? -eq 0 ] && /bin/rm -rf ${TMP}

Any job that has to write massive amounts of data will benefit from the above. Take note of the :scratch that was added to the node request line. If you do not add that feature request to the script, your job may be assigned to a node without scratch space.

Using the PBS_NODEFILE for multi-threaded jobs

Until now, we have only dealt with serial jobs. In a serial job, your PBS script will automatically be executed on the target node assigned by the scheduler. If you asked for more than one node, however, your script will only execute on the first node of the set of nodes allocated to you. To access the remainder of the nodes, you must either use MPI or manually launch threads. But which nodes to run on? PBS gives you a list of nodes in a file at the location pointed to by the PBS_NODEFILE environment variable. In your shell script, you may thereby ascertain the nodes on which your job can run by looking at the file in the location specified by this variable:

#!/bin/bash
#PBS -l nodes=2:ppn=8
 
echo "The nodefile for this job is stored at $PBS_NODEFILE"
cat $PBS_NODEFILE

When you run this job, you should then get output similar to:

The nodefile for this job is stored at /var/spool/torque/aux/33.head.hpc
comp001.hpc
comp001.hpc
comp001.hpc
comp001.hpc
comp001.hpc
comp001.hpc
comp001.hpc
comp001.hpc
comp002.hpc
comp002.hpc
comp002.hpc
comp002.hpc
comp002.hpc
comp002.hpc
comp002.hpc
comp002.hpc

If you have an application that manually forks processes onto the nodes of your job, you are responsible for parsing the PBS_NODEFILE to determine which nodes those are. If we ever catch you running on nodes that are not yours, we will provide your name and contact info to the other HPC users whose jobs you have interfered with and let vigilante justice take its course.

MPI jobs also require you to feed the PBS_NODEFILE to mpirun.
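
Parsing the nodefile usually comes down to finding the distinct hosts and how many cores you have on each. The following sketch does that with sort and uniq; the function name hosts_with_counts and the hand-made nodefile are illustrative assumptions.

```bash
#!/bin/bash
# hosts_with_counts FILE: print "host cores" for each distinct host
# in a nodefile. sort runs first so that repeated hosts are adjacent
# even if the file is not grouped by host.
hosts_with_counts() {
    sort "$1" | uniq -c | while read -r count host; do
        echo "$host $count"
    done
}

# Demonstration with a hand-made nodefile (2 nodes, 2 cores each)
nodefile=$(mktemp)
printf 'comp001.hpc\ncomp001.hpc\ncomp002.hpc\ncomp002.hpc\n' > "$nodefile"
hosts_with_counts "$nodefile"
```

For MPI programs you normally do not need to parse the file yourself; passing it to mpirun with --hostfile, as in the CPMD example below, is enough.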

Guidelines / Rules

  • Create a temporary working directory in /scratch, not /tmp
    • /tmp is reserved for use by the operating system, and is only 5GB in size.
    • Preferably specify /scratch/$PBS_JOBID in your submit script so that it's easy to associate scratch directories with their jobs.
    • Copy your input files to your scratch space and work on the data there. Avoid using your home directory as much as possible.
      • If you need more than about 500GB of scratch space, you can also use /scratch2. It's a lot slower than /scratch, so try to avoid that too.
    • Copy only your results back to your home directory. Input files that haven't changed don't need to be copied.
    • Erase your temporary working directory when you're done.
  • Secure your work from accidental deletion or contamination by disallowing other users access to your scratch directories
    • umask 0077 disallows access by all other users
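
The first and last guidelines can be combined in a small helper that creates the scratch directory with owner-only permissions. This is a sketch; the function name make_scratch is made up, and a throwaway root directory stands in for /scratch so that it runs anywhere.

```bash
#!/bin/bash
# make_scratch ROOT JOBID: create ROOT/JOBID readable only by its owner
# and print the path. Returns non-zero if the directory cannot be created.
make_scratch() {
    local dir="$1/$2"
    # The subshell keeps the umask change from leaking into the caller
    ( umask 0077 && mkdir -p "$dir" ) || return 1
    echo "$dir"
}

# In a job script this would be: TMP=$(make_scratch /scratch "$PBS_JOBID")
root=$(mktemp -d)
TMP=$(make_scratch "$root" "16.head.hpc") || exit 1
ls -ld "$TMP"

# Erase the temporary working directory when done
rm -rf "$TMP"
```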

Examples

ADF

ADF generates run files, which are scripts that contain your data. Make sure to convert the run file to UNIX format first using dos2unix, and remember to make it executable with chmod +x.


ADF script requesting 4 cores on 1 node. -m selects mailing of abort, begin and end messages, and -M is the email address to send to. Requests 1 week walltime.

#!/bin/bash
#PBS -N JobName
#PBS -l nodes=1:ppn=4:scratch
#PBS -l walltime=7:00:00:00
#PBS -m abe
#PBS -M username@sun.ac.za

INPUT=inputfile.run

# make sure I'm the only one that can read my output
umask 0077
TMP=/scratch/$PBS_JOBID
mkdir -p $TMP

if [ ! -d "$TMP" ]; then
	echo "Cannot create temporary directory. Disk probably full."
	exit 1
fi

cd $TMP

. /apps/adf/2014.04/adfrc.sh

# override ADF's scratch directory
export SCM_TMPDIR=$TMP

# override log file
export SCM_LOGFILE="$TMP/$PBS_JOBID.logfile"

# Run the ADF run file
$PBS_O_WORKDIR/$INPUT

# job done, copy everything back 
echo "Copying from ${TMP}/ to ${PBS_O_WORKDIR}/"
/usr/bin/rsync -vax ${TMP}/ "${PBS_O_WORKDIR}/"

# delete my temporary files
[ $? -eq 0 ] && /bin/rm -rf ${TMP}

Fluent

Fluent script requesting 4 cores on 1 node. -m selects mailing of abort, begin and end messages, and -M is the email address to send to. Requests 1 week walltime.

#!/bin/bash
#PBS -N JobName
#PBS -l nodes=1:ppn=4:scratch
#PBS -l walltime=7:00:00:00
#PBS -m abe
#PBS -e output.err
#PBS -o output.out
#PBS -M username@sun.ac.za

INPUT=inputfile.jou

# make sure I'm the only one that can read my output
umask 0077
TMP=/scratch/$PBS_JOBID
mkdir -p $TMP

if [ ! -d "$TMP" ]; then
	echo "Cannot create temporary directory. Disk probably full."
	exit 1
fi

# copy the input files to $TMP
echo "Copying from ${PBS_O_WORKDIR}/ to ${TMP}/"
/usr/bin/rsync -vax "${PBS_O_WORKDIR}"/ ${TMP}/

cd $TMP

# choose version of FLUENT
#module load app/ansys150
module load app/ansys162

# Automatically calculate the number of processors
np=$(cat $PBS_NODEFILE | wc -l)

fluent 3d -pdefault -cnf=${PBS_NODEFILE} -mpi=intel -g -t$np -ssh -i $INPUT

# job done, copy everything back 
echo "Copying from ${TMP}/ to ${PBS_O_WORKDIR}/"
/usr/bin/rsync -vax ${TMP}/ "${PBS_O_WORKDIR}/"

# delete my temporary files
[ $? -eq 0 ] && /bin/rm -rf ${TMP}

CFX

CFX script requesting 4 cores on 1 node. -m selects mailing of abort, begin and end messages, and -M is the email address to send to. Requests 1 week walltime.

#!/bin/bash
#PBS -N JobName
#PBS -l nodes=1:ppn=4:scratch
#PBS -l walltime=7:00:00:00
#PBS -m abe
#PBS -e output.err
#PBS -o output.out
#PBS -M username@sun.ac.za

DEF=inputfile.def
INI=inputfile.ini

# make sure I'm the only one that can read my output
umask 0077
TMP=/scratch/$PBS_JOBID
mkdir -p $TMP

if [ ! -d "$TMP" ]; then
	echo "Cannot create temporary directory. Disk probably full."
	exit 1
fi

# copy the input files to $TMP
echo "Copying from ${PBS_O_WORKDIR}/ to ${TMP}/"
/usr/bin/rsync -vax "${PBS_O_WORKDIR}"/ ${TMP}/

cd $TMP

module load app/ansys

# get list of processors
PAR=$(sed -e '{:q;N;s/\n/,/g;t q}' $PBS_NODEFILE)

cfx5solve -def $DEF -ini $INI -par-dist $PAR

# job done, copy everything back 
echo "Copying from ${TMP}/ to ${PBS_O_WORKDIR}/"
/usr/bin/rsync -vax ${TMP}/ "${PBS_O_WORKDIR}/"

# delete my temporary files
[ $? -eq 0 ] && /bin/rm -rf ${TMP}

Abaqus

Abaqus script requesting 4 cores on 1 node. -m selects mailing of abort and end messages, and -M is the email address to send to. Uses system default walltime.

#!/bin/bash
#
#PBS -l nodes=1:ppn=4:scratch
#PBS -m ae
#PBS -M username@sun.ac.za

# the input file without the .inp extension
JOBNAME=xyz

# make sure I'm the only one that can read my output
umask 0077
TMP=/scratch/$PBS_JOBID
mkdir -p $TMP

if [ ! -d "$TMP" ]; then
	echo "Cannot create temporary directory. Disk probably full."
	exit 1
fi

# copy the input files to $TMP
echo "Copying from ${PBS_O_WORKDIR}/ to ${TMP}/"
/usr/bin/rsync -vax "${PBS_O_WORKDIR}"/ ${TMP}/

cd $TMP

module load app/abaqus

# Automatically calculate the number of processors
np=$(cat $PBS_NODEFILE | wc -l)

abaqus job=$JOBNAME input=$JOBNAME.inp analysis cpus=$np scratch=$TMP interactive
wait

# job done, copy everything back 
echo "Copying from ${TMP}/ to ${PBS_O_WORKDIR}/"
/usr/bin/rsync -vax ${TMP}/ "${PBS_O_WORKDIR}/"

# delete my temporary files
[ $? -eq 0 ] && /bin/rm -rf ${TMP}

R

R script requesting 1 core on 1 node in the 'intel' group. -m selects mailing of abort, begin and end messages, and -M is the email address to send to. Uses system default walltime.

#!/bin/bash

#PBS -l nodes=1:intel:ppn=1
#PBS -M username@sun.ac.za
#PBS -m abe

cd $PBS_O_WORKDIR

/apps/R/3.0.2/bin/R CMD BATCH script.R

CPMD

CPMD script requesting 8 cores on 1 node. -N names the job 'cpmd', -m selects mailing of abort and end messages, and -M is the email address to send to. CPMD runs with MPI, which needs to be told which nodes it may use. The list of nodes it may use is given in $PBS_NODEFILE. Uses system default walltime.

#!/bin/sh
#PBS -N cpmd
#PBS -l nodes=1:ppn=8
#PBS -m ae
#PBS -M username@sun.ac.za

module load compilers/gcc-4.8.2
module load openmpi-x86_64

cd $PBS_O_WORKDIR

# Automatically calculate the number of processors
np=$(cat $PBS_NODEFILE | wc -l)

mpirun -np $np --hostfile $PBS_NODEFILE /apps/CPMD/3.17.1/cpmd.x xyz.inp > xyz.out

Gaussian

Gaussian has massive temporary files (.rwf file). Generally we don't care about this file afterward, so this script doesn't copy it from temporary storage after job completion. Requests 6 week walltime.

#!/bin/bash
#PBS -N SomeHecticallyChemicalName
#PBS -l nodes=1:ppn=8:scratch
#PBS -l mem=16Gb
#PBS -l walltime=42:00:00:00
#PBS -m abe
#PBS -e output.err
#PBS -o output.out
#PBS -M username@sun.ac.za

INPUT=input.cor

# make sure I'm the only one that can read my output
umask 0077
TMP=/scratch/$PBS_JOBID
TMP2=/scratch2/$PBS_JOBID
mkdir -p $TMP $TMP2

if [ ! -d "$TMP" ]; then
	echo "Cannot create temporary directory. Disk probably full."
	exit 1
fi

if [ ! -d "$TMP2" ]; then
	echo "Cannot create overflow temporary directory. Disk probably full."
	exit 1
fi

export GAUSS_SCRDIR=$TMP

# copy the input files to $TMP
echo "Copying from ${PBS_O_WORKDIR}/ to ${TMP}/"
/usr/bin/rsync -vax "${PBS_O_WORKDIR}"/ ${TMP}/

cd $TMP

# make sure input file has %RWF line for specifying temporary storage
if [ -z "`/bin/grep ^%RWF ${INPUT}`" ]; then
	/bin/sed -i '1s/^/%RWF\n/' $INPUT
fi

# assign 100GB of local temporary storage for every 4 CPUs
MAXTMP=$(( $(/bin/cat $PBS_NODEFILE | /usr/bin/wc -l) * 100 / 4 ))

# update input file to use local temporary storage
/bin/sed -i -E "s|%RWF(.*)|%RWF=${TMP}/,${MAXTMP}GB,${TMP2}/1.rwf,500GB,${TMP2}/2.rwf,500GB,${TMP2}/3.rwf,500GB,${TMP2}/4.rwf,500GB,${TMP2}/,-1|g" ${TMP}/${INPUT}

. /apps/g09/bsd/g09.profile

/apps/g09/g09 $INPUT > output.log

# job done, copy everything except .rwf back 
echo "Copying from ${TMP}/ to ${PBS_O_WORKDIR}/"
/usr/bin/rsync -vax --exclude=*.rwf ${TMP}/ "${PBS_O_WORKDIR}/"

# delete my temporary files
[ $? -eq 0 ] && /bin/rm -rf ${TMP} ${TMP2}

If the input file does not already contain a line starting with %RWF, the script adds one. It then rewrites that line so that only the first part of the RWF (100GB for every 4 cores requested, so 200GB for the 8 cores in this example) is written to the compute node's local scratch space. Overflow is written to the scratch space on the storage server. This is necessary because RWF files can grow to more than 1TB and can fill the compute node's scratch space, choking out other jobs and killing the job itself.
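
The size of the local part follows directly from the MAXTMP line in the script. A short sketch of that arithmetic for a few core counts is given here; the function name local_rwf_gb is made up for illustration.

```bash
#!/bin/bash
# local_rwf_gb CORES: local scratch (in GB) the script assigns to the RWF,
# using the same rule as the MAXTMP line: 100GB for every 4 cores.
# Bash arithmetic is integer-only, so odd results are rounded down.
local_rwf_gb() {
    echo $(( $1 * 100 / 4 ))
}

for cores in 4 8 16; do
    echo "ppn=$cores: $(local_rwf_gb "$cores")GB local, overflow on /scratch2"
done
```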

pisoFOAM

pisoFOAM generates a lot of output, not all of which is useful. In this example we use crontab to schedule the deletion of unwanted output while the job runs. Requests 3 week walltime.

#!/bin/bash
#PBS -l nodes=1:ppn=8:scratch:ib
#PBS -l walltime=21:00:00:00
#PBS -m abe
#PBS -M username@sun.ac.za
#PBS -N pisoFoam
 
# make sure I'm the only one that can read my output
umask 0077
# create a temporary directory in /scratch
TMP=/scratch/$PBS_JOBID
/bin/mkdir $TMP
echo "Temporary work dir: ${TMP}"

if [ ! -d "$TMP" ]; then
	echo "Cannot create temporary directory. Disk probably full."
	exit 1
fi

# copy the input files to $TMP
echo "Copying from ${PBS_O_WORKDIR}/ to ${TMP}/"
/usr/bin/rsync -vax "${PBS_O_WORKDIR}/" ${TMP}/

cd $TMP
 
# start crontab, delete unwanted files every 6 hours
/bin/echo "0 */6 * * * /bin/find ${TMP} -regextype posix-egrep -regex '(${TMP}/processor[0-9]+)/([^/]*)/((uniform/.*)|ddt.*|phi.*|.*_0.*)' -exec rm {} \\;" | /usr/bin/crontab

# Automatically calculate the number of processors
np=$(cat $PBS_NODEFILE | wc -l)

module load compilers/gcc-4.8.2
module load openmpi/1.6.5

export MPI_BUFFER_SIZE=200000000
 
export FOAM_INST_DIR=/apps/OpenFOAM
foamDotFile=${FOAM_INST_DIR}/OpenFOAM-2.2.2/etc/bashrc
[ -f $foamDotFile ] && . $foamDotFile

blockMesh
decomposePar
 
mpirun -np $np pisoFoam -parallel > ${PBS_O_WORKDIR}/output.log
 
# remove crontab entry
/usr/bin/crontab -r
 
# job done, copy everything back
echo "Copying from ${TMP}/ to ${PBS_O_WORKDIR}/"
/usr/bin/rsync -vax --exclude "*_0.gz" --exclude "phi*.gz" --exclude "ddt*.gz" ${TMP}/ "${PBS_O_WORKDIR}/"
 
# delete my temporary files
[ $? -eq 0 ] && /bin/rm -rf ${TMP}

MSC Marc

Marc script requesting 1 core on 1 node. -m selects mailing of abort, begin and end messages, and -M is the email address to send to. Uses system default walltime.

#!/bin/bash
#PBS -N JobName
#PBS -l nodes=1:ppn=1:scratch
#PBS -m abe
#PBS -e output.err
#PBS -o output.out
#PBS -M username@sun.ac.za

INPUT=inputfile

# make sure I'm the only one that can read my output
umask 0077
TMP=/scratch/$PBS_JOBID
mkdir -p $TMP

if [ ! -d "$TMP" ]; then
	echo "Cannot create temporary directory. Disk probably full."
	exit 1
fi

# copy the input files to $TMP
echo "Copying from ${PBS_O_WORKDIR}/ to ${TMP}/"
/usr/bin/rsync -vax "${PBS_O_WORKDIR}"/ ${TMP}/

cd $TMP

module load app/marc

# get number of processors assigned
NPS=`/bin/cat $PBS_NODEFILE | /usr/bin/wc -l`
HOSTS=hosts.$PBS_JOBID

[ -f $HOSTS ] && /bin/rm $HOSTS
# create hosts file
uniq -c $PBS_NODEFILE | while read np host; do
	/bin/echo "${host} ${np}" >> $HOSTS
done

if [ ${NPS} -gt 1 ]; then
	run_marc -j $INPUT -ver n -back n -ci n -cr n -nps $NPS -host $HOSTS
else
	run_marc -j $INPUT -ver n -back n -ci n -cr n
fi

# job done, copy everything back 
echo "Copying from ${TMP}/ to ${PBS_O_WORKDIR}/"
/usr/bin/rsync -vax ${TMP}/ "${PBS_O_WORKDIR}/"

# delete my temporary files
[ $? -eq 0 ] && /bin/rm -rf ${TMP}

mothur

mothur has massive data volumes, and therefore has to use local scratch space to avoid killing the file server. Requests 1 core on 1 node.

mothur's input can either be a file listing all the commands to process, or the commands can be given on the command line if prefixed with a #.

#!/bin/bash

#PBS -l nodes=1:ppn=1:scratch
#PBS -m ae
#PBS -M username@sun.ac.za

# make sure I'm the only one that can read my output
umask 0077
TMP=/scratch/$PBS_JOBID
mkdir -p $TMP

if [ ! -d "$TMP" ]; then
	echo "Cannot create temporary directory. Disk probably full."
	exit 1
fi

# copy the input files to $TMP
echo "Copying from ${PBS_O_WORKDIR}/ to ${TMP}/"
/usr/bin/rsync -vax "${PBS_O_WORKDIR}"/ ${TMP}/

cd $TMP

module load app/mothur

# Automatically calculate the number of processors
np=$(cat $PBS_NODEFILE | wc -l)

mothur inputfile

# could also put the commands on the command line
#mothur "#cluster.split(column=file.dist, name=file.names, large=T, processors=$np)"

# job done, copy everything back 
echo "Copying from ${TMP}/ to ${PBS_O_WORKDIR}/"
/usr/bin/rsync -vax ${TMP}/ "${PBS_O_WORKDIR}/"

# delete my temporary files
[ $? -eq 0 ] && /bin/rm -rf ${TMP}

Hadoop

Hadoop is useful for sorting through massive amounts of data. In this example we read the input data into a distributed HDFS, and do a map/reduce. Upon completion the output is copied out of the HDFS to central storage. Nodes with the amd feature are requested because of their large scratch space. The input and output data together should not exceed 1.5TB per node, so we request 1 node for every 750GB of input data. In this example we request 6 nodes for 4TB of input data.
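
The node count comes from rounding up: 4TB divided by 750GB per node is a little over 5, so 6 nodes are needed. A minimal sketch of that arithmetic follows, assuming 1TB is counted as 1000GB; the function name nodes_needed is made up for illustration.

```bash
#!/bin/bash
# nodes_needed INPUT_GB: number of nodes to request when each node
# should hold at most 750GB of input data (rounded up).
nodes_needed() {
    local input_gb=$1 per_node_gb=750
    echo $(( (input_gb + per_node_gb - 1) / per_node_gb ))
}

echo "4TB of input needs $(nodes_needed 4000) nodes"
```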

Java example

#!/bin/bash

#PBS -V
#PBS -m abe
#PBS -l nodes=6:ppn=1:amd:scratch
#PBS -N hadoopDedupe
#PBS -M username@sun.ac.za

# make sure I'm the only one that can read my output
umask 0077

# create a temporary directory in /scratch
TMP=/scratch/$PBS_JOBID
mkdir -p $TMP/logs

JAR=dedupe.jar
CLASS=za.ac.sun.hpc.dedupe
INPUT="${PBS_O_WORKDIR}/input"
OUTPUT="${PBS_O_WORKDIR}"

HADOOP_PREFIX=/apps/hadoop/2.4.1
JAVA_HOME=/usr/lib/jvm/java
HADOOP_CONF_DIR="${PBS_O_WORKDIR}/conf"

# copy the JAR containing the class to $TMP
cp "${HADOOP_PREFIX}/common/$JAR" $TMP

# create Hadoop configs
cp -a $HADOOP_PREFIX/conf $HADOOP_CONF_DIR

MASTER=`hostname`
uniq $PBS_NODEFILE > $HADOOP_CONF_DIR/slaves
echo $MASTER > $HADOOP_CONF_DIR/masters

sed -i "s|export JAVA_HOME=.*|export JAVA_HOME=$JAVA_HOME|g" $HADOOP_CONF_DIR/hadoop-env.sh
sed -i "s|<value>/scratch/.*</value>|<value>/scratch/$PBS_JOBID</value>|g" $HADOOP_CONF_DIR/{hdfs,core}-site.xml
sed -i "s|<value>.*:50090</value>|<value>$MASTER:50090</value>|g" $HADOOP_CONF_DIR/{hdfs,core}-site.xml
sed -i "s|hdfs://.*:|hdfs://$MASTER:|g" $HADOOP_CONF_DIR/core-site.xml
sed -i "s|.*export HADOOP_LOG_DIR.*|export HADOOP_LOG_DIR=$TMP/logs|g" $HADOOP_CONF_DIR/hadoop-env.sh
sed -i "s|.*export HADOOP_PID_DIR.*|export HADOOP_PID_DIR=$TMP|g" $HADOOP_CONF_DIR/hadoop-env.sh

# setup Hadoop services
. $HADOOP_CONF_DIR/hadoop-env.sh

$HADOOP_PREFIX/bin/hdfs namenode -format
$HADOOP_PREFIX/sbin/start-dfs.sh

# import data
$HADOOP_PREFIX/bin/hdfs dfs -mkdir /user
$HADOOP_PREFIX/bin/hdfs dfs -mkdir /user/$USER
$HADOOP_PREFIX/bin/hdfs dfs -put $INPUT input

cd $TMP

# run hadoop job
$HADOOP_PREFIX/bin/hadoop jar $JAR $CLASS input output

# retrieve output from Hadoop
mkdir -p "$OUTPUT"
$HADOOP_PREFIX/bin/hdfs dfs -get output "$OUTPUT"

# stop Hadoop services
$HADOOP_PREFIX/sbin/stop-dfs.sh

# retrieve logs
cp -a $TMP/logs "$PBS_O_WORKDIR"

# clear HDFS directories on all slaves
cat $HADOOP_CONF_DIR/slaves | while read slave; do
    ssh -n $slave "rm -rf $TMP"
done

# delete my temporary files
[ $? -eq 0 ] && /bin/rm -rf $TMP

Third-party script example

#!/bin/bash

#PBS -V
#PBS -m abe
#PBS -l nodes=6:ppn=1:amd:scratch
#PBS -N hadoopDedupe
#PBS -M username@sun.ac.za

# make sure I'm the only one that can read my output
umask 0077

# create a temporary directory in /scratch
TMP=/scratch/$PBS_JOBID
mkdir -p $TMP/logs

INPUT="${PBS_O_WORKDIR}/input"
OUTPUT="${PBS_O_WORKDIR}"
MAPPER="mapper.py"
REDUCER="reducer.py"

# copy the mapper and reducer to $TMP
cp "${PBS_O_WORKDIR}/$MAPPER" "${PBS_O_WORKDIR}/$REDUCER" $TMP

HADOOP_PREFIX=/apps/hadoop/2.4.1
JAVA_HOME=/usr/lib/jvm/java
HADOOP_CONF_DIR="${PBS_O_WORKDIR}/conf"

# create Hadoop configs
cp -a $HADOOP_PREFIX/conf $HADOOP_CONF_DIR

MASTER=`hostname`
uniq $PBS_NODEFILE > $HADOOP_CONF_DIR/slaves
echo $MASTER > $HADOOP_CONF_DIR/masters

sed -i "s|export JAVA_HOME=.*|export JAVA_HOME=$JAVA_HOME|g" $HADOOP_CONF_DIR/hadoop-env.sh
sed -i "s|<value>/scratch/.*</value>|<value>/scratch/$PBS_JOBID</value>|g" $HADOOP_CONF_DIR/{hdfs,core}-site.xml
sed -i "s|<value>.*:50090</value>|<value>$MASTER:50090</value>|g" $HADOOP_CONF_DIR/{hdfs,core}-site.xml
sed -i "s|hdfs://.*:|hdfs://$MASTER:|g" $HADOOP_CONF_DIR/core-site.xml
sed -i "s|.*export HADOOP_LOG_DIR.*|export HADOOP_LOG_DIR=$TMP/logs|g" $HADOOP_CONF_DIR/hadoop-env.sh
sed -i "s|.*export HADOOP_PID_DIR.*|export HADOOP_PID_DIR=$TMP|g" $HADOOP_CONF_DIR/hadoop-env.sh

# setup Hadoop services
. $HADOOP_CONF_DIR/hadoop-env.sh

$HADOOP_PREFIX/bin/hdfs namenode -format
$HADOOP_PREFIX/sbin/start-dfs.sh

# import data
$HADOOP_PREFIX/bin/hdfs dfs -mkdir /user
$HADOOP_PREFIX/bin/hdfs dfs -mkdir /user/$USER
$HADOOP_PREFIX/bin/hdfs dfs -put $INPUT input

cd $TMP

# run hadoop job
STREAM=$HADOOP_PREFIX/share/hadoop/tools/lib/hadoop-streaming-2.4.1.jar
$HADOOP_PREFIX/bin/hadoop jar $STREAM $OPTIONS -files $MAPPER,$REDUCER -mapper $MAPPER -reducer $REDUCER -input input -output output

# retrieve output from Hadoop
mkdir -p "$OUTPUT"
$HADOOP_PREFIX/bin/hdfs dfs -get output "$OUTPUT"

# stop Hadoop services
$HADOOP_PREFIX/sbin/stop-dfs.sh

# retrieve logs
cp -a $TMP/logs "$PBS_O_WORKDIR"

# clear HDFS directories on all slaves
cat $HADOOP_CONF_DIR/slaves | while read slave; do
    ssh -n $slave "rm -rf $TMP"
done

# delete my temporary files
[ $? -eq 0 ] && /bin/rm -rf $TMP

Programs that handle job submission differently

MATLAB

With MATLAB's Parallel Computing Toolbox (PCT), it's possible to submit your MATLAB code directly from your desktop to the HPC without writing submit scripts and submitting the job manually. See MathWorks for further details.

The HPC has a license to allow the use of 16 cores by MATLAB. MATLAB R2015a and all standard toolboxes are installed on the HPC, and any MATLAB product you are licensed for will be able to run on the HPC.

To be able to use the HPC for MATLAB, you will require a Parallel Computing Toolbox license on your desktop.

Setup

  1. Install the required scripts for a generic PBS cluster
    1. Copy all the files from MATLABROOT\toolbox\distcomp\examples\integration\pbs\nonshared to MATLABROOT\toolbox\local. MATLABROOT is the location where you installed MATLAB on your machine, most probably in C:\Program Files\MATLAB\R2015a or /usr/local/MATLAB/R2015a.
    2. Edit MATLABROOT\toolbox\local\independentSubmitFcn.m
      • Change line 122 by adding -l walltime=1:00:00:00
        additionalSubmitArgs = '-l walltime=1:00:00:00';
    3. Edit MATLABROOT\toolbox\local\communicatingSubmitFcn.m
      • Change line 117 by adding -l walltime=1:00:00:00
        additionalSubmitArgs = sprintf('-l nodes=%d:ppn=%d -l walltime=1:00:00:00', numberOfNodes, procsPerNode);
      The two changes are required to increase the default walltime on the HPC.
  2. Create a cluster profile
    If your PCT is installed and licensed correctly, you should see a dropdown named Parallel in your toolbar.
    1. Open the Parallel dropdown, and select Manage Cluster Profiles... (screenshot)
    2. Add a new Generic custom 3rd party cluster profile (screenshot)
    3. Rename the new cluster profile by right-clicking on it
    4. Set the following values (screenshot, screenshot, screenshot)
      • Description: HPC1
      • NumWorkers: 16
      • ClusterMatlabRoot: /apps/MATLAB/R2015a
      • IndependentSubmitFcn: {@independentSubmitFcn, 'hpc1.sun.ac.za', '/scratch2/user'}
        replace user with your own username
      • CommunicatingSubmitFcn: {@communicatingSubmitFcn, 'hpc1.sun.ac.za', '/scratch2/user'}
        replace user with your own username
      • OperatingSystem: unix
      • HasSharedFilesystem: false
      • GetJobStateFcn: @getJobStateFcn
      • DeleteJobFcn: @deleteJobFcn
      All other values can be left at their default (or empty) values


work in progress 2015-10-02

AccelRys Materials Studio