Common errors

From HPC
Revision as of 09:10, 7 August 2013 by Cwmoller

SSH keys

When your job completes, any output it generated is copied to your work directory on the head node via SSH. If you don't have SSH keys set up, the copy would need your password, which the job never has, so it always fails. You'll then get an email similar to the one below.

PBS Job Id: 26
Job Name:   myJOB
Exec host:  comp028/38+comp028/37+comp028/36+comp028/35+comp028/34+comp028/33+comp028/32+comp028/31+comp028/30+comp028/29+comp028/28+comp028/27+comp028/10+comp028/9+comp028/8
An error has occurred processing your job, see below.
Post job file processing error; job 26 on host comp028/38+comp028/37+comp028/36+comp028/35+comp028/34+comp028/33+comp028/32+comp028/31+comp028/30+comp028/29+comp028/28+comp028/27+comp028/10+comp028/9+comp028/8

Unable to copy file /var/spool/torque/spool/26.OU to username@head002:/export/home/username/out
*** error from copy
Permission denied (publickey,keyboard-interactive).

lost connection
*** end error output
Output retained on that host in: /var/spool/torque/undelivered/26.OU

Unable to copy file /var/spool/torque/spool/26.ER to username@head002:/export/home/username/err
*** error from copy
Permission denied (publickey,keyboard-interactive).

lost connection
*** end error output
Output retained on that host in: /var/spool/torque/undelivered/26.ER

To fix this, create a set of SSH keys.

$ ssh-keygen -t dsa

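Accept all the defaults; they're fine. (Recent OpenSSH releases disable DSA keys by default. If the key type is rejected, use -t rsa or -t ed25519 instead and adjust the key file name in the next step to match.)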

Then have your own account trust your own key. This lets you SSH as yourself from one node to another without a password, no matter which node you're on.

$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
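
If you rerun this step, or already have other keys in authorized_keys, a more careful version may help. The sketch below is one possible approach and is not part of the cluster's tooling: trust_key is a hypothetical helper name, and the permission values (700 on the directory, 600 on the file) are the ones sshd conventionally expects. It appends the key only if that exact line is not already present.

```shell
#!/bin/sh
# Hypothetical helper (not a cluster-provided command):
# append a public key to authorized_keys only if it is not already there.
# Usage: trust_key <public-key-file> [ssh-dir]
trust_key() {
    pub=$1
    dir=${2:-"$HOME/.ssh"}
    auth="$dir/authorized_keys"

    # sshd typically ignores keys when these permissions are too open.
    mkdir -p "$dir" && chmod 700 "$dir" || return 1
    touch "$auth" && chmod 600 "$auth" || return 1

    # -F fixed strings, -x whole-line match, -q quiet, -f patterns from file
    if ! grep -qxFf "$pub" "$auth"; then
        cat "$pub" >> "$auth"
    fi
}
```

After defining the function in your shell, you would call it as: trust_key ~/.ssh/id_dsa.pub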

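Each machine you want to SSH to without a password must then be added to your ~/.ssh/known_hosts file by connecting to it manually once and accepting its host key. Do this for every name you will use for the machine, since each name is recorded separately.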

$ ssh head002
$ ssh head002.sun.ac.za
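
To confirm that the setup works before submitting another job, you can test each host non-interactively. This is a minimal sketch: check_passwordless is a hypothetical helper name, and the 5-second timeout is an arbitrary choice. It relies on the standard OpenSSH option BatchMode=yes, which makes ssh fail instead of prompting for a password or an unknown host key.

```shell
#!/bin/sh
# Hypothetical helper: report whether key-based login works for each host.
# Returns 0 if every host succeeded, 1 if any failed.
check_passwordless() {
    failed=0
    for host in "$@"; do
        # BatchMode=yes: never prompt, so a missing key or an
        # unknown host key shows up as a failure.
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
            failed=1
        fi
    done
    return "$failed"
}
```

For the hosts on this page, the call would be: check_passwordless head002 head002.sun.ac.za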