Difference between revisions of "Common errors"

Revision as of 11:23, 30 January 2014

SSH keys

When your job completes, any output it generated is copied via SSH to your work directory on the head node. If you don't have SSH keys set up, the copy would need your password, which the batch system never has, so it always fails. You'll then get an email similar to the one below.

PBS Job Id: 26
Job Name:   myJOB
Exec host:  comp028/38+comp028/37+comp028/36+comp028/35+comp028/34+comp028/33+comp028/32+comp028/31+comp028/30+comp028/29+comp028/28+comp028/27+comp028/10+comp028/9+comp028/8
An error has occurred processing your job, see below.
Post job file processing error; job 26 on host comp028/38+comp028/37+comp028/36+comp028/35+comp028/34+comp028/33+comp028/32+comp028/31+comp028/30+comp028/29+comp028/28+comp028/27+comp028/10+comp028/9+comp028/8

Unable to copy file /var/spool/torque/spool/26.OU to username@launch:/export/home/username/out
*** error from copy
Permission denied (publickey,keyboard-interactive).

lost connection
*** end error output
Output retained on that host in: /var/spool/torque/undelivered/26.OU

Unable to copy file /var/spool/torque/spool/26.ER to username@launch:/export/home/username/err
*** error from copy
Permission denied (publickey,keyboard-interactive).

lost connection
*** end error output
Output retained on that host in: /var/spool/torque/undelivered/26.ER
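
The lines that matter in that email are the "Unable to copy file" and "Output retained on that host in" pairs. As a rough sketch, assuming you have saved the email body to a text file, the retained paths can be pulled out as below. The function name `retained_outputs` is illustrative only and is not part of Torque.

```shell
# Hypothetical helper: print the paths of the output files that were kept
# on the compute node, given a saved copy of the error email.
retained_outputs() {
    # $1: path to a text file containing the email body
    # Print only the lines that carry a retained path, minus the prefix.
    sed -n 's/^Output retained on that host in: //p' "$1"
}
```

For the email above, `retained_outputs email.txt` would list the two files under /var/spool/torque/undelivered/ (email.txt is an assumed file name).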

To fix this, create a set of SSH keys.

$ ssh-keygen -t dsa

Accept all the defaults, including the empty passphrase; they are fine.

Then, have your own account trust your own key. This will allow you to SSH to your own account without a password, no matter which node you're on. Appending with >> rather than overwriting with > keeps any keys already listed in the file.

$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
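
If authorized_keys may already hold other keys, or this step might be run more than once, a more defensive variant can help. The following is a minimal sketch, not the site's official procedure: the function name `trust_own_key` is hypothetical, and the permission changes assume sshd runs with its default StrictModes setting, which rejects key files that other users can write to.

```shell
# Hypothetical helper: add a public key to authorized_keys only if it is
# not already listed, then restrict permissions on the directory and file.
trust_own_key() {
    ssh_dir=${1:-"$HOME/.ssh"}           # directory holding the keys
    pub_key=${2:-"$ssh_dir/id_dsa.pub"}  # public key made by ssh-keygen
    auth="$ssh_dir/authorized_keys"

    # Stop early if the public key is missing or unreadable.
    [ -r "$pub_key" ] || return 1

    touch "$auth"
    # -x: match whole lines, -F: fixed strings, -f: read pattern from file
    if ! grep -qxF -f "$pub_key" "$auth"; then
        cat "$pub_key" >> "$auth"
    fi

    chmod 700 "$ssh_dir"
    chmod 600 "$auth"
}
```

Calling `trust_own_key` with no arguments works on ~/.ssh and ~/.ssh/id_dsa.pub; calling it a second time leaves the file unchanged.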

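Each machine you want to SSH to without a password must then be added to the ~/.ssh/known_hosts file by connecting to it manually once and accepting its host key. Connect under every name you will use, because known_hosts records each host name separately.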

$ ssh launch
$ ssh launch.hpc
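
To confirm the setup took effect, one option is to attempt a login that is not allowed to prompt. The sketch below assumes the host names used above; `check_passwordless` and the `SSH_BIN` override are hypothetical names introduced here for illustration.

```shell
# Hypothetical helper: report whether each host accepts a key-based,
# non-interactive login. SSH_BIN can be overridden to use another client.
check_passwordless() {
    for host in "$@"; do
        # BatchMode=yes makes ssh fail instead of prompting for a password
        # or asking to confirm an unknown host key.
        if "${SSH_BIN:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 \
               "$host" true </dev/null >/dev/null 2>&1; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
        fi
    done
}
```

For example, `check_passwordless launch launch.hpc` prints one line per host. A FAILED line means either the key is not trusted or the host is not yet in known_hosts.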