Usage
Getting Started
- Read through all the pages in this document.
- Prepare an SSH key pair on your local computer.
- Complete the online registration form.
- Wait for a while until you are added to the user mailing list.
- Login to the server:
ssh USERNAME@scorpion.biology.tohoku.ac.jp
Notes
- Access:
- Feel free to post any question and request to the mailing list. It will help other users and improving this document. Do NOT contact the administrators personally.
- No graphical user interface (GUI) is available; all the operation has to be carried out with a command-line interface (CLI) over SSH connection. Basic knowledge of shell scripting is required.
The server is accessible only from the Tohoku University LAN. You may want to consider TAINS VPN for remote access.Now the server is temporarily accessible from everywhere. Please keep the URL secret to reduce the potential risk of attacks.
- Data Storage:
- 20GB disk space is allocated for each user. The size may be changed in the future.
- Do NOT think this system as a long-term storage service. Transfer output data to your local computer, and delete them from the server immediately after each job execution. Or, at least, always keep your data deletable on the server.
- It is recommended to use rsync to transfer/synchronize your files between your local computer and the server. Git is also useful to manage your scripts.
- Your home directory
~/
on the head node is shared with the compute nodes. You don’t have to care about data transfer between nodes within the system.
- Job execution:
- Do NOT execute programs directly on the head node.
All the computational tasks must be managed by the PBS job scheduler
(as detailed below).
Very small tasks in the following examples are the exceptions,
i.e., you can execute only these commands on the head node:
- Basic shell operation:
pwd
,cd
,ls
,cat
,mv
,rm
, etc. - File transfer:
rsync
,git
, etc. - Text editor:
vim
,emacs
,nano
, etc. - Compilation/installation of a small program:
gcc
,make
,cmake
,pip
, etc. - PBS command:
pbsnodes
,qstat
,qsub
, etc.
- Basic shell operation:
- Check the list of available softwares. You can install additional softwares into your home directory, or ask the administrator via the mailing list for system-wide installation.
- Do NOT execute programs directly on the head node.
All the computational tasks must be managed by the PBS job scheduler
(as detailed below).
Very small tasks in the following examples are the exceptions,
i.e., you can execute only these commands on the head node:
How to setup SSH keys
-
Prepare UNIX-like OS such as macOS and Linux.
-
Windows users can setup Linux (Ubuntu) environment via WSL.
In WSL terminal, create/open
/etc/wsl.conf
file (with a command likesudo nano /etc/wsl.conf
) and add the following lines:[automount] options = "metadata"
Then, restart WSL (or Windows) to enable the config above. This setting is required to set permissions with
chmod
command.
-
-
Check if you already have a pair of SSH keys (
id_ed25519
andid_ed25519.pub
) in your~/.ssh/
on your local computer:ls -al ~/.ssh
If they exist, you can use them. Do not modify or overwrite them. If they do not exist, generate a pair with the following command:
ssh-keygen -t ed25519 # Generating public/private ed25519 key pair. # Enter file in which to save the key (~/.ssh/id_ed25519): <return> # Enter passphrase (empty for no passphrase): <return> # Enter same passphrase again: <return> # Your identification has been saved in ~/.ssh/id_ed25519 # Your public key has been saved in ~/.ssh/id_ed25519.pub
You can just accept the default file name and empty passphrase. The generated keys can be reused for other purposes, e.g., NIG Supercomputer, GitHub, GitLab.
-
Create
~/.ssh/config
file (plain text, not directory) on your local computer, and write some lines as follows:Host scorpion scorpion.biology.tohoku.ac.jp Hostname scorpion.biology.tohoku.ac.jp User tamakino
Replace
tamakino
with your user name on scorpion server (NOT the one on your local computer). You can decide your user name, but it should be short and lowercase alphabets without any space or special character. -
Check the created keys and config file:
ls -al ~/.ssh # drwx------ 11 winston staff 374 Apr 4 10:00 ./ # -rw-r--r-- 1 winston staff 749 Apr 4 10:00 config # -rw------- 1 winston staff 399 Apr 4 10:00 id_ed25519 # -rw-r--r-- 1 winston staff 92 Apr 4 10:00 id_ed25519.pub
The permissions of
~/.ssh
and~/.ssh/id_ed25519
must be700
(drwx------
) and600
(-rw-------
), respectively. Execute the following commands to set permissions correctly:chmod 700 ~/.ssh chmod 600 ~/.ssh/id_ed25519 chmod 644 ~/.ssh/config
Check
ls -al ~/.ssh
again. -
Copy and paste the whole content of the public key (NOT private key) to the online registration form. For example,
pbcopy
command is useful on macOS:cat ~/.ssh/id_ed25519.pub | pbcopy
-
The administrator will notify you when your public key is registered to your
~/.ssh/authorized_keys
on the server. Then you can login to scorpion with the following command:ssh scorpion # or ssh YOUR_USERNAME@scorpion.biology.tohoku.ac.jp
You can add another public key to ~/.ssh/authorized_keys
by yourself so that you can login from your secondary PC.
Do not submit user registration twice.
Do not transfer private keys between computers.
PBS job scheduler
Read PBS User’s Guide.
Check ths system status
pbsnodes -aSj
Check the status of jobs
# List
qstat -x
# See the detail of a job
qstat -fx <PBS_JOBID>
# How to read qstat
man qstat
Delete a job
qdel <PBS_JOBID>
Submit a job
You can submit a job in several ways:
# stdin
echo "echo 'hello world!'; sleep 60" | qsub -N hello
# giving the full path to a program
qsub -N hello -- /bin/echo "hello world!"
# job script
qsub hello.sh
An example job script hello.sh
:
#!/bin/bash
#PBS -N hello
#PBS -l select=1:ncpus=1:mem=1gb
date -Iseconds
hostname
pwd
cd $PBS_O_WORKDIR
pwd
echo "Hello, world!"
sleep 60
date -Iseconds
An example of an array job array.sh
:
#!/bin/bash
#PBS -N array-ms
#PBS -l select=1:ncpus=1:mem=1gb
#PBS -J 0-3
date -Iseconds
hostname
cd $PBS_O_WORKDIR
pwd
param_range=($(seq 5.0 0.5 6.5)) # (5.0, 5.5, 6.0, 6.5)
theta=${param_range[@]:${PBS_ARRAY_INDEX}:1}
ms 4 2 -t $theta
date -Iseconds
An equivalent job script in Python:
#!/usr/bin/env python3
#PBS -N array-ms-py
#PBS -l select=1:ncpus=1:mem=1gb
#PBS -J 0-3
import os
import subprocess
import numpy as np
os.chdir(os.getenv('PBS_O_WORKDIR', '.'))
array_index = int(os.getenv('PBS_ARRAY_INDEX', '0'))
param_range = np.linspace(5.0, 6.5, 4) # [5.0, 5.5, 6.0, 6.5]
theta = param_range[array_index]
cmd = 'ms 4 2 -t {}'.format(theta)
proc = subprocess.run(cmd.split(), stdout=subprocess.PIPE)
print(proc.stdout.decode(), end='')
Useful options and environment variables:
-N jobname
- to set job’s name.
-o ***
,-e ***
- to specify the path for the standard output/error stream.
By default, they are writen to the current working directory where
qsub
was executed, i.e.,${PBS_O_WORKDIR}/${PBS_JOBNAME}.o<sequence_number>
-j oe
- to merge the standard error stream into the standard output stream.
It is equivalent to
2>&1
in shell redirection. -J 0-3
- to declare the job is an array job (with size 4 in this example).
A current index (
0
,1
, …) can be obtained viaPBS_ARRAY_INDEX
. -l ***
- to request PBS job scheduler to allocate resources for a job. A job has to wait in a queue until the requested resources are available.
- e.g.,
-l select=1:ncpus=4:mem=32gb:host=scorpion02
requests 4 CPU cores and 32GB RAM (in total, not per core) onscorpion02
node. - Note that it does not affect how a job script itself and programs run,
i.e., it does not automatically accelerate single-threaded jobs.
To achieve parallel execution using multiple CPU cores,
you need to write your script as such,
or to give an explicit option to each program like
blast -num_threads 4
,samtools -@ 4
,make -j4
, etc. -v VAR1=value,VAR2
- to export environment variables to the job.
-V
to export all the variables in the current shell environment.PBS_JOBID
- the ID of the job.
PBS_JOBNAME
- the name of the job.
PBS_O_WORKDIR
- the working directory where
qsub
was called. By default, stdout/stderr are copied to here, although jobs are executed in$HOME
. You may want tocd $PBS_O_WORKDIR
in many cases.
See man qsub
for more details.