Scorpion — HPC cluster of biology.tohoku.ac.jp

Usage

Getting Started

  1. Read through all the pages in this document.
  2. Prepare an SSH key pair on your local computer.
  3. Complete the online registration form.
  4. Wait until you are added to the user mailing list (this may take a while).
  5. Login to the server: ssh USERNAME@scorpion.biology.tohoku.ac.jp

Notes

How to set up SSH keys

  1. Prepare a UNIX-like OS such as macOS or Linux.

    • Windows users can set up a Linux (Ubuntu) environment via WSL.

      In a WSL terminal, create or open the /etc/wsl.conf file (e.g., with sudo nano /etc/wsl.conf) and add the following lines:

      [automount]
      options = "metadata"
      

      Then restart WSL (or Windows) to apply the config above. This setting is required so that permissions can be set with the chmod command.

  2. Generate SSH keys on the terminal of your local computer with the following command:

    mkdir -p ~/.ssh
    ssh-keygen -t ed25519 -N '' -f ~/.ssh/id_ed25519_scorpion
    
  3. Create a ~/.ssh/config file (a plain text file, not a directory) on your local computer, and add the following lines:

    Host scorpion scorpion.biology.tohoku.ac.jp
      Hostname scorpion.biology.tohoku.ac.jp
      IdentityFile ~/.ssh/id_ed25519_scorpion
      User tamakino
    

    Replace tamakino with your user name on the scorpion server (NOT the one on your local computer). You can choose your own user name, but it should be short and consist only of lowercase letters, with no spaces or special characters.

  4. Check the created keys and config file:

    ls -al ~/.ssh
    # drwx------ 11 winston staff 374 Apr  4 10:00 ./
    # -rw-r--r--  1 winston staff 749 Apr  4 10:00 config
    # -rw-------  1 winston staff 399 Apr  4 10:00 id_ed25519_scorpion
    # -rw-r--r--  1 winston staff  92 Apr  4 10:00 id_ed25519_scorpion.pub
    

    The permissions of ~/.ssh and ~/.ssh/id_ed25519_scorpion must be 700 (drwx------) and 600 (-rw-------), respectively. Execute the following commands to set permissions correctly:

    chmod 700 ~/.ssh
    chmod 600 ~/.ssh/id_ed25519_scorpion
    chmod 644 ~/.ssh/config
    

    Check ls -al ~/.ssh again.

  5. Copy and paste the whole content of the public key (NOT the private key) into the online registration form. On macOS, for example, the pbcopy command is useful:

    cat ~/.ssh/id_ed25519_scorpion.pub | pbcopy
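
    On Linux, the xclip command (if installed) can serve the same purpose:

    cat ~/.ssh/id_ed25519_scorpion.pub | xclip -selection clipboard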
    
  6. The administrator will notify you when your public key has been registered in your ~/.ssh/authorized_keys on the server. Then you can log in to scorpion with the following command:

    ssh scorpion
    # or
    ssh YOUR_USERNAME@scorpion.biology.tohoku.ac.jp
    

You can add another public key to ~/.ssh/authorized_keys on the server by yourself so that you can log in from a secondary PC. Do not submit the user registration form twice.
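
For example, from a machine that can already log in, append the new key to the server's authorized_keys (the key file name below is just an illustration):

cat ~/.ssh/id_ed25519_laptop.pub | ssh scorpion 'cat >> ~/.ssh/authorized_keys'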

PBS job scheduler

Read the PBS User’s Guide.

Check the system status

pbsnodes -aSj

Check the status of jobs

# List jobs (-x also shows finished jobs)
qstat -x

# See the details of a job
qstat -fx <PBS_JOBID>

# How to read qstat
man qstat

Delete a job

qdel <PBS_JOBID>

Submit a job

You can submit a job in several ways:

# stdin
echo "echo 'hello world!'; sleep 60" | qsub -N hello

# giving the full path to a program
qsub -N hello -- /bin/echo "hello world!"

# job script
qsub hello.sh

An example job script hello.sh:

#!/bin/bash
#PBS -N hello
#PBS -l select=1:ncpus=1:mem=1gb
date -Iseconds
hostname
pwd
cd $PBS_O_WORKDIR
pwd

echo "Hello, world!"
sleep 60
date -Iseconds
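
Submitting the script and collecting its output might look like this (the job ID below is illustrative):

qsub hello.sh     # prints a job ID such as 1234.scorpion
qstat -x          # check whether the job has finished
cat hello.o1234   # stdout is written to $PBS_O_WORKDIR as <jobname>.o<number>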

An example array job script array.sh:

#!/bin/bash
#PBS -N array-ms
#PBS -l select=1:ncpus=1:mem=1gb
#PBS -J 0-3
date -Iseconds
hostname
cd $PBS_O_WORKDIR
pwd

param_range=($(seq 5.0 0.5 6.5))  # (5.0, 5.5, 6.0, 6.5)
theta=${param_range[$PBS_ARRAY_INDEX]}
ms 4 2 -t $theta

date -Iseconds
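
On PBS Pro, the whole array typically gets an ID like 1234[].scorpion and each subjob writes its own output file (e.g., array-ms.o1234.0); confirm the exact forms with man qstat. A session might look like:

qsub array.sh       # returns something like 1234[].scorpion
qstat -t '1234[]'   # -t also lists the individual subjobs
qdel '1234[]'       # delete the whole array if needed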

An equivalent job script in Python:

#!/usr/bin/env python3
#PBS -N array-ms-py
#PBS -l select=1:ncpus=1:mem=1gb
#PBS -J 0-3
import os
import subprocess
import numpy as np

os.chdir(os.getenv('PBS_O_WORKDIR', '.'))
array_index = int(os.getenv('PBS_ARRAY_INDEX', '0'))
param_range = np.linspace(5.0, 6.5, 4)  # [5.0, 5.5, 6.0, 6.5]
theta = param_range[array_index]
cmd = 'ms 4 2 -t {}'.format(theta)
proc = subprocess.run(cmd.split(), stdout=subprocess.PIPE)
print(proc.stdout.decode(), end='')

Useful options and environment variables:

-N jobname
to set the job’s name.
-o ***, -e ***
to specify the path for the standard output/error stream. By default, they are written to the current working directory where qsub was executed, i.e., ${PBS_O_WORKDIR}/${PBS_JOBNAME}.o<sequence_number>
-j oe
to merge the standard error stream into the standard output stream. It is equivalent to 2>&1 in shell redirection.
-J 0-3
to declare that the job is an array job (of size 4 in this example). The current index (0, 1, …) can be obtained via PBS_ARRAY_INDEX.
-l ***
to request that the PBS job scheduler allocate resources for the job. A job waits in the queue until the requested resources become available.
e.g., -l select=1:ncpus=4:mem=32gb:host=scorpion02 requests 4 CPU cores and 32 GB of RAM (in total, not per core) on the scorpion02 node.
Note that this does not affect how the job script and its programs actually run; in particular, it does not automatically accelerate single-threaded jobs. To use multiple CPU cores, write your script for parallel execution or give an explicit option to each program, like blast -num_threads 4, samtools -@ 4, make -j4, etc. (see the sketch after this list).
-v VAR1=value,VAR2
to export environment variables to the job.
-V
to export all the variables in the current shell environment.
PBS_JOBID
the ID of the job.
PBS_JOBNAME
the name of the job.
PBS_O_WORKDIR
the working directory where qsub was called. By default, stdout/stderr files are copied here, although jobs are executed in $HOME. In many cases you will want to cd $PBS_O_WORKDIR at the start of your script.
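
To make the resource request match the actual thread usage, here is a minimal sketch (the program and file names are placeholders; adapt them to what is actually installed):

#!/bin/bash
#PBS -N sort-bam
#PBS -l select=1:ncpus=4:mem=8gb
#PBS -j oe
cd $PBS_O_WORKDIR

# PBS only reserves the 4 cores requested above; the program itself
# must be told to use them, hence the explicit -@ 4
samtools sort -@ 4 -o sorted.bam input.bam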

See man qsub for more details.