Scorpion - HPC cluster of biology.tohoku.ac.jp

Usage

Getting Started

  1. Read through all the pages in this document.
  2. Prepare an SSH key pair on your local computer.
  3. Complete the online registration form.
  4. Wait for a while until you are added to the user mailing list.
  5. Log in to the server: ssh USERNAME@scorpion.biology.tohoku.ac.jp

Notes

How to set up SSH keys

  1. Prepare a UNIX-like OS such as macOS or Linux.

    • Windows users can set up a Linux (Ubuntu) environment via WSL.

      In a WSL terminal, create or open the /etc/wsl.conf file (with a command like sudo nano /etc/wsl.conf) and add the following lines:

      [automount]
      options = "metadata"
      

      Then restart WSL (for example, run wsl --shutdown from PowerShell and reopen the terminal) or restart Windows to apply the config above. This setting is required to set file permissions with the chmod command.

  2. Check if you already have a pair of SSH keys (id_ed25519 and id_ed25519.pub) in your ~/.ssh/ on your local computer:

    ls -al ~/.ssh
    

    If they exist, you can use them. Do not modify or overwrite them. If they do not exist, generate a pair with the following command:

    ssh-keygen -t ed25519
    # Generating public/private ed25519 key pair.
    # Enter file in which to save the key (~/.ssh/id_ed25519): <return>
    # Enter passphrase (empty for no passphrase):              <return>
    # Enter same passphrase again:                             <return>
    # Your identification has been saved in ~/.ssh/id_ed25519
    # Your public key has been saved in ~/.ssh/id_ed25519.pub
    

    You can simply accept the default file name and an empty passphrase. The generated key pair can be reused for other services, e.g., the NIG Supercomputer, GitHub, and GitLab.

  3. Create a ~/.ssh/config file (a plain text file, not a directory) on your local computer, and add the following lines:

    Host scorpion scorpion.biology.tohoku.ac.jp
      Hostname scorpion.biology.tohoku.ac.jp
      User tamakino
    

    Replace tamakino with your user name on the scorpion server (NOT the one on your local computer). You can choose your own user name, but it should be short, consist of lowercase letters only, and contain no spaces or special characters.

  4. Check the created keys and config file:

    ls -al ~/.ssh
    # drwx------ 11 winston staff 374 Apr  4 10:00 ./
    # -rw-r--r--  1 winston staff 749 Apr  4 10:00 config
    # -rw-------  1 winston staff 399 Apr  4 10:00 id_ed25519
    # -rw-r--r--  1 winston staff  92 Apr  4 10:00 id_ed25519.pub
    

    The permissions of ~/.ssh and ~/.ssh/id_ed25519 must be 700 (drwx------) and 600 (-rw-------), respectively. If they are not, execute the following commands to set the permissions correctly:

    chmod 700 ~/.ssh
    chmod 600 ~/.ssh/id_ed25519
    chmod 644 ~/.ssh/config
    

    Check ls -al ~/.ssh again.

  5. Copy and paste the whole content of the public key (NOT the private key) into the online registration form. For example, the pbcopy command is useful on macOS (Linux and WSL alternatives are shown after this list):

    cat ~/.ssh/id_ed25519.pub | pbcopy
    
  6. The administrator will notify you when your public key is registered in your ~/.ssh/authorized_keys on the server. Then you can log in to scorpion with the following command:

    ssh scorpion
    # or
    ssh YOUR_USERNAME@scorpion.biology.tohoku.ac.jp
    
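
If you are on Linux or WSL rather than macOS (step 5 above), pbcopy is not available. Equivalent sketches (xclip must be installed on Linux; clip.exe relies on WSL's default Windows interop):

    # Linux (X11): copy the public key to the clipboard with xclip
    xclip -selection clipboard < ~/.ssh/id_ed25519.pub
    # WSL: pipe the public key to the Windows clipboard
    cat ~/.ssh/id_ed25519.pub | clip.exe

You can also simply run cat ~/.ssh/id_ed25519.pub and copy the printed line by hand.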

You can add another public key to ~/.ssh/authorized_keys on the server by yourself so that you can log in from a secondary PC. Do not submit the user registration form twice. Never transfer private keys between computers.
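
For example, a minimal sketch of appending a second key on the server (the key string below is a placeholder; paste the full single line from the other computer's id_ed25519.pub):

    # on scorpion, append the new public key as one line
    echo 'ssh-ed25519 AAAA...your-second-public-key... user@laptop' >> ~/.ssh/authorized_keys
    chmod 600 ~/.ssh/authorized_keys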

Slurm workload manager

A quick guide to the most common Slurm commands. See https://slurm.schedmd.com/archive/slurm-23.11.4/ for details.

Check the system status

sinfo: View Slurm nodes and partitions.

sinfo -No "%10N %.13C %.6O %.7z %.8e %.8m %.6w %9P %.9T %E"
NODELIST   CPUS(A/I/O/T) CPU_LO   S:C:T FREE_MEM   MEMORY WEIGHT PARTITION     STATE REASON
scorpion       0/16/0/16   2.60   2:8:1     2171    10000    100 login          idle none
scorpion01     32/0/0/32  32.02  2:16:1    86621    85297      2 compute*  allocated none
scorpion02     40/0/0/40  40.01  2:20:1   379096   375592      1 compute*  allocated none
scorpion03     48/0/0/48  24.02  2:12:2   355001   375607      3 compute*  allocated none

squeue: View jobs in the scheduling queue.

squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               666   compute hello.sh    steve  R    1:23:45      1 scorpion01
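
To list only your own jobs, pass a user filter to squeue:

squeue --me
# equivalent to
squeue -u YOUR_USERNAME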

scontrol: View or modify Slurm configuration and state.

scontrol show node
scontrol show partition

Manage your jobs

sacct: Display accounting data for all jobs. See sacct --helpformat for available fields.

sacct -S now-14days -o JobID,JobName,User,State,ExitCode%6,Start,Elapsed,ReqCPUs%4,NCPUs%4,TotalCPU,ReqMem%6,MaxRSS%6
JobID           JobName      User      State ExitCo               Start    Elapsed ReqC NCPU   TotalCPU ReqMem MaxRSS
------------ ---------- --------- ---------- ------ ------------------- ---------- ---- ---- ---------- ------ ------
666            hello.sh     steve  COMPLETED    0:0 2025-09-18T23:43:12   00:00:30    1    1  00:00.014     1G
666.batch         batch            COMPLETED    0:0 2025-09-18T23:43:12   00:00:30    1    1  00:00.014          666K

Check active jobs in detail:

scontrol show job [<JOB_ID>]

scancel: Signal or cancel jobs.

scancel <JOB_ID>
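
You can also cancel all of your own jobs at once, or a single task of a job array:

scancel --user YOUR_USERNAME
scancel <JOB_ID>_<ARRAY_INDEX>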

Submit a job

sbatch: Submit a batch script to Slurm.

Create a batch script (hello.sh, shown below the command) and submit it with sbatch:

sbatch hello.sh

#!/bin/bash
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1G

date -Iseconds
echo "$(hostname):$(pwd)"
echo "PATH: $PATH"

echo "Hello, Slurm!"
echo "=================================="
echo "SLURMD_NODENAME: $SLURMD_NODENAME"
echo "SLURM_SUBMIT_DIR: $SLURM_SUBMIT_DIR"
echo "SLURM_JOB_ID: $SLURM_JOB_ID"
echo "SLURM_JOB_NAME: $SLURM_JOB_NAME"
echo "SLURM_JOB_START_TIME: $SLURM_JOB_START_TIME"
echo "SLURM_JOB_END_TIME: $SLURM_JOB_END_TIME"
echo "SLURM_JOB_CPUS_PER_NODE: $SLURM_JOB_CPUS_PER_NODE"
echo "SLURM_CPUS_PER_TASK: $SLURM_CPUS_PER_TASK"
echo "SLURM_MEM_PER_CPU: $SLURM_MEM_PER_CPU"
echo "SLURM_MEM_PER_NODE: $SLURM_MEM_PER_NODE"
echo "=================================="

# do something

# just for demonstration; try `scontrol show job` etc.
sleep 60

date -Iseconds

Inside a batch script, you can get various information about the job from the output environment variables that Slurm sets.
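
After the job finishes, the standard output is written to slurm-<JOB_ID>.out in the directory where sbatch was called (see --output below). A typical check, with an illustrative job ID:

sbatch hello.sh
# Submitted batch job 12345
less slurm-12345.out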

Options

https://slurm.schedmd.com/archive/slurm-23.11.4/sbatch.html#SECTION_OPTIONS

Options can be set in three ways, listed below from highest to lowest precedence (an override example follows the option list):

  1. command line options passed to sbatch
  2. input environment variables
  3. #SBATCH directives in batch scripts (we recommend this for reproducibility)
-c, --cpus-per-task=<ncpus>
Request CPU cores per task.
--mem-per-cpu=<size>[units]
Request memory per CPU core. The default on our system is --mem-per-cpu=1G. See the hardware page and the sinfo -Nl command for the available memory sizes.
Total memory for a job can be set with --mem=<size>[units].
-J, --job-name=<jobname>
Set the job name. The name of the script file is used by default.
-D, --chdir=<directory>
Set the working directory of the job. The default is $SLURM_SUBMIT_DIR, the directory where sbatch was called.
-o, --output=<filename_pattern>
Set the file path to redirect standard output, relative to $SLURM_SUBMIT_DIR. The default is slurm-%j.out for single jobs and slurm-%A_%a.out for job arrays, where %j, %A, and %a are replaced by the job ID, array job ID, and the array index, respectively. See filename pattern.
-e, --error=<filename_pattern>
Set the file path to redirect standard error. By default, both standard output and standard error are directed to the same file (the one set by --output).
-a, --array=<indexes>
Submit a job array (see below).
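
For example, a command line option overrides the corresponding #SBATCH directive in the script, so you can rerun hello.sh with more resources without editing it (the values here are only illustrative):

sbatch --cpus-per-task=4 --mem-per-cpu=2G --job-name=hello_4cpu hello.sh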

job array

https://slurm.schedmd.com/archive/slurm-23.11.4/job_array.html

Use a job array to run the same script many times in parallel, for example to process many data files. Create a batch script (params.sh, shown below the command) and submit it with sbatch:

sbatch params.sh

#!/bin/bash
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1G
#SBATCH --array=1-3

date -Iseconds
echo "$(hostname):$(pwd)"
echo "PATH: $PATH"

echo "Hello, Slurm job array!"
echo "=================================="
echo "SLURMD_NODENAME: $SLURMD_NODENAME"
echo "SLURM_SUBMIT_DIR: $SLURM_SUBMIT_DIR"
echo "SLURM_JOB_ID: $SLURM_JOB_ID"
echo "SLURM_JOB_NAME: $SLURM_JOB_NAME"
echo "SLURM_JOB_START_TIME: $SLURM_JOB_START_TIME"
echo "SLURM_JOB_END_TIME: $SLURM_JOB_END_TIME"
echo "SLURM_JOB_CPUS_PER_NODE: $SLURM_JOB_CPUS_PER_NODE"
echo "SLURM_CPUS_PER_TASK: $SLURM_CPUS_PER_TASK"
echo "SLURM_MEM_PER_CPU: $SLURM_MEM_PER_CPU"
echo "SLURM_MEM_PER_NODE: $SLURM_MEM_PER_NODE"
echo "=================================="
echo "SLURM_ARRAY_JOB_ID: $SLURM_ARRAY_JOB_ID"
echo "SLURM_ARRAY_TASK_ID: $SLURM_ARRAY_TASK_ID"
echo "SLURM_ARRAY_TASK_COUNT: $SLURM_ARRAY_TASK_COUNT"
echo "=================================="

PARAM_ARRAY=(alpha beta gamma)
# bash arrays are 0-indexed, while --array=1-3 starts at 1
PARAM=${PARAM_ARRAY[$((SLURM_ARRAY_TASK_ID - 1))]}
echo "PARAM: $PARAM"
# do something with "$PARAM"

PADDED_ID=$(printf "%03d" $SLURM_ARRAY_TASK_ID)
INPUT_FILE="data_${PADDED_ID}.txt"
echo "INPUT_FILE: $INPUT_FILE"
# do something with "$INPUT_FILE"

# just for demonstration; try `scontrol show job` etc.
sleep 60

date -Iseconds
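
Another common pattern is to map each array task to one line of a file list. A minimal sketch, assuming you have prepared a file named files.txt with one input path per line:

# pick the Nth line of files.txt for the Nth array task
INPUT_FILE=$(sed -n "${SLURM_ARRAY_TASK_ID}p" files.txt)
echo "INPUT_FILE: $INPUT_FILE"
# do something with "$INPUT_FILE"

You can also limit how many array tasks run at the same time, e.g., --array=1-100%10 runs at most 10 tasks simultaneously.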

interactive job

https://slurm.schedmd.com/srun.html

An interactive job allows you to run commands or programs directly on a compute node.

You can log in to the compute node using srun --pty $SHELL:

username@scorpion:~$ hostname
scorpion
username@scorpion:~$ srun --cpus-per-task 4 --mem-per-cpu 5G --pty $SHELL
username@scorpion02:~$ hostname
scorpion02
username@scorpion02:~$ # do something
username@scorpion02:~$ exit

Alternatively, you can reserve resources in advance with salloc and then run commands with srun:

salloc --cpus-per-task 4 --mem-per-cpu 5G
srun <command>   # run commands on the allocated resources
exit

jupyter notebook

  1. (Preparation) Add the following to your local machine's ~/.ssh/config to use ProxyJump. This lets you connect to a compute node through the login node, provided your SSH key pair is set up properly.

    Host scorpion01
      HostName scorpion01
      ProxyJump scorpion.biology.tohoku.ac.jp
    

    Do the same for the other nodes (scorpion02 and scorpion03).

  2. Start Jupyter Notebook by interactive job:

    nohup srun jupyter notebook --no-browser >jupyter.log 2>&1 </dev/null &
    

    Specify resources as needed (see the sketch after this list).

  3. Check which node the job is running on:

    squeue
    # JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
    # 13649   compute  jupyter watamine  R       0:14      1 scorpion02
    
  4. Check the URL in jupyter.log:

    less jupyter.log
    
    [C 2025-09-29 15:02:56.702 ServerApp]
    
        To access the server, open this file in a browser:
            file:///misc/home/watamine/.local/share/jupyter/runtime/jpserver-610054-open.html
        Or copy and paste one of these URLs:
            http://localhost:8889/tree?token=2acc5ae6d208ebfd29ee64d86bf03f6e4dcbe7f9ceec964f
            http://127.0.0.1:8889/tree?token=2acc5ae6d208ebfd29ee64d86bf03f6e4dcbe7f9ceec964f
    

    In this example, port 8889 is used. The port number may differ depending on which ports are already in use when Jupyter starts.

  5. On your local machine, create an SSH tunnel to the compute node via the login node.

    ssh -L 8889:localhost:8889 watamine@scorpion02
    

    Please change the port number and node name according to your connection.

  6. Open the URL shown in jupyter.log in your web browser.

    http://localhost:8889/tree?token=xxxxxxxxxxxxxxxx
    
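Regarding step 2, resource options can be passed to srun just like for any other interactive job; a sketch with illustrative values (adjust CPUs and memory to your notebook's needs):

    nohup srun --cpus-per-task 4 --mem-per-cpu 5G --job-name jupyter \
        jupyter notebook --no-browser >jupyter.log 2>&1 </dev/null &
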

👍 Since Jupyter is running in the background, it will continue to run even if your computer goes to sleep or your network connection is interrupted.

To terminate the session, please use the scancel command.
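
For example, find the job ID of your jupyter job with squeue and cancel it:

    squeue --me
    scancel <JOB_ID>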