Usage
Getting Started
- Read through all the pages in this document.
- Prepare an SSH key pair on your local computer.
- Complete the online registration form.
- Wait for a while until you are added to the user mailing list.
- Log in to the server:
ssh USERNAME@scorpion.biology.tohoku.ac.jp
Notes
- Access:
- Feel free to post any questions and requests to the mailing list. It will help other users and improve this document. Do NOT contact the administrators personally.
- No graphical user interface (GUI) is available; all operations have to be carried out through a command-line interface (CLI) over an SSH connection. Basic knowledge of shell scripting is required.
- The server is normally accessible only from the Tohoku University LAN; you may want to consider TAINS VPN for remote access. For now, the server is temporarily accessible from everywhere. Please keep the URL secret to reduce the potential risk of attacks.
- Data Storage:
- 20 GB of disk space is allocated to each user. The size may change in the future.
- Do NOT treat this system as a long-term storage service. Transfer output data to your local computer and delete it from the server immediately after each job execution. At the very least, keep the data you leave on the server in a state where it can be deleted at any time.
- It is recommended to use rsync to transfer/synchronize your files between your local computer and the server (see the example after this list). Git is also useful for managing your scripts.
- Your home directory ~/ on the head node is shared with the compute nodes, so you do not need to worry about transferring data between nodes within the system.
- Job execution:
- Do NOT execute programs directly on the head node. All computational tasks must be managed by the Slurm workload manager (as detailed below). The only exceptions are very small tasks like the following, i.e., only these kinds of commands may be executed on the head node:
- Basic shell operations: pwd, cd, ls, cat, mv, rm, etc.
- File transfer: rsync, git, etc.
- Text editors: vim, emacs, nano, etc.
- Compilation/installation of a small program: gcc, make, cmake, pip, etc.
- Slurm commands: sinfo, sacct, sbatch, etc.
- Check the list of available software. You can install additional software into your home directory, or ask the administrator via the mailing list for a system-wide installation.
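For example, a typical rsync workflow might look like the following (a sketch; the directory paths are hypothetical, and the scorpion host alias is the one defined in the SSH config section below):

# upload a project directory from your local computer to the server
rsync -av --progress ./myproject/ scorpion:~/myproject/
# download the results, then delete them on the server once verified
rsync -av --progress scorpion:~/myproject/results/ ./results/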
How to set up SSH keys
- Prepare a UNIX-like OS such as macOS or Linux.
- Windows users can set up a Linux (Ubuntu) environment via WSL. In the WSL terminal, create/open the /etc/wsl.conf file (with a command like sudo nano /etc/wsl.conf) and add the following lines:

[automount]
options = "metadata"

Then, restart WSL (or Windows) to enable the config above. This setting is required to set permissions with the chmod command.
- Check if you already have a pair of SSH keys (id_ed25519 and id_ed25519.pub) in ~/.ssh/ on your local computer:

ls -al ~/.ssh

If they exist, you can use them. Do not modify or overwrite them. If they do not exist, generate a pair with the following command:

ssh-keygen -t ed25519
# Generating public/private ed25519 key pair.
# Enter file in which to save the key (~/.ssh/id_ed25519): <return>
# Enter passphrase (empty for no passphrase): <return>
# Enter same passphrase again: <return>
# Your identification has been saved in ~/.ssh/id_ed25519
# Your public key has been saved in ~/.ssh/id_ed25519.pub

You can just accept the default file name and an empty passphrase. The generated keys can be reused for other purposes, e.g., the NIG Supercomputer, GitHub, and GitLab.
-
Create
~/.ssh/configfile (plain text, not directory) on your local computer, and write some lines as follows:Host scorpion scorpion.biology.tohoku.ac.jp Hostname scorpion.biology.tohoku.ac.jp User tamakinoReplace
tamakinowith your user name on scorpion server (NOT the one on your local computer). You can decide your user name, but it should be short and lowercase alphabets without any space or special character. -
- Check the created keys and config file:

ls -al ~/.ssh
# drwx------ 11 winston staff 374 Apr  4 10:00 ./
# -rw-r--r--  1 winston staff 749 Apr  4 10:00 config
# -rw-------  1 winston staff 399 Apr  4 10:00 id_ed25519
# -rw-r--r--  1 winston staff  92 Apr  4 10:00 id_ed25519.pub

The permissions of ~/.ssh and ~/.ssh/id_ed25519 must be 700 (drwx------) and 600 (-rw-------), respectively. Execute the following commands to set the permissions correctly:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/id_ed25519
chmod 644 ~/.ssh/config

Check ls -al ~/.ssh again.
- Copy and paste the whole content of the public key (NOT the private key) into the online registration form. For example, the pbcopy command is useful on macOS:

cat ~/.ssh/id_ed25519.pub | pbcopy
- The administrator will notify you when your public key has been registered in ~/.ssh/authorized_keys on the server. Then you can log in to scorpion with the following command:

ssh scorpion
# or
ssh YOUR_USERNAME@scorpion.biology.tohoku.ac.jp
You can add another public key to ~/.ssh/authorized_keys by yourself so that you can log in from a secondary PC (see the sketch below).
Do not submit user registration twice.
Do not transfer private keys between computers.
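For example, a minimal sketch for registering a second key (the quoted key string is a placeholder for the full content of the secondary PC's id_ed25519.pub):

# on the scorpion server
echo "ssh-ed25519 AAAA...paste-your-second-public-key-here... user@secondary-pc" >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys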
Slurm workload manager
A quick guide to the most common Slurm commands. See https://slurm.schedmd.com/archive/slurm-23.11.4/ for details.
Check the system status
sinfo:
View Slurm nodes and partitions.
sinfo -No "%10N %.13C %.6O %.7z %.8e %.8m %.6w %9P %.9T %E"
NODELIST CPUS(A/I/O/T) CPU_LO S:C:T FREE_MEM MEMORY WEIGHT PARTITION STATE REASON
scorpion 0/16/0/16 2.60 2:8:1 2171 10000 100 login idle none
scorpion01 32/0/0/32 32.02 2:16:1 86621 85297 2 compute* allocated none
scorpion02 40/0/0/40 40.01 2:20:1 379096 375592 1 compute* allocated none
scorpion03 48/0/0/48 24.02 2:12:2 355001 375607 3 compute* allocated none
squeue:
View jobs in the scheduling queue.
squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
666 compute hello.sh steve R 1:23:45 1 scorpion01
scontrol:
View or modify Slurm configuration and state.
scontrol show node
scontrol show partition
Manage your jobs
sacct:
Displays accounting data for all jobs.
See sacct --helpformat for available fields.
sacct -S now-14days -o JobID,JobName,User,State,ExitCode%6,Start,Elapsed,ReqCPUs%4,NCPUs%4,TotalCPU,ReqMem%6,MaxRSS%6
JobID JobName User State ExitCo Start Elapsed ReqC NCPU TotalCPU ReqMem MaxRSS
------------ ---------- --------- ---------- ------ ------------------- ---------- ---- ---- ---------- ------ ------
666 hello.sh steve COMPLETED 0:0 2025-09-18T23:43:12 00:00:30 1 1 00:00.014 1G
666.batch batch COMPLETED 0:0 2025-09-18T23:43:12 00:00:30 1 1 00:00.014 666K
Check active jobs in detail:
scontrol show job [<JOB_ID>]
scancel:
Signal or cancel jobs.
scancel <JOB_ID>
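A few common forms (the job IDs are illustrative):

scancel 666          # cancel a single job
scancel 667_3        # cancel one task of a job array
scancel -u $USER     # cancel all of your own jobs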
Submit a job
sbatch:
Submit a batch script to Slurm.
Create a batch script and run.
sbatch hello.sh
#!/bin/bash
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1G
date -Iseconds
echo "$(hostname):$(pwd)"
echo "PATH: $PATH"
echo "Hello, Slurm!"
echo "=================================="
echo "SLURMD_NODENAME: $SLURMD_NODENAME"
echo "SLURM_SUBMIT_DIR: $SLURM_SUBMIT_DIR"
echo "SLURM_JOB_ID: $SLURM_JOB_ID"
echo "SLURM_JOB_NAME: $SLURM_JOB_NAME"
echo "SLURM_JOB_START_TIME: $SLURM_JOB_START_TIME"
echo "SLURM_JOB_END_TIME: $SLURM_JOB_END_TIME"
echo "SLURM_JOB_CPUS_PER_NODE: $SLURM_JOB_CPUS_PER_NODE"
echo "SLURM_CPUS_PER_TASK: $SLURM_CPUS_PER_TASK"
echo "SLURM_MEM_PER_CPU: $SLURM_MEM_PER_CPU"
echo "SLURM_MEM_PER_NODE: $SLURM_MEM_PER_NODE"
echo "=================================="
# do something
# just for demonstration; try `scontrol show job` etc.
sleep 60
date -Iseconds
You can get various information about the job from the output environment variables available in batch scripts.
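After submission, sbatch prints the job ID, and the job's standard output is written to slurm-<JOBID>.out in the submission directory by default (see the --output option below). A quick check, with an illustrative job ID:

sbatch hello.sh
# Submitted batch job 667
cat slurm-667.out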
Options
https://slurm.schedmd.com/archive/slurm-23.11.4/sbatch.html#SECTION_OPTIONS
Options can be set in three ways, and the priority is in the order below:
- command-line options passed to sbatch
- input environment variables
- #SBATCH directives in batch scripts (we recommend this for reproducibility)
-c, --cpus-per-task=<ncpus>
Request CPU cores per task.
--mem-per-cpu=<size>[units]
Request memory per CPU core. The default on our system is --mem-per-cpu=1G. See the hardware page and the sinfo -Nl command for the available memory sizes. The total memory for a job can be set with --mem=<size>[units].
-J, --job-name=<jobname>
Set the job name. The name of the script file is used by default.
-D, --chdir=<directory>
Set the working directory of the job. The default is $SLURM_SUBMIT_DIR, where sbatch is called.
-o, --output=<filename_pattern>
Set the file path to redirect standard output, relative to $SLURM_SUBMIT_DIR. The default is slurm-%j.out for single jobs and slurm-%A_%a.out for job arrays, where %j, %A, and %a are replaced by the job ID, the array job ID, and the array index, respectively. See filename pattern.
-e, --error=<filename_pattern>
By default, both standard output and standard error are directed to the same file.
-a, --array=<indexes>
Submit a job array (see below).
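Because command-line options take the highest priority, you can override the #SBATCH directives of an existing script at submission time; the option values below are illustrative:

sbatch --job-name=bigmem_test --cpus-per-task=4 --mem-per-cpu=2G hello.sh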
job array
https://slurm.schedmd.com/archive/slurm-23.11.4/job_array.html
Use a job array to run the same script many times in parallel. This is perfect for processing lots of data files.
sbatch params.sh
#!/bin/bash
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1G
#SBATCH --array=1-3
date -Iseconds
echo "$(hostname):$(pwd)"
echo "PATH: $PATH"
echo "Hello, Slurm job array!"
echo "=================================="
echo "SLURMD_NODENAME: $SLURMD_NODENAME"
echo "SLURM_SUBMIT_DIR: $SLURM_SUBMIT_DIR"
echo "SLURM_JOB_ID: $SLURM_JOB_ID"
echo "SLURM_JOB_NAME: $SLURM_JOB_NAME"
echo "SLURM_JOB_START_TIME: $SLURM_JOB_START_TIME"
echo "SLURM_JOB_END_TIME: $SLURM_JOB_END_TIME"
echo "SLURM_JOB_CPUS_PER_NODE: $SLURM_JOB_CPUS_PER_NODE"
echo "SLURM_CPUS_PER_TASK: $SLURM_CPUS_PER_TASK"
echo "SLURM_MEM_PER_CPU: $SLURM_MEM_PER_CPU"
echo "SLURM_MEM_PER_NODE: $SLURM_MEM_PER_NODE"
echo "=================================="
echo "SLURM_ARRAY_JOB_ID: $SLURM_ARRAY_JOB_ID"
echo "SLURM_ARRAY_TASK_ID: $SLURM_ARRAY_TASK_ID"
echo "SLURM_ARRAY_TASK_COUNT: $SLURM_ARRAY_TASK_COUNT"
echo "=================================="
PARAM_ARRAY=(alpha beta gamma)
# bash arrays are 0-indexed while the task IDs above start at 1
PARAM=${PARAM_ARRAY[$SLURM_ARRAY_TASK_ID - 1]}
echo "PARAM: $PARAM"
# do something with "$PARAM"
PADDED_ID=$(printf "%03d" $SLURM_ARRAY_TASK_ID)
INPUT_FILE="data_${PADDED_ID}.txt"
echo "INPUT_FILE: $INPUT_FILE"
# do something with "$INPUT_FILE"
# just for demonstration; try `scontrol show job` etc.
sleep 60
date -Iseconds
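A common pattern is to map the array index to one line of a file list, and to limit the number of simultaneously running tasks with the % suffix of --array. A sketch, assuming a hypothetical input_files.txt with one input path per line:

#!/bin/bash
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1G
#SBATCH --array=1-100%10   # 100 tasks, at most 10 running at once
# pick the line of input_files.txt that corresponds to this task
INPUT_FILE=$(sed -n "${SLURM_ARRAY_TASK_ID}p" input_files.txt)
echo "INPUT_FILE: $INPUT_FILE"
# do something with "$INPUT_FILE"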
interactive job
https://slurm.schedmd.com/srun.html
An interactive job allows you to run commands or programs directly on a compute node.
You can log in to the compute node using srun --pty $SHELL:
username@scorpion:~$ hostname
scorpion
username@scorpion:~$ srun --cpus-per-task 4 --mem-per-cpu 5G --pty $SHELL
username@scorpion02:~$ hostname
scorpion02
username@scorpion02:~$ # do something
username@scorpion02:~$ exit
Alternatively, you can reserve resources in advance with salloc and then run commands with srun:
salloc --cpus-per-task 4 --mem-per-cpu 5G
srun <command>   # run your command on the allocated resources
exit
jupyter notebook
- (Preparation) Add the following to your local machine's ~/.ssh/config for ProxyJump. You can easily connect to the compute node if an SSH key pair is properly set up:

Host scorpion01
  HostName scorpion01
  ProxyJump scorpion.biology.tohoku.ac.jp

Do the same for the other nodes.
- Start Jupyter Notebook as an interactive job:

nohup srun jupyter notebook --no-browser >jupyter.log 2>&1 </dev/null &

Specify resources as needed (e.g., with --cpus-per-task and --mem-per-cpu).
- Check which node is being used:

squeue
# JOBID PARTITION    NAME     USER ST  TIME NODES NODELIST(REASON)
# 13649   compute jupyter watamine  R  0:14     1 scorpion02
- Check the URL in jupyter.log:

less jupyter.log

[C 2025-09-29 15:02:56.702 ServerApp]
    To access the server, open this file in a browser:
        file:///misc/home/watamine/.local/share/jupyter/runtime/jpserver-610054-open.html
    Or copy and paste one of these URLs:
        http://localhost:8889/tree?token=2acc5ae6d208ebfd29ee64d86bf03f6e4dcbe7f9ceec964f
        http://127.0.0.1:8889/tree?token=2acc5ae6d208ebfd29ee64d86bf03f6e4dcbe7f9ceec964f

Port 8889 is used in this example. The port number may differ depending on the timing of the connection.
- On your local machine, create an SSH tunnel to the compute node via the login node:

ssh -L 8889:localhost:8889 watamine@scorpion02

Change the port number and node name according to your connection.
- Open the link shown in jupyter.log in your web browser:

http://localhost:8889/tree?token=xxxxxxxxxxxxxxxx
Since Jupyter is running in the background, it will continue to run even if your computer goes to sleep or your network connection is interrupted.
To terminate the session, please use the scancel command.
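For example, find the job ID with squeue and cancel it (the job ID below is illustrative):

squeue -u $USER      # find the JOBID of the jupyter job
scancel 13649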