Welcome to CNQO!
Here are some notes for how to use the CNQO lab computers and servers.
Accessing the System
Let a member of Physics IT Support know your username, and they will get you onto the system.
Ways to log in
Once you are set up, you can log in using:
- SSH terminal (for example, PuTTY)
- ThinLinc remote desktop app
- The PCs in the CNQO labs (JA7.13)
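For SSH, connect to the login server with your username, e.g. (replace username with your own):
ssh username@wildebeest.phys.strath.ac.uk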
Password-less Access
Once logged in, you can ssh between servers without a password, provided you use the servername-s form of the hostname, which goes over the CNQO private network between the servers. For example:
ssh phys-vole-s
Servers configured with host-based authentication are:
- wildebeest-s
- phys-vole-s
- ribbo1-17
- hippo2
- phys-porpoise-s
CNQO Servers and what they do
wildebeest.phys.strath.ac.uk is the main access server, but don't run anything serious on it. Instead, SSH from there to one of the other servers below to perform calculations:
Wildebeest
- Login server for off-campus access
Phys-vole
- Server for submitting jobs to SLURM queue
- Login access for on-campus users
CNQO_intel queue
Node | Cores | Memory
---|---|---
Ribbo1 | 24 | 96GB
Ribbo2 | 24 | 96GB
Ribbo3 | 24 | 128GB
Ribbo8 | 40 | 128GB
Ribbo9 | 40 | 128GB
Ribbo10 | 16 | 192GB
Ribbo14 | 20 | 64GB
Ribbo15 | 20 | 64GB
Ribbo16 | 20 | 64GB
Ribbo17 | 20 | 64GB
- Compute nodes in the cnqo_intel queue
- Intel chips
SMTS_intel queue
- Compute nodes Ribbo4-7
- Intel chips, 40 cores on each node
- 192GB RAM
Marine_intel queue
- Compute nodes Ribbo11-13
- Intel chips, 40 cores on each node
- 256GB or 384GB RAM
Hippo2
- Compute node
- AMD chips, 48 cores
- 128GB RAM
Phys-porpoise
- Server for standalone non-SLURM work
- JupyterHub server - please ask if you want access
Olinguito
- Workstation with dedicated graphics card, for visualisation (JA7.18)
Armadillo
- File server, with 22TB of space for user home folders
- Application server
Phys-tapir
- Ubuntu virtual server where users can mount their I: drive in order to move files off the cluster
Phys-mongoose
- Ansible configuration management
Applications
The shared /opt/local folder hosts applications:
- Matlab R2018a
- Mathematica 13.3 and Wolfram 14.2
- Intel oneAPI
- IDL
- GCC 5.5.0/7.3.0/8.4.0/11.3.0/14.2.0
- OpenMPI
- Scalapack
- HDF
- FFTW
- Puffin
- Miniconda for Python3
To use the applications, load the relevant module:
module avail
- shows available modules
module load <module name>
- loads the module, adding it to your path
module list
- shows the currently loaded modules
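For example, to set up the GCC 5.5.0 compiler used in the SLURM examples below:
module load compilers/gcc/5.5.0
gfortran --version
module list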
SLURM job scripts
To use SLURM, log into phys-vole as yourself. The available compute nodes are:
- ribbo1-3 - three compute nodes, 24 cores per node, 96-128GB per node
- ribbo4-7 - four compute nodes, 40 cores per node, 192GB per node
- ribbo8-9 - two compute nodes, 40 cores per node, 128GB per node
- ribbo10 - one compute node, 16 cores, 192GB
- ribbo11-13 - three compute nodes, 40 cores per node, 256+GB per node
- ribbo14-17 - four compute nodes, 20 cores per node, 64GB per node
- hippo2 - one compute node, 48 cores, 128GB
squeue
- shows the state of jobs in the queue: PD for pending, R for running
sinfo
- shows the state of the partitions (queues)
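Some useful variations (these are standard SLURM options):
squeue -u $USER
- shows only your own jobs
sinfo -p cnqo_intel
- shows the state of the cnqo_intel partition only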
To use the queue, you must ask for your username to be added to the queue user list.
To submit jobs to the queue, make a script. For example:
#!/bin/bash
#
#SBATCH --job-name=CNQO_test_1
#SBATCH --output=CNQO_test_1-%j.txt
#
#SBATCH --mail-type=ALL
#SBATCH --mail-user=user_email_address
#
#SBATCH --ntasks=1
# A double hash (##) comments out an SBATCH directive, so the two below are inactive
##SBATCH --time=10:00
##SBATCH --mem-per-cpu=10
srun hostname
srun sleep 20
srun launches your program on the compute node; it's probably best to give the full path to the executable in your home folder.
To submit the job:
sbatch <name of script>
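For example, if the script above is saved as cnqo_test_1.sh (a name chosen here for illustration):
sbatch cnqo_test_1.sh
squeue -u $USER
A running or pending job can be cancelled with scancel <job id>.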
I've put an environment module on the system with GCC 5.5.0, so you can try loading that as well:
#!/bin/bash
#
#SBATCH --job-name=CNQO_test2
#SBATCH --output=CNQO_test_2-%j.txt
#
#SBATCH --ntasks=1
##SBATCH --time=10:00
##SBATCH --mem-per-cpu=10
module load compilers/gcc/5.5.0
srun gfortran --version
date
- Users can have up to 160 running jobs at a time; any more are queued until those have finished
- Currently there is no limit on job time, but if you specify a time limit in your script and the job doesn't finish within it, the job is cancelled
- Limit on memory - each of the CNQO compute nodes differs slightly in the memory available per core:
- ribbo1 - 24 cores, 4000MB/core
- ribbo2 - 24 cores, 3916MB/core
- ribbo3 - 24 cores, 5333MB/core
- ribbo4, 5, 6 and 7 - 40 cores, 4800MB/core
- ribbo8 and 9 - 40 cores, 3200MB/core
- ribbo10 - 16 cores, 11600MB/core
- ribbo11 and 12 - 40 cores, 9600MB/core
- ribbo13 - 40 cores, 6400MB/core
- ribbo14, 15, 16 and 17 - 20 cores, 3200MB/core
- hippo2 - 48 cores, 2666MB/core
The default MemPerCPU is set to 2666MB, the lowest per-core amount, while MaxMemPerCPU is 5333MB. If you want more than 2666MB per core, you need to request it in your batch script (e.g. #SBATCH --mem-per-cpu=4000) up to the maximum allowed. If a job exceeds its memory allocation, it will be stopped.
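As a minimal sketch, a single-task job asking for 4000MB per core would look like this (my_program stands in for your own executable):
#!/bin/bash
#SBATCH --job-name=CNQO_mem_test
#SBATCH --output=CNQO_mem_test-%j.txt
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=4000
srun /home/users/username/my_program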
Parallel jobs with OpenMPI
#!/bin/bash
##
# Propagate environment variables to the compute node
#SBATCH --export=ALL
# Run in the cnqo_intel partition (queue)
#SBATCH --partition=cnqo_intel
# Distribute processes in round-robin fashion for load balancing
#SBATCH --distribution=cyclic
# No. of tasks required
#SBATCH --ntasks=48
#SBATCH --cpus-per-task=1
#SBATCH --nodes=1
# Specify (hard) runtime (HH:MM:SS)
#SBATCH --time=00:10:00
# Job name
#SBATCH --job-name=CNQO_mpi_test_1
# Output file
#SBATCH --output=CNQO_mpi_test_1-%j.out
pwd; hostname; date
echo "Running MPI test program on $SLURM_JOB_NUM_NODES nodes with $SLURM_NTASKS tasks, each with $SLURM_CPUS_PER_TASK cores."
module load libs/gcc/5.5.0/openmpi/3.1.0
mpirun -np $SLURM_NTASKS /home/users/username/mpi_hello
date
This should run 48 tasks of the mpi_hello application on one node.
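The mpi_hello executable above is a stand-in for your own MPI program; as a sketch, you could build one from an MPI hello-world source file with the wrapper compilers that come with the OpenMPI module:
module load libs/gcc/5.5.0/openmpi/3.1.0
mpicc mpi_hello.c -o /home/users/username/mpi_hello
(use mpif90 in place of mpicc for Fortran source)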
Options for your scripts include:
- Use the --exclusive flag if you require a whole node for your job
- Specify the nodes you want to use with --nodelist=server1,server2
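For example, to take over ribbo10 for a job:
#SBATCH --exclusive
#SBATCH --nodelist=ribbo10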
Next are three different ways of running multiple serial jobs.
JobFarm
A job farm runs a number of identical jobs, each taking roughly the same time, on a node.
#!/bin/bash
#################
#
# Use this script in SLURM by running 'sbatch cnqo_doc_slurm3.sh'
# Access the details of the running serial jobs by using 'squeue -j SLURM_JOB_ID -s'
#
#################
# Requesting the number of nodes needed, and asking for exclusive access to those nodes
#SBATCH -N 1
#SBATCH --tasks-per-node=24
#SBATCH --exclusive
#
# Job time, change for what your job farm requires - here it's 5 minutes
#SBATCH -t 00:05:00
#
# Job name and output file names
#SBATCH -J cnqo_test_jobFarm
#SBATCH -o CNQO_test_jobFarm-%j.out
#SBATCH -e CNQO_test_jobFarm-%j.out
# Set the number of jobs
export number_of_jobs=$SLURM_NTASKS
# Loop over the serial job number
for ((i=0; i<$number_of_jobs; i++))
do
# Run the script quietly and exclusively on one core of one node, passing in
# the serial job number; output goes to a file named with the SLURM job
# number and the serial job number
srun -Q --exclusive -n 1 -N 1 \
cnqo_test_jobFarm_task $i &> worker_${SLURM_JOB_ID}_${i} &
sleep 1
done
# Keep the wait statement, it is important!
wait
where cnqo_test_jobFarm_task is:
#!/bin/bash
# This script echoes some useful output so we can see what srun is doing
sleepsecs=$(( (RANDOM % 10) + 40 ))s
# We output the sleep time, hostname, and date for more info
echo sleep:$sleepsecs host:$(hostname) date:$(date)
# sleep a random amount of time
sleep $sleepsecs
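Note that the task script must be executable and reachable from the directory you submit from (or given by its full path):
chmod +x cnqo_test_jobFarm_task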
To use three nodes, make the following changes:
#################
# Requesting the number of nodes needed, and asking for exclusive access to those nodes
#SBATCH -N 3
#SBATCH --tasks-per-node=24
#SBATCH --exclusive
#
...
#
# Set the number of jobs
export number_of_jobs=72
Array
Whereas the job farm runs as one job in the scheduler, array jobs run as separate jobs in the queue, each with an array index to identify it. The array index, $SLURM_ARRAY_TASK_ID, lets you select a different input file or parameter set for each array task.
#!/bin/bash
##
# Job time, change to what your array job requires - here it's 5 minutes
#SBATCH -t 00:05:00
#
# Job name and output file names
#SBATCH -J cnqo_test_arrayJob
#SBATCH -o CNQO_test_arrayJob_%A_%a.out
#SBATCH -e CNQO_test_arrayJob_%A_%a.out
# Use the % separator to limit the concurrent jobs to 8 for an array of 30 jobs
#SBATCH --array=1-30%8
sleepsecs=$(( (RANDOM % 10) + 10 ))s
# We output the sleep time, hostname, and date for more info
echo sleep:$sleepsecs host:$(hostname) date:$(date)
# More usefully, run something like
#./myprogram < input_file_$SLURM_ARRAY_TASK_ID
# sleep a random amount of time
sleep $sleepsecs
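For example, you might generate the numbered input files before submitting; the template_input file and the PARAM token here are illustrative:
for i in $(seq 1 30); do
sed "s/PARAM/$i/" template_input > input_file_$i
done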
By adding
# Requesting the number of nodes needed, and asking for exclusive access to those nodes
#SBATCH -N 1
#SBATCH --exclusive
#SBATCH --tasks-per-node=24
you can reserve a whole node for each array task, allowing multi-threaded computation.
Multiprog
The --multi-prog option lets you run a different program, with different arguments, for each task in your job, as specified in a configuration file.
#!/bin/sh
#SBATCH -n 72
#SBATCH -t 00:10:00
#SBATCH -J cnqo_test_multi_prog
srun -l --multi-prog cnqo_multiprog.conf
where cnqo_multiprog.conf looks like:
0-10 hostname
11,12 echo task:%t
13 echo task:%t-%o
14 echo task:%o
15-71 hostname
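Each line of the conf file maps a range of task ranks to a command. Within the command, %t expands to the task number and %o to the task's offset within that line's range, so in the example above tasks 0-10 and 15-71 print their hostname, tasks 11 and 12 print task:11 and task:12, task 13 prints task:13-0, and task 14 prints task:0.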