PERUN Supercomputer – Partitions Overview

About PERUN Partitions

PERUN groups its partitions into two main types: CPU and GPU.
Each job submitted to Slurm runs in exactly one partition. If you do not specify one, Slurm uses the default partition (marked with * in sinfo output).

1. What Are Slurm Partitions?

A partition in Slurm represents a group of compute nodes with similar characteristics or usage rules.
Partitions define:
- hardware constraints (CPU cores, GPUs, memory)
- job limitations (max runtime, max cores, number of nodes)
- resource availability and priority
- access to specialized hardware (e.g., GPU nodes)

Tip — Always choose the correct partition

CPU workloads should run in the cpu_short or cpu_long partition.
GPU or AI workloads must run in the gpu_short or gpu_long partition.

2. Available PERUN Partitions

PERUN defines four partitions split by workload type and time limit:

Partition	Nodes	Time Limit	Max Job Size	GPUs	Purpose
cpu_short	cn01–cn22	2 days	Up to system limits	0	Short CPU HPC workloads
cpu_long	cn23–cn32	4 days	Up to system limits	0	Long-running CPU workloads
gpu_short	gpu01–gpu18 (H200)	2 days	Up to 8 GPUs per node	8 per node	Short AI/ML/GPU workloads
gpu_long	gpu19–gpu26 (H200)	4 days	Up to 8 GPUs per node	8 per node	Long AI/ML/GPU workloads

Example — Selecting a Partition

sbatch -p cpu_short job.sh

sbatch -p gpu_short gpu_job.sh

3. Viewing Partition Information

You can inspect partitions with:

Basic Slurm overview

sinfo

Detailed partition definitions

scontrol show partitions

Example sinfo Output Snippet

PARTITION  AVAIL  TIMELIMIT  NODES  STATE NODELIST
cpu_short*    up 2-00:00:00     22   idle cn[01-22]
cpu_long      up 4-00:00:00     10   idle cn[23-32]
gpu_short     up 2-00:00:00     18   idle gpu[01-18]
gpu_long      up 4-00:00:00      8   idle gpu[19-26]
viz           up    8:00:00      2   idle viz[01-02]

Note

Node status values:
- idle → ready to run jobs
- alloc → currently running jobs
- mix → partially allocated
- down/drain → node unavailable
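
The sinfo columns and node states described above can also be read by a script. The following is a minimal sketch, not an official PERUN tool: `sample_sinfo` and `idle_nodes` are hypothetical helper names, and the sample text is simply the example output from this page, so the block runs without access to the cluster.

```shell
# Hypothetical helper: prints the example sinfo output from this page,
# so the sketch can be tried on a machine without Slurm.
sample_sinfo() {
  cat <<'EOF'
PARTITION  AVAIL  TIMELIMIT  NODES  STATE NODELIST
cpu_short*    up 2-00:00:00     22   idle cn[01-22]
cpu_long      up 4-00:00:00     10   idle cn[23-32]
gpu_short     up 2-00:00:00     18   idle gpu[01-18]
gpu_long      up 4-00:00:00      8   idle gpu[19-26]
viz           up    8:00:00      2   idle viz[01-02]
EOF
}

# Hypothetical helper: reads sinfo-style text on stdin and prints the number
# of idle nodes in the partition named by $1.
idle_nodes() {
  awk -v part="$1" '
    NR > 1 {
      name = $1
      sub(/\*$/, "", name)   # strip the default-partition marker
      # Exact match only: states with suffixes (for example "idle~") are not counted.
      if (name == part && $5 == "idle") total += $4
    }
    END { print total + 0 }
  '
}

sample_sinfo | idle_nodes gpu_short   # prints 18
```

On the cluster, the same filter should work on live data with `sinfo | idle_nodes gpu_short`, assuming the default sinfo column layout shown in the example. To limit the listing itself to one partition, use `sinfo -p gpu_short`.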

4. Choosing the Right Partition

Use the cpu_short or cpu_long partition when:

running multi-core CPU jobs
performing scientific simulations
running general HPC workloads

Use cpu_short for jobs under 2 days, cpu_long for jobs up to 4 days.

Use the gpu_short or gpu_long partition when:

training machine learning / deep learning models
performing GPU-accelerated workloads (CUDA, PyTorch, TensorFlow)
requiring NVIDIA H200 performance

Use gpu_short for jobs under 2 days, gpu_long for jobs up to 4 days.

Warning — GPU misuse

Jobs without GPU requirements should not run on the GPU partitions.
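
The selection rules in this section can be sketched as a small shell function. This is only an illustration: `pick_partition` is a hypothetical helper, not a PERUN or Slurm command, and the 48-hour and 96-hour thresholds are taken from the time limits in the partition table.

```shell
# Hypothetical helper: prints a partition name for a workload.
#   $1 = workload kind: "cpu" or "gpu"
#   $2 = expected runtime in whole hours
# The body runs in a subshell, so its variables do not leak into the caller.
pick_partition() (
  kind="$1"
  hours="$2"
  case "$kind" in
    cpu|gpu) ;;
    *) echo "unknown workload kind: $kind" >&2; exit 1 ;;
  esac
  if [ "$hours" -le 48 ]; then
    echo "${kind}_short"            # fits the 2-day limit
  elif [ "$hours" -le 96 ]; then
    echo "${kind}_long"             # fits the 4-day limit
  else
    echo "runtime exceeds the 4-day limit" >&2
    exit 1
  fi
)

pick_partition cpu 24   # prints cpu_short
pick_partition gpu 72   # prints gpu_long
```

The result can be passed straight to sbatch, for example `sbatch -p "$(pick_partition gpu 72)" gpu_job.sh`.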

5. Submitting Jobs to a Partition

CPU job example

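#!/bin/bash
# Run in the short CPU partition (2-day limit)
#SBATCH -p cpu_short
# Request 32 tasks (by default one CPU core each)
#SBATCH -n 32
# Walltime limit: 24 hours
#SBATCH -t 24:00:00
python simulation.py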

GPU job example

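#!/bin/bash
# Run in the short GPU partition (2-day limit)
#SBATCH -p gpu_short
# Request 4 GPUs per node
#SBATCH --gres=gpu:4
# Walltime limit: 48 hours (the partition maximum)
#SBATCH -t 48:00:00
python train.py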

Important

If you do not request GPUs with --gres=gpu:<num> in a GPU partition, no GPUs are allocated to the job.
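
A job script can check early whether it actually received GPUs. The sketch below assumes that Slurm exports `CUDA_VISIBLE_DEVICES` for the allocated GPUs, which is the usual behavior when GPUs are managed as a gres resource but depends on the cluster configuration. `count_visible_gpus` is a hypothetical helper name.

```shell
# Hypothetical helper: prints how many GPU indices are listed in
# CUDA_VISIBLE_DEVICES (0 if the variable is unset or empty).
# The body runs in a subshell, so changing IFS does not affect the caller.
count_visible_gpus() (
  devices="${CUDA_VISIBLE_DEVICES:-}"
  case "$devices" in
    # Some Slurm versions set the placeholder "NoDevFiles" when no GPU is allocated.
    ""|NoDevFiles) echo 0; exit 0 ;;
  esac
  IFS=','
  set -- $devices        # split the comma-separated list into positional parameters
  echo "$#"
)

# Example guard for the top of a GPU job script:
# if [ "$(count_visible_gpus)" -eq 0 ]; then
#   echo "No GPUs allocated; was --gres=gpu:<num> set?" >&2
#   exit 1
# fi
```

Failing fast this way avoids spending walltime on a training run that would silently fall back to the CPU.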

6. Walltime and Efficiency

Why walltime matters

Jobs that request far more walltime than they need tend to wait longer in the queue.
Jobs with shorter time limits are often scheduled earlier.
Inaccurate walltime estimates reduce cluster efficiency.

Efficient walltime use

If your job usually finishes in 3 hours, do not request 24 hours.
If your job runs under 2 days, prefer cpu_short or gpu_short over the long partitions.
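
One way to turn an observed runtime into a walltime request is to add a modest safety margin. The following is a sketch under stated assumptions: `to_seconds` and `padded_walltime` are hypothetical helpers, the input uses the `[D-]HH:MM:SS` format, and the default 25% margin is an arbitrary illustrative choice, not a PERUN recommendation.

```shell
# Hypothetical helper: converts [D-]HH:MM:SS to seconds.
to_seconds() {
  echo "$1" | awk -F'[-:]' '{
    if (NF == 4) print $1 * 86400 + $2 * 3600 + $3 * 60 + $4   # D-HH:MM:SS
    else         print $1 * 3600 + $2 * 60 + $3                # HH:MM:SS
  }'
}

# Hypothetical helper: prints a walltime request as HH:MM:SS.
#   $1 = observed runtime, $2 = safety margin in percent (default 25)
# The body runs in a subshell, so its variables do not leak into the caller.
padded_walltime() (
  secs=$(to_seconds "$1")
  margin="${2:-25}"
  total=$(( secs + secs * margin / 100 ))
  printf '%02d:%02d:%02d\n' $(( total / 3600 )) $(( total % 3600 / 60 )) $(( total % 60 ))
)

padded_walltime 03:00:00   # prints 03:45:00
```

The observed runtime of a finished job can be read with `sacct -j <jobid> --format=JobID,Elapsed,Timelimit,State`, and the result passed on, for example `sbatch -t "$(padded_walltime 03:00:00)" job.sh`.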

7. Summary

PERUN provides four Slurm partitions: cpu_short, cpu_long, gpu_short, gpu_long.
Short partitions (2-day limit): cpu_short (cn01–cn22), gpu_short (gpu01–gpu18).
Long partitions (4-day limit): cpu_long (cn23–cn32), gpu_long (gpu19–gpu26).
Correct partition selection improves job scheduling and cluster efficiency.
Use sinfo and scontrol to inspect resources.
Always specify GPUs explicitly when using the GPU partitions.