
PERUN Supercomputer – Partitions Overview

About PERUN Partitions

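PERUN provides two kinds of partitions, CPU and GPU, each split into a short and a long variant.
Every job submitted to Slurm runs in one of these partitions. If you do not specify one, Slurm uses the default partition (the one marked with * in sinfo output).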


1. What Are Slurm Partitions?

A partition in Slurm represents a group of compute nodes with similar characteristics or usage rules.
Partitions define:

  • hardware constraints (CPU cores, GPUs, memory)
  • job limitations (max runtime, max cores, number of nodes)
  • resource availability and priority
  • access to specialized hardware (e.g., GPU nodes)

Tip — Always choose the correct partition

CPU workloads should run in the cpu_short or cpu_long partition.
GPU or AI workloads must run in the gpu_short or gpu_long partition.


2. Available PERUN Partitions

PERUN defines four partitions split by workload type and time limit:

Partition  Nodes               Time Limit  Max Job Size           GPUs        Purpose
cpu_short  cn01–cn22           2 days      Up to system limits    0           Short CPU HPC workloads
cpu_long   cn23–cn32           4 days      Up to system limits    0           Long-running CPU workloads
gpu_short  gpu01–gpu18 (H200)  2 days      Up to 8 GPUs per node  8 per node  Short AI/ML/GPU workloads
gpu_long   gpu19–gpu26 (H200)  4 days      Up to 8 GPUs per node  8 per node  Long AI/ML/GPU workloads

Example — Selecting a Partition

sbatch -p cpu_short job.sh
sbatch -p gpu_short gpu_job.sh


3. Viewing Partition Information

You can inspect partitions with:

Basic Slurm overview

sinfo

Detailed partition definitions

scontrol show partitions

Example sinfo Output Snippet

PARTITION  AVAIL  TIMELIMIT  NODES  STATE NODELIST
cpu_short*    up 2-00:00:00     22   idle cn[01-22]
cpu_long      up 4-00:00:00     10   idle cn[23-32]
gpu_short     up 2-00:00:00     18   idle gpu[01-18]
gpu_long      up 4-00:00:00      8   idle gpu[19-26]
viz           up    8:00:00      2   idle viz[01-02]

Note

Node status values:

  • idle → ready to run jobs
  • alloc → currently running jobs
  • mix → partially allocated
  • down/drain → node unavailable
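
To see quickly how many nodes in a partition are free, the sinfo output can be filtered by partition name and node state. The sketch below assumes the default sinfo column layout shown above (PARTITION, AVAIL, TIMELIMIT, NODES, STATE, NODELIST); the function name idle_nodes is hypothetical and not part of Slurm.

```shell
#!/bin/bash
# Sum the NODES column for all rows of one partition that are in the "idle" state.
# Reads sinfo-style text on stdin; assumes the default column order:
#   PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
idle_nodes() {
    local partition="$1"
    awk -v p="$partition" '
        NF >= 5 {
            name = $1
            sub(/\*$/, "", name)      # strip the default-partition marker
            if (name == p && $5 == "idle") total += $4
        }
        END { print total + 0 }       # print 0 when no row matched
    '
}

# On a login node (assumes the Slurm client tools are on PATH):
#   sinfo | idle_nodes gpu_short
```

A partition can appear on several rows, one per node state, which is why the node counts are summed rather than read from a single line.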


4. Choosing the Right Partition

Use the cpu_short or cpu_long partition when:

  • running multi-core CPU jobs
  • performing scientific simulations
  • running general HPC workloads

Use cpu_short for jobs that need up to 2 days, and cpu_long for jobs that need between 2 and 4 days.

Use the gpu_short or gpu_long partition when:

  • training machine learning / deep learning models
  • performing GPU-accelerated workloads (CUDA, PyTorch, TensorFlow)
  • requiring NVIDIA H200 performance

Use gpu_short for jobs that need up to 2 days, and gpu_long for jobs that need between 2 and 4 days.

Warning — GPU misuse

Jobs without GPU requirements should not run on the GPU partitions.
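
The selection rules in this section can be sketched as a small helper that maps an expected runtime and GPU count to a partition name. This is an illustration only: choose_partition is a hypothetical function, and the 48-hour and 96-hour thresholds are taken from the 2-day and 4-day limits listed in the partition table.

```shell
#!/bin/bash
# choose_partition HOURS [GPUS]
# Prints the partition that matches the rules above:
#   GPUS > 0 selects a gpu_* partition, otherwise cpu_*;
#   up to 48 hours selects *_short, up to 96 hours selects *_long.
choose_partition() {
    local hours="$1" gpus="${2:-0}"
    local kind="cpu"
    if [ "$gpus" -gt 0 ]; then
        kind="gpu"
    fi
    if [ "$hours" -le 48 ]; then
        echo "${kind}_short"
    elif [ "$hours" -le 96 ]; then
        echo "${kind}_long"
    else
        echo "error: ${hours}h exceeds the 4-day limit" >&2
        return 1
    fi
}

# Example: submit a 72-hour, 4-GPU job to the matching partition.
#   sbatch -p "$(choose_partition 72 4)" gpu_job.sh
```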


5. Submitting Jobs to a Partition

CPU job example

#!/bin/bash
#SBATCH -p cpu_short
#SBATCH -n 32
#SBATCH -t 24:00:00
python simulation.py

GPU job example

#!/bin/bash
#SBATCH -p gpu_short
#SBATCH --gres=gpu:4
#SBATCH -t 48:00:00
python train.py

Important

If you omit --gres=gpu:<num> when submitting to a GPU partition, the job is allocated no GPUs.
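
A job script can guard against this by checking for GPUs before starting the real work. The sketch below assumes that Slurm on PERUN exports CUDA_VISIBLE_DEVICES for jobs that request GPUs through --gres, which is common but depends on the site configuration; count_visible_gpus is a hypothetical helper name.

```shell
#!/bin/bash
# Print the number of GPUs visible to the current job.
# Assumption: CUDA_VISIBLE_DEVICES holds a comma-separated device list
# when GPUs were allocated, and is unset or empty otherwise.
count_visible_gpus() {
    local devices="${CUDA_VISIBLE_DEVICES:-}"
    if [ -z "$devices" ]; then
        echo 0
        return 0
    fi
    # Count the comma-separated entries.
    printf '%s\n' "$devices" | awk -F',' '{ print NF }'
}

# In a GPU job script, before the training command:
#   [ "$(count_visible_gpus)" -ge 1 ] || { echo "no GPUs allocated" >&2; exit 1; }
```

Failing early this way avoids spending walltime on a run that silently falls back to the CPU.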


6. Walltime and Efficiency

Why walltime matters

  • Jobs that request far more time than they need tend to wait longer in the queue.
  • Shorter jobs are often scheduled earlier, because they fit more easily into gaps between larger jobs.
  • Inaccurate walltime estimates reduce overall cluster efficiency.

Efficient walltime use

If your job usually finishes in 3 hours, do not request 24 hours.
If your job finishes within 2 days, prefer cpu_short or gpu_short over the long partitions.
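
One way to arrive at a realistic request is to look at how long a previous run took and add a safety margin. The sketch below converts an elapsed time in Slurm's [D-]HH:MM:SS form to seconds and prints a walltime with a margin added. The function names and the 20 % default margin are illustrative assumptions, and obtaining the elapsed time from sacct assumes job accounting is enabled on PERUN.

```shell
#!/bin/bash
# Convert a Slurm duration of the form [D-]HH:MM:SS to seconds.
to_seconds() {
    local t="$1" days=0 h m s
    if [[ "$t" == *-* ]]; then
        days="${t%%-*}"     # part before the dash
        t="${t#*-}"         # part after the dash
    fi
    IFS=: read -r h m s <<< "$t"
    # 10# forces base 10 so that "08" and "09" are not read as octal.
    echo $(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# suggest_walltime ELAPSED [MARGIN_PERCENT]
# Prints ELAPSED plus the margin (default 20 %) as HH:MM:SS.
suggest_walltime() {
    local secs margin="${2:-20}"
    secs=$(to_seconds "$1")
    secs=$(( secs + secs * margin / 100 ))
    printf '%02d:%02d:%02d\n' $(( secs / 3600 )) $(( secs % 3600 / 60 )) $(( secs % 60 ))
}

# Elapsed time of a finished job can be read with, for example:
#   sacct -j <jobid> --format=JobID,Elapsed,Timelimit,State
# and then passed on:
#   suggest_walltime 03:00:00      # 3 hours plus 20 %
```

The result can be used directly as the value of #SBATCH -t in the next submission.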


7. Summary

  • PERUN provides four Slurm partitions: cpu_short, cpu_long, gpu_short, gpu_long.
  • Short partitions (2-day limit): cpu_short (cn01–cn22), gpu_short (gpu01–gpu18).
  • Long partitions (4-day limit): cpu_long (cn23–cn32), gpu_long (gpu19–gpu26).
  • Correct partition selection improves job scheduling and cluster efficiency.
  • Use sinfo and scontrol to inspect resources.
  • Always request GPUs explicitly with --gres=gpu:<num> when using the GPU partitions.