🎓 Welcome to AI-LAB Workshop 🎓
This workshop introduces you to the AI-LAB computing platform — a GPU-powered system for AI and deep learning.
Presenter: Frederik Petri Svenningsen, Data Scientist, CLAAUDIA – research data services
Accessing AI-LAB
Before you can log in, you’ll need to complete this application form.
Workshop overview
You’ll learn to:
- Log in to AI-LAB securely
- Navigate the Linux environment
- Run and monitor jobs using Slurm
- Use containers for AI workloads
- Manage resources efficiently
Next: What is AI-LAB →
What is AI-LAB?
AI-LAB is a GPU-powered mini-supercomputer that lets students run AI experiments, deep-learning projects, and simulations without needing their own advanced hardware.
💻 What You Can Do
AI-LAB allows you to:
- Train deep learning models on GPU hardware
- Run AI experiments and simulations
- Collaborate with classmates or research groups
- Access powerful resources without owning a GPU
🔧 Why Use AI-LAB?
- Centralized system: no setup needed
- Preinstalled environments (PyTorch, TensorFlow, etc.)
- Fair resource sharing through Slurm
- Remote access from anywhere
🧠 Example Use Cases
- Deep learning projects for courses
- Research prototypes
- Data processing or simulation workloads
Next: AI-LAB Under the Hood →
AI-LAB under the hood
AI-LAB combines specialized hardware and software to deliver high-performance computing for AI workloads.
flowchart LR
subgraph id1[<p style="font-family: Barlow, sans-serif; font-weight: 800; font-size: 12px; text-transform: uppercase; color: #221a52; letter-spacing: 1px; margin: 5px;">Compute nodes</p>]
direction TB
A["<span><img src="/assets/img/server.svg" width='25' height='25' >ailab-l4-[01-11]</span>"]
end
subgraph id2[<p style="font-family: Barlow, sans-serif; font-weight: 800; font-size: 16px; text-transform: uppercase; color: #221a52; letter-spacing: 1px; margin: 10px;">AI-LAB</p>]
direction TB
subgraph id3[<p style="font-family: Barlow, sans-serif; font-weight: 800; font-size: 12px; text-transform: uppercase; color: #221a52; letter-spacing: 1px; margin: 5px;">Front-end nodes</p>]
direction TB
G["<span><img src="/assets/img/server.svg" width='25' height='25'>ailab-fe[01-02]</span>"]
end
id3 --> id1
subgraph id4[<p style="font-family: Barlow, sans-serif; font-weight: 800; font-size: 12px; text-transform: uppercase; color: #221a52; letter-spacing: 1px; margin: 5px;">File storage</p>]
direction TB
E["<span><img src="/assets/img/server.svg" width='25' height='25'>Ceph</span>"]
end
id1 & id3 <--> id4
end
F[<span><img src="/assets/img/person.svg" width='25' height='25'>User</span>]-- SSH --> id3
🖥️ Hardware Overview
| Component | Description |
|---|---|
| Login Nodes | 2 nodes for connecting and submitting jobs |
| Compute Nodes | 11 powerful machines with GPUs |
| GPUs | NVIDIA L4 GPUs (8 per node, 24 GB memory each) |
| Storage | Central networked storage via Ceph |

⚙️ Software Stack
| Layer | Tool | Purpose |
|---|---|---|
| Scheduler | Slurm | Manages compute resources and queues |
| Containers | Singularity | Isolates applications and dependencies |
Next: The AI-LAB Workflow →
The AI-LAB workflow
AI-LAB follows a simple 4-step workflow for running AI experiments efficiently.
🔄 Workflow Overview
- Log in from your local computer
- Upload your code and data
- Run compute jobs on the GPUs using Slurm
- View or download your results

Next: Logging into AI-LAB →
Logging into AI-LAB
You connect to AI-LAB using SSH (Secure Shell).
💻 Login Nodes
There are two frontend nodes:
- ailab-fe01.srv.aau.dk
- ailab-fe02.srv.aau.dk
Use either when logging in.
🔐 Logging In
Open your terminal (Windows users should use PowerShell) and run:
Replace user@student.aau.dk with your actual AAU email address.
The first time you connect, type yes to trust the server fingerprint.
Then enter your AAU password (no stars are shown while typing).
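As a minimal sketch of the login step, assuming a standard OpenSSH client, the command could be assembled as shown here. The helper name `ailab_ssh_cmd` is hypothetical and not part of AI-LAB; it only prints the command so you can inspect it before running it yourself.

```shell
# Hypothetical helper: print (not run) an SSH login command for AI-LAB.
# Usage: ailab_ssh_cmd <aau-email> [node-number]
ailab_ssh_cmd() {
  email="$1"
  node="${2:-1}"   # 1 or 2, matching ailab-fe01 / ailab-fe02
  # -l passes the login name separately, which avoids a second "@" in the target
  printf 'ssh -l %s ailab-fe%02d.srv.aau.dk\n' "$email" "$node"
}

ailab_ssh_cmd user@student.aau.dk     # prints the command for ailab-fe01
ailab_ssh_cmd user@student.aau.dk 2   # prints the command for ailab-fe02
```

Copy the printed line into your terminal, replacing the placeholder email with your own.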
Having login issues? Check out the troubleshooting guide for solutions.
Next: File Handling on AI-LAB →
File handling on AI-LAB
All your files are stored in network-mounted directories shared across the system.
📂 Default User Directory
Your personal home directory:
👨👦👦 Shared Spaces
| Path | Purpose |
|---|---|
| /ceph/project | Shared project folders |
| /ceph/course | Course-related materials |
| /ceph/container | Ready-to-use containers |
Private project folders can be created for the members of a semester group; follow this guide.
Next: Essential Linux Commands →
Essential Linux commands
AI-LAB runs on Linux — here are the basics you’ll need.
📁 Navigating Directories
📄 Managing Files
cp file1 file2 # Copy file
mv file1 folder/ # Move or rename file
rm file1 # Delete file
mkdir newfolder # Create a folder
cat file.txt # Display file contents
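The commands above can be tried safely in a throwaway directory. The following is a sketch that combines them with the standard navigation commands `pwd`, `ls`, and `cd` (assumed here to be the ones intended under "Navigating Directories"). The file and folder names are placeholders.

```shell
# Work in a throwaway directory so nothing in your home directory is touched.
sandbox="$(mktemp -d)"
cd "$sandbox"
pwd                               # Print the current directory

mkdir newfolder                   # Create a folder
echo "hello AI-LAB" > file1       # Create a small text file
cp file1 file2                    # Copy file1 to file2
mv file1 newfolder/               # Move file1 into newfolder
ls                                # List contents: file2 and newfolder
cat file2                         # Display file contents
rm file2                          # Delete file2
cd newfolder                      # Enter newfolder
ls                                # file1 is now here
cd ..                             # Go back up one level
```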
✏️ Editing Files
Use the micro editor:
- Save: Ctrl + S, then Enter
- Exit: Ctrl + Q
Next: Transferring Files →
Transferring files
Use scp (secure copy) to upload and download files between your local computer and AI-LAB.
📤 Uploading Files
📥 Downloading Files
- -r copies directories recursively
- ~ means your home directory on AI-LAB
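A sketch of what an upload and a download with `scp` could look like, assuming a standard OpenSSH client. The helper names, the email, and the paths are hypothetical placeholders. Passing the login name with `-o User=...` is one of several possible ways to do it and is an assumption, not necessarily the form used in the AI-LAB guides. The helpers only print the commands.

```shell
# Assumed placeholders: replace with your own AAU email and paths.
AILAB_USER="user@student.aau.dk"
AILAB_HOST="ailab-fe01.srv.aau.dk"

# Hypothetical helpers that print (not run) scp commands.
# -o User=... passes the login name as an SSH option, so the
# host part of the remote path contains no extra "@".
ailab_upload_cmd() {    # usage: ailab_upload_cmd <local-path>
  printf 'scp -r -o User=%s %s %s:~/\n' "$AILAB_USER" "$1" "$AILAB_HOST"
}
ailab_download_cmd() {  # usage: ailab_download_cmd <path-under-remote-home> <local-dir>
  printf 'scp -r -o User=%s %s:~/%s %s\n' "$AILAB_USER" "$AILAB_HOST" "$1" "$2"
}

ailab_upload_cmd my_project               # local folder -> AI-LAB home directory
ailab_download_cmd my_project/results .   # AI-LAB folder -> current local directory
```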
💻 File managers (recommended)
For Windows users, we recommend WinSCP.
For Linux, macOS, or Windows (cross-platform), we recommend Double Commander.
Next: Slurm →
Slurm
Slurm is the job scheduler that manages compute resources on AI-LAB.
🧠 What Slurm Does
- Allocates CPUs, GPUs, and memory to jobs
- Queues jobs when resources are busy
- Ensures fairness among users
🔍 Useful Commands
squeue # View all jobs
squeue --me # View your jobs
sinfo # Show node status
nodesummary # Display resource allocations
Next: Two Ways of Running Jobs →
Two ways of running jobs
You can run compute tasks in two main ways on AI-LAB: interactive (srun) or batch (sbatch).
1️⃣ Interactive Job – srun
Runs immediately in your terminal session.
Use for quick tests or debugging.
-u forces srun to print outputs immediately
2️⃣ Batch Job – sbatch
Submit a script to run in the background.
Submit it:
Next: Exercise 1 →
Exercise 1: Run a simple job with srun
- Download workshop files by running this command:
- Change directory (cd) to workshop
- Run the script simple_script.py with python3 using srun -u
...and you should get:
Next: Creating an sbatch Script →
Creating an sbatch script
Batch scripts tell Slurm what to run and which resources to use.
✏️ Creating a script
Create your script using micro or your preferred editor:
#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --time=0:10:00
#SBATCH --output=myjob.log
echo "Hello from compute node"
sleep 60
echo "Done sleeping"
Save and exit (Ctrl+S, Enter, Ctrl+Q).
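If you prefer not to use an editor, the same script can be written from the command line. This is a sketch: the file name `myjob.sh` and the temporary directory are assumptions. It only writes the file and checks its syntax; it does not submit anything to Slurm.

```shell
# Write the batch script from a here-document instead of an editor.
# "myjob.sh" is an assumed file name; the quoted 'EOF' stops the shell
# from expanding anything inside the script while writing it.
workdir="$(mktemp -d)"
script="$workdir/myjob.sh"
cat > "$script" <<'EOF'
#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --time=0:10:00
#SBATCH --output=myjob.log
echo "Hello from compute node"
sleep 60
echo "Done sleeping"
EOF

bash -n "$script" && echo "syntax OK"   # Parse only; nothing is executed
grep '^#SBATCH' "$script"               # Show the directives Slurm would read
```

Checking the syntax first can save a trip through the queue for a script that would fail on its first line.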
🚀 Submit your script
Submit the batch script to Slurm:
This command sends your script to the Slurm scheduler, which will run it when resources become available.
📄 Check the output
Once your job completes, check the output file:
Next: Exercise 2 →
Exercise 2: Create and submit a batch script
Practice creating and submitting batch scripts.
- Use the micro text editor (or any other editor if you're an experienced Linux user) to open the script run.sh that already exists in the workshop directory.
- At the bottom of the script, add:
- Save it by pressing CTRL + S, then press CTRL + Q to exit micro.
- Submit the job using sbatch
- Check the job status using squeue --me
- Once completed, check the results by printing the output file using the cat command
Next: Allocating Resources →
Allocating resources
When running jobs, you can request specific resources like memory, CPUs, and GPUs.
💾 System memory
⚙️ CPUs
🎮 GPUs
GPU Resource Limits
To ensure fair access for all users, AI-LAB enforces two important limits:
- Maximum 4 GPUs per job: A single job can request no more than 4 GPUs (e.g., --gres=gpu:4)
- Maximum 8 GPUs per user: Each user can run jobs using a total of up to 8 GPUs simultaneously across all their running jobs
We strongly encourage inexperienced users to allocate only 1 GPU, as most workloads do not speed up automatically with more GPUs.
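The per-job limit can be checked before a job is submitted. The sketch below uses a hypothetical helper, `gpu_request_ok`, that is not part of Slurm or AI-LAB. It covers only the 4-GPU per-job limit; the 8-GPU per-user limit depends on your other running jobs and is not checked here.

```shell
# Per-job limit taken from the text above.
MAX_GPUS_PER_JOB=4

# Hypothetical helper: succeed if the requested GPU count is a whole
# number between 1 and the per-job limit, fail otherwise.
gpu_request_ok() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
  esac
  [ "$1" -ge 1 ] && [ "$1" -le "$MAX_GPUS_PER_JOB" ]
}

for n in 1 4 5; do
  if gpu_request_ok "$n"; then
    echo "--gres=gpu:$n is within the per-job limit"
  else
    echo "--gres=gpu:$n exceeds the per-job limit of $MAX_GPUS_PER_JOB"
  fi
done
```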
🚀 Example: Allocating resources with srun
📝 Example: Allocating resources with sbatch
In a batch script, add resource requests using #SBATCH directives:
#!/bin/bash
#SBATCH --gres=gpu:1 # Request 1 GPU
#SBATCH --cpus-per-task=4 # Request 4 CPUs
#SBATCH --mem=8G # Request 8 GB memory
python3 my_script.py
Next: Containers →
Containers on AI-LAB
AI-LAB uses Singularity containers to run applications safely and reproducibly.
📦 What are containers?
Containers bundle:
- Application code
- Libraries and dependencies
- Configuration files
They ensure your code runs the same everywhere.
📢 Why containers?
- Designed for HPC environments
- Runs without admin privileges
- Uses .sif container files
Next: Getting Containers →
Getting containers on AI-LAB
You can use preinstalled containers, download them, or create your own.
🏗️ Pre-downloaded containers
Stored in /ceph/container.
List the available containers with ls /ceph/container.
🌐 Download from NGC or Docker Hub
Follow this AI-LAB guide for downloading containers from:
- NVIDIA NGC
- Docker Hub
🧱 Build your own
Create a .def file and build your container using Singularity.
See AI-LAB’s guide for how to create your own container.
Next: Using Containers →
Using containers on AI-LAB
Let's run a simple Python script inside a Singularity container with GPU support.
🚀 Example: Running a container with srun
📝 Example: Running a container with sbatch
In a batch script, place the Singularity command after the shebang line (add #SBATCH directives above it to request resources):
#!/bin/bash
singularity exec --nv /ceph/container/pytorch/pytorch_25.04.sif python3 gpu_stress.py
📖 Understanding the Singularity command
Let's break down what each part does:
- singularity exec: Tells Singularity to execute something inside the container.
- --nv: Tells Singularity to include NVIDIA libraries. Always use this flag when running GPU-accelerated code so your container can access the GPU.
- /ceph/container/pytorch/pytorch_25.04.sif: The path to your container file. This is a pre-downloaded PyTorch container stored on AI-LAB.
- python3 gpu_stress.py: The command to run inside the container. This executes your Python script using Python 3 from within the container environment.
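A mistyped container path is an easy mistake to make. As a sketch, the command can be wrapped in a small function that checks the file first. The name `run_in_container` is hypothetical; the `singularity exec --nv` call inside it is the same one broken down above.

```shell
# Hypothetical wrapper: check that the container file exists before
# asking Singularity to run a command inside it with GPU support.
run_in_container() {
  container="$1"
  shift
  if [ ! -f "$container" ]; then
    echo "Container not found: $container" >&2
    return 1
  fi
  # --nv exposes the NVIDIA libraries so the container can use the GPU
  singularity exec --nv "$container" "$@"
}

run_in_container /ceph/container/pytorch/pytorch_25.04.sif python3 gpu_stress.py \
  || echo "Container run did not succeed"
```

On a machine without access to /ceph/container, the call prints the "not found" message instead of starting Singularity.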
Next: Exercise 3 →
Exercise 3: Running a GPU script with containers and resources
Let's try running a Python GPU script inside a PyTorch container with resources allocated.
- Inside the workshop directory, you will also find a file called run_container.sh
- Check the file content using cat run_container.sh
- Submit the job using sbatch
- Check the job status using squeue --me and find the JOBID.
- Check the GPU utilization by running the following command:
Replace 162841 with your JOBID.
Hint: Understanding GPU Metrics
Key metrics to watch:
- GPU-Util: Percentage of GPU being used (aim for 70-100% during training)
- Memory-Usage: How much GPU memory your job is using
- Temperature: GPU temperature (should stay below 80°C)
- Power: Power consumption (indicates workload intensity)
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.02              Driver Version: 555.42.02      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L4                      Off |   00000000:01:00.0 Off |                    0 |
| N/A   44C    P0             36W /   72W |     245MiB /  23034MiB |     90%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA L4                      Off |   00000000:02:00.0 Off |                    0 |
| N/A   38C    P8             16W /   72W |       4MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA L4                      Off |   00000000:41:00.0 Off |                    0 |
| N/A   41C    P8             16W /   72W |       1MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
...
+------------------------------------------------------------------------------+
|  GPU     PID     USER  GPU MEM  %CPU  %MEM      TIME  COMMAND                |
|    0  232843   user@+   236MiB   100   0.1  01:00:20  /usr/bin/python3 tor   |
+------------------------------------------------------------------------------+
The most important parameter to notice here is the GPU-Util metric. Here, you can see that the first GPU is operating at 90% GPU utilization. This indicates excellent utilization of the GPU.
You can identify which GPU(s) belong to your job by finding your username under USER and the GPU number under GPU. In this case, user@+ is using GPU number 0 in the NVIDIA-SMI list.
- Once completed, cancel all your jobs by using scancel -u $USER
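To read the GPU-Util figure without scanning the whole listing, the status rows can be filtered with `awk`. This is a sketch: `gpu_util` is a hypothetical helper, and the sample rows are the GPU 0 and GPU 1 status rows from the NVIDIA-SMI listing in the hint, with the spacing collapsed.

```shell
# Sample rows taken from the NVIDIA-SMI listing in the hint (one per GPU).
sample='| N/A 44C P0 36W / 72W | 245MiB / 23034MiB | 90% Default |
| N/A 38C P8 16W / 72W | 4MiB / 23034MiB | 0% Default |'

# Hypothetical helper: read such rows on stdin and print the GPU-Util
# percentage of each, i.e. the first word of the third "|"-separated column.
gpu_util() {
  awk -F'|' '/%/ { split($4, f, " "); sub(/%/, "", f[1]); print f[1] }'
}

printf '%s\n' "$sample" | gpu_util   # one number per GPU row
```

Where `nvidia-smi` is available, `nvidia-smi --query-gpu=utilization.gpu --format=csv` reports the same metric directly, without any parsing.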
Next: Final pointers →
Final pointers
Congratulations — you’ve reached the end of the AI-LAB workshop! 🎉
✅ Key reminders
- Do not store confidential or sensitive data (type 2 or 3)
- Jobs must not exceed 12 hours
- Read the Fair Usage Policy
- Access resets each August 1st
- Expect 4 annual maintenance windows
🆘 Need help?
Visit the AAU Service Portal: https://serviceportal.aau.dk
🚀 Coming soon
- VS Code integration on compute nodes
- Web-based AI-LAB interface
🎓 Thank you for participating!