🎓 Welcome to AI-LAB Workshop 🎓

This workshop introduces you to the AI-LAB computing platform — a GPU-powered system for AI and deep learning.

Presenter: Frederik Petri Svenningsen, Data Scientist, CLAAUDIA – research data services

Accessing AI-LAB

Before you can log in, you’ll need to complete this application form.

Workshop overview

You’ll learn to:

Log in to AI-LAB securely
Navigate the Linux environment
Run and monitor jobs using Slurm
Use containers for AI workloads
Manage resources efficiently

Next: What is AI-LAB →

What is AI-LAB?

AI-LAB is a GPU-powered mini-supercomputer that lets students run AI experiments, deep-learning projects, and simulations without needing their own advanced hardware.

💻 What You Can Do

AI-LAB allows you to:

Train deep learning models on GPU hardware
Run AI experiments and simulations
Collaborate with classmates or research groups
Access powerful resources without owning a GPU

🔧 Why Use AI-LAB?

Centralized system: no setup needed
Preinstalled environments (PyTorch, TensorFlow, etc.)
Fair resource sharing through Slurm
Remote access from anywhere

🧠 Example Use Cases

Deep learning projects for courses
Research prototypes
Data processing or simulation workloads

Next: AI-LAB Under the Hood →

AI-LAB under the hood

AI-LAB combines specialized hardware and software to deliver high-performance computing for AI workloads.

flowchart LR
  subgraph id1[<p style="font-family: Barlow, sans-serif; font-weight: 800; font-size: 12px; text-transform: uppercase; color: #221a52; letter-spacing: 1px; margin: 5px;">Compute nodes</p>]
  direction TB
  A["<span><img src="/assets/img/server.svg"  width='25' height='25' >ailab-l4-[01-11]</span>"]
  end

  subgraph id2[<p style="font-family: Barlow, sans-serif; font-weight: 800; font-size: 16px; text-transform: uppercase; color: #221a52; letter-spacing: 1px; margin: 10px;">AI-LAB</p>]
  direction TB
  subgraph id3[<p style="font-family: Barlow, sans-serif; font-weight: 800; font-size: 12px; text-transform: uppercase; color: #221a52; letter-spacing: 1px; margin: 5px;">Front-end nodes</p>]
    direction TB
    G["<span><img src="/assets/img/server.svg" width='25' height='25'>ailab-fe[01-02]</span>"]
    end
  id3 --> id1 

  subgraph id4[<p style="font-family: Barlow, sans-serif; font-weight: 800; font-size: 12px; text-transform: uppercase; color: #221a52; letter-spacing: 1px; margin: 5px;">File storage</p>]
    direction TB
    E["<span><img src="/assets/img/server.svg" width='25' height='25'>Ceph</span>"]
    end

  id1 & id3 <--> id4
  end

  F[<span><img src="/assets/img/person.svg" width='25' height='25'>User</span>]-- SSH --> id3

🖥️ Hardware Overview

Component	Description
Login Nodes	2 nodes for connecting and submitting jobs
Compute Nodes	11 powerful machines with GPUs
GPUs	NVIDIA L4 GPUs (8 per node, 24 GB memory each)
Storage	Central networked storage via Ceph

⚙️ Software Stack

Layer	Tool	Purpose
Scheduler	Slurm	Manages compute resources and queues
Containers	Singularity	Isolates applications and dependencies

Next: The AI-LAB Workflow →

The AI-LAB workflow

AI-LAB follows a simple 4-step workflow for running AI experiments efficiently.

🔄 Workflow Overview

Log in from your local computer
Upload your code and data
Run compute jobs on the GPUs using Slurm
View or download your results

Next: Logging into AI-LAB →

Logging into AI-LAB

You connect to AI-LAB using SSH (Secure Shell).

There are two frontend nodes:

ailab-fe01.srv.aau.dk
ailab-fe02.srv.aau.dk

Use either when logging in.

🔐 Logging In

Open your terminal (Windows users should use PowerShell) and run:

ssh user@student.aau.dk@ailab-fe01.srv.aau.dk

Replace user@student.aau.dk with your actual AAU email address.

The first time you connect, type yes to trust the server fingerprint. Then enter your AAU password (no stars are shown while typing).
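
The login string contains two @ signs, which can look like a typo. The sketch below shows how it breaks down, assuming ssh treats everything after the last @ as the host (which is what makes the command above work). The variable names are illustrative only.

```shell
# The address from above: an AAU email followed by the front-end host.
login="user@student.aau.dk@ailab-fe01.srv.aau.dk"

ssh_user="${login%@*}"    # everything before the last @
ssh_host="${login##*@}"   # everything after the last @

echo "username: ${ssh_user}"
echo "host:     ${ssh_host}"

# Equivalent form with the username passed separately via -l:
echo "ssh -l ${ssh_user} ${ssh_host}"
```

The snippet only prints text. To connect, run the ssh command itself with your own email in place of the placeholder.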

Having login issues? Check out the troubleshooting guide for solutions.

Next: File Handling on AI-LAB →

File handling on AI-LAB

All your files are stored in network-mounted directories shared across the system.

📂 Default User Directory

Your personal home directory:

/ceph/home/[domain]/[user]

👨‍👦‍👦 Shared Spaces

Path	Purpose
`/ceph/project`	Shared project folders
`/ceph/course`	Course-related materials
`/ceph/container`	Ready-to-use containers

Members of a semester group can create private project folders; follow this guide.

Next: Essential Linux Commands →

Essential Linux commands

AI-LAB runs on Linux — here are the basics you’ll need.

📁 Navigating Directories

pwd            # Show current directory
ls             # List files
cd foldername  # Change directory

📄 Managing Files

cp file1 file2     # Copy file
mv file1 folder/   # Move or rename file
rm file1           # Delete file
mkdir newfolder    # Create a folder
cat file.txt       # Display file contents
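
To see how these commands fit together, here is a short practice run in a throwaway directory. The file and folder names are made up for the illustration, and the `>` redirection (not covered above) is only used to create a small file.

```shell
# Work in a throwaway directory so nothing important is touched.
cd "$(mktemp -d)"

mkdir demo                      # create a folder
cd demo                         # move into it
echo "hello" > notes.txt        # create a small file containing "hello"
cp notes.txt backup.txt         # copy it
mkdir archive
mv backup.txt archive/          # move the copy into a subfolder
mv notes.txt readme.txt         # mv with a new name renames the file
ls                              # lists archive and readme.txt
cat archive/backup.txt          # prints: hello
rm readme.txt                   # delete the file (there is no undo)
pwd                             # show where you are
```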

✏️ Editing Files

Use the micro editor:

micro myscript.sh

Save: Ctrl + S then Enter
Exit: Ctrl + Q

Next: Transferring Files →

Transferring files

Use scp (secure copy) to upload and download files between your local computer and AI-LAB.

📤 Uploading Files

scp -r myfile.txt user@student.aau.dk@ailab-fe01.srv.aau.dk:~

📥 Downloading Files

scp -r user@student.aau.dk@ailab-fe01.srv.aau.dk:~/myfile.txt .

-r copies directories recursively
~ means your home directory on AI-LAB
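
The same commands work for whole folders. The sketch below uses placeholder names (`myproject`, `results`) and a hypothetical `run` helper that prints each command and only executes it when `DRY_RUN=0`, so you can check the commands before any files are copied.

```shell
#!/bin/bash
# Placeholders (assumptions): replace with your own AAU email and folder names.
AAU_EMAIL="user@student.aau.dk"
REMOTE="${AAU_EMAIL}@ailab-fe01.srv.aau.dk"

# Print each command first; run it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

run scp -r myproject "${REMOTE}:~"            # upload a whole folder
run scp -r "${REMOTE}:~/myproject/results" .  # download a results folder
```

Run these from your local computer, not from AI-LAB. Once the printed commands look right, set `DRY_RUN=0` or type the scp commands directly.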

💻 File managers (recommended)

For Windows users, we recommend WinSCP.

For Linux, macOS, or Windows (cross-platform), we recommend Double Commander.

Next: Slurm →

Slurm

Slurm is the job scheduler that manages compute resources on AI-LAB.

🧠 What Slurm Does

Allocates CPUs, GPUs, and memory to jobs
Queues jobs when resources are busy
Ensures fairness among users

🔍 Useful Commands

squeue          # View all jobs
squeue --me     # View your jobs
sinfo           # Show node status
nodesummary     # Display resource allocations

Next: Two Ways of Running Jobs →

Two ways of running jobs

You can run compute tasks in two main ways on AI-LAB: interactive (srun) or batch (sbatch).

1️⃣ Interactive Job – srun

Runs immediately in your terminal session.

srun -u echo "Hello from compute node"

Use for quick tests or debugging.

-u (unbuffered) makes srun print output as soon as it is produced instead of holding it back
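
One way to convince yourself that srun really runs a command somewhere else is to print the machine name with and without it. This is a minimal sketch: `where_am_i` is a made-up helper name, and the guard just keeps the snippet harmless on machines where srun is not installed.

```shell
# Print where a command runs: directly, then through srun if it exists.
where_am_i() {
  echo "Without srun: $(uname -n)"
  if command -v srun >/dev/null 2>&1; then
    echo "With srun:    $(srun -u uname -n)"
  else
    echo "With srun:    (srun not found; run this on an AI-LAB front-end node)"
  fi
}

where_am_i
```

On AI-LAB, the first line should show a front-end node and the second a compute node.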

2️⃣ Batch Job – sbatch

Submit a script to run in the background.

run.sh

#!/bin/bash
echo "Hello from compute node"

Submit it:

sbatch run.sh

Next: Exercise 1 →

Exercise 1: Run a simple job with srun

Download workshop files by running this command:
```
ailab --workshop
```
Change directory (cd) to workshop
Hint
```
cd ~/workshop
```
Run the script simple_script.py with python3 using srun -u
Hint
```
srun -u python3 simple_script.py
```
...and you should get:

...
Second 29...
Second 30...
Done after 30 seconds!

Next: Creating an sbatch Script →

Creating an sbatch script

Batch scripts tell Slurm what to run and which resources to use.

✏️ Creating a script

Create your script using micro or your preferred editor:

micro run.sh

run.sh

#!/bin/bash

#SBATCH --job-name=myjob       
#SBATCH --time=0:10:00 
#SBATCH --output=myjob.log

echo "Hello from compute node"
sleep 60
echo "Done sleeping"

Save and exit (Ctrl+S, Enter, Ctrl+Q).

🚀 Submit your script

Submit the batch script to Slurm:

sbatch run.sh

This command sends your script to the Slurm scheduler, which will run it when resources become available.
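
When sbatch accepts a job, it prints a confirmation line containing the job ID. The sketch below pulls the ID out of a sample line (the number is only an example). The commented lines show how the ID could be captured and reused on AI-LAB.

```shell
# Sample confirmation line; sbatch prints one like it after a successful submit.
sbatch_output="Submitted batch job 162841"

job_id="${sbatch_output##* }"   # keep only the text after the last space
echo "Job ID: ${job_id}"

# On AI-LAB you could capture the ID directly and reuse it:
#   job_id="$(sbatch --parsable run.sh)"   # --parsable prints just the ID
#   squeue -j "$job_id"                    # status of that one job
#   scancel "$job_id"                      # cancel it if needed
```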

📄 Check the output

Once your job completes, check the output file:

cat myjob.log

Next: Exercise 2 →

Exercise 2: Create and submit a batch script

Practice creating and submitting batch scripts.

Use the micro text editor (or any other editor, if you're an experienced Linux user) to open the script run.sh that already exists in the workshop directory.
Hint
```
micro run.sh
```
At the bottom of the script, add:
```
python3 simple_script.py
```
Save it by pressing CTRL + S, then press CTRL + Q to exit micro.
Submit the job using sbatch
Hint
```
sbatch run.sh
```
Check the job status using squeue --me
Hint: Make it update every second
```
watch -n1 squeue --me
```
Once the job has completed, check the results by printing the output file with the cat command
Hint
```
cat myjob.log
```

Next: Allocating Resources →

Allocating resources

When running jobs, you can request specific resources like memory, CPUs, and GPUs.

💾 System memory

--mem=40G

⚙️ CPUs

--cpus-per-task=15

🎮 GPUs

--gres=gpu:1

GPU Resource Limits

To ensure fair access for all users, AI-LAB enforces two important limits:

Maximum 4 GPUs per job: A single job can request no more than 4 GPUs (e.g., --gres=gpu:4)
Maximum 8 GPUs per user: Each user can run jobs using a total of up to 8 GPUs simultaneously across all their running jobs

We strongly encourage inexperienced users to allocate only 1 GPU, as most workloads do not speed up automatically with more GPUs.
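
The two limits interact: a request can be fine on its own and still push you past the per-user total. The sketch below works through that arithmetic. `check_gpu_request` is a hypothetical helper written for this illustration, not an AI-LAB or Slurm command, and it says nothing about how Slurm itself reacts to an over-limit request.

```shell
MAX_GPUS_PER_JOB=4    # per-job limit stated above
MAX_GPUS_PER_USER=8   # per-user limit stated above

# check_gpu_request REQUESTED IN_USE
#   REQUESTED: GPUs the new job asks for
#   IN_USE:    GPUs your running jobs already hold
check_gpu_request() {
  local requested="$1" in_use="$2"
  if [ "$requested" -gt "$MAX_GPUS_PER_JOB" ]; then
    echo "over the per-job limit (${MAX_GPUS_PER_JOB})"
  elif [ $((requested + in_use)) -gt "$MAX_GPUS_PER_USER" ]; then
    echo "over the per-user limit (${MAX_GPUS_PER_USER})"
  else
    echo "ok"
  fi
}

check_gpu_request 1 0   # ok
check_gpu_request 5 0   # over the per-job limit (4)
check_gpu_request 4 4   # ok: 4 + 4 = 8 is still within the limit
check_gpu_request 4 6   # over the per-user limit (8): 4 + 6 = 10
```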

🚀 Example: Allocating resources with srun

srun --cpus-per-task=4 --mem=8G --gres=gpu:1 python3 my_script.py

📝 Example: Allocating resources with sbatch

In a batch script, add resource requests using #SBATCH directives:

run.sh

#!/bin/bash
#SBATCH --gres=gpu:1         # Request 1 GPU
#SBATCH --cpus-per-task=4    # Request 4 CPUs
#SBATCH --mem=8G             # Request 8 GB memory

python3 my_script.py
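
Requesting CPUs does not by itself make a program use them. A common pattern is to pass the CPU count on to the program, sketched here on the assumption that your code uses a library that reads OMP_NUM_THREADS (many numerical libraries do, but my_script.py may not). Slurm sets SLURM_CPUS_PER_TASK inside a job when --cpus-per-task is given.

```shell
# Use the CPU count Slurm granted; fall back to 1 outside a job,
# where SLURM_CPUS_PER_TASK is not set.
threads="${SLURM_CPUS_PER_TASK:-1}"
export OMP_NUM_THREADS="$threads"
echo "Using ${OMP_NUM_THREADS} CPU thread(s)"
```

In a batch script, these lines would go before the python3 line.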

Next: Containers →

Containers on AI-LAB

AI-LAB uses Singularity containers to run applications safely and reproducibly.

📦 What are containers?

Containers bundle:

Application code
Libraries and dependencies
Configuration files

They ensure your code runs the same everywhere.

📢 Why containers?

Designed for HPC environments
Runs without admin privileges
Uses .sif container files

Next: Getting Containers →

Getting containers on AI-LAB

You can use preinstalled containers, download them, or create your own.

🏗️ Pre-downloaded containers

Stored in:

/ceph/container

List available containers:

ailab --list-containers

🌐 Download from NGC or Docker Hub

Follow this AI-LAB guide for downloading containers from:

NVIDIA NGC
Docker Hub

🧱 Build your own

Create a .def file and build your container using Singularity. See AI-LAB’s guide for how to create your own container.

Next: Using Containers →

Using containers on AI-LAB

Let's run a simple Python script inside a Singularity container with GPU support.

🚀 Example: Running a container with srun

srun singularity exec --nv /ceph/container/pytorch/pytorch_25.04.sif python3 gpu_stress.py

📝 Example: Running a container with sbatch

In a batch script, place the same singularity command after the #!/bin/bash line:

run.sh

#!/bin/bash

singularity exec --nv /ceph/container/pytorch/pytorch_25.04.sif python3 gpu_stress.py

📖 Understanding the Singularity command

Let's break down what each part does:

singularity exec: Tells Singularity to execute something inside the container.
--nv: Tells Singularity to include NVIDIA libraries. Always use this flag when running GPU-accelerated code so your container can access the GPU.
/ceph/container/pytorch/pytorch_25.04.sif: The path to your container file. This is a pre-downloaded PyTorch container stored on AI-LAB.
python3 gpu_stress.py: The command to run inside the container. This executes your Python script using Python 3 from within the container environment.
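
Resource requests and the container command can live in the same batch script. The sketch below writes such a script to a file and checks its syntax. The file name `run_gpu.sh`, the job name, and the log name are made up for this illustration, and the resource values are simply the ones used in earlier examples.

```shell
# Write the batch script; the quoted 'EOF' keeps the text exactly as typed.
cat > run_gpu.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=gpu-test      # name shown in squeue
#SBATCH --output=gpu-test.log    # file that collects the job's output
#SBATCH --time=0:10:00           # time limit
#SBATCH --gres=gpu:1             # request 1 GPU
#SBATCH --cpus-per-task=4        # request 4 CPUs
#SBATCH --mem=8G                 # request 8 GB memory

singularity exec --nv /ceph/container/pytorch/pytorch_25.04.sif python3 gpu_stress.py
EOF

# Check the syntax without running anything, then show the next step.
bash -n run_gpu.sh && echo "run_gpu.sh written; submit it with: sbatch run_gpu.sh"
```

Writing the file this way is optional. Typing the same content into micro works just as well.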

Next: Exercise 3 →

Exercise 3: Running a GPU script with containers and resources

Let's try running a Python GPU script inside a PyTorch container with resources allocated.

Inside the workshop directory, you will also find a file called run_container.sh
Check the file content using cat run_container.sh
Submit the job using sbatch
Hint
```
sbatch run_container.sh
```

Check the job status using squeue --me and find the JOBID.

Hint: How to find the JOBID

Here, 162841 is the JOBID

     JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
    162841        l4    myjob ry90cd@i  R      10:25      1 ailab-l4-11

Check the GPU utilization by running the following command:

ailab --gpu-util 162841

Replace 162841 with your JOBID
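
If you would rather not read the JOBID off the screen, a small awk filter can pick it out. This sketch runs on the sample squeue output shown in the hint, so it works anywhere. The commented line at the end shows how the same filter could be applied to live output.

```shell
# Sample squeue output copied from the hint above.
sample_squeue_output='     JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
    162841        l4    myjob ry90cd@i  R      10:25      1 ailab-l4-11'

# Skip the header line (NR > 1) and print the first column of each job line.
job_ids="$(printf '%s\n' "$sample_squeue_output" | awk 'NR > 1 { print $1 }')"
echo "$job_ids"

# On AI-LAB, the same filter works on live output:
#   squeue --me | awk 'NR > 1 { print $1 }'
```

With several jobs running, the filter prints one JOBID per line.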

Hint: Understanding GPU Metrics

Key metrics to watch:

GPU-Util: Percentage of GPU being used (aim for 70-100% during training)
Memory-Usage: How much GPU memory your job is using
Temperature: GPU temperature (should stay below 80°C)
Power: Power consumption (indicates workload intensity)

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.02              Driver Version: 555.42.02      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L4                      Off |   00000000:01:00.0 Off |                    0 |
| N/A   44C    P0             36W /   72W |     245MiB /  23034MiB |     90%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA L4                      Off |   00000000:02:00.0 Off |                    0 |
| N/A   38C    P8             16W /   72W |       4MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA L4                      Off |   00000000:41:00.0 Off |                    0 |
| N/A   41C    P8             16W /   72W |       1MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
...

+------------------------------------------------------------------------------+
|  GPU    PID     USER    GPU MEM  %CPU  %MEM      TIME  COMMAND               |
|    0 232843   user@+     236MiB   100   0.1  01:00:20  /usr/bin/python3 tor  |
+------------------------------------------------------------------------------+

The most important parameter to notice here is the GPU-Util metric. Here, you can see that the first GPU is operating at 90% GPU utilization. This indicates excellent utilization of the GPU.

You can find which GPU(s) belong to your job by locating your username under USER and reading the number under GPU. In this case, user@+ is using GPU number 0 in the NVIDIA-SMI list.

+------------------------------------------------------------------------------+
|  GPU    PID     USER    GPU MEM  %CPU  %MEM      TIME  COMMAND               |
|    0 232843   user@+     236MiB   100   0.1  01:00:20  /usr/bin/python3 tor  |
+------------------------------------------------------------------------------+

Once completed, cancel all your jobs by using scancel -u $USER

Next: Final pointers →

Final pointers

Congratulations — you’ve reached the end of the AI-LAB workshop! 🎉

✅ Key reminders

Do not store confidential or sensitive data (type 2 or 3)
Jobs must not exceed 12 hours
Read the Fair Usage Policy
Access resets each August 1st
Expect 4 annual maintenance windows

🆘 Need help?

Visit the AAU Service Portal: https://serviceportal.aau.dk

🚀 Coming soon

VS Code integration on compute nodes
Web-based AI-LAB interface

🎓 Thank you for participating!
