
Cluster Overview

This document provides a high-level overview of the HPC cluster, including hardware specifications, network architecture, and the services running on each node.


Network Architecture

The cluster consists of two workstations and a NAS, interconnected via a 10GbE internal network. All three devices connect through a TP-Link TL-SX105 5-port 10GBASE-T unmanaged switch, which uplinks to the company network via a 1Gb wall port.

              External Network
                      │
                      │ 1Gb uplink
                      │
                 [Wall Port]
                      │
      [TP-Link 5-Port Unmanaged Switch]
      │               │               │
      │               │               │   10Gb Internal Network
      │               │               │
[Control Node]  [Compute Node]  [NAS Storage Array]

Device        Hostname  IP Address      Network
Control Node  node01    192.168.220.75  10Gb + 1Gb
Compute Node  node02    192.168.220.76  10Gb
NAS           QNAP NAS  192.168.220.80  10Gb
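
The address table above can be captured as a small inventory, which makes it easy to sanity-check that all three devices sit on one subnet. This is only a sketch: the /24 prefix length, the `nas` key, and the names `INVENTORY` and `hosts_outside_subnet` are assumptions for illustration, since the table lists addresses but no netmask.

```python
import ipaddress

# Addresses copied from the table above.
INVENTORY = {
    "node01": "192.168.220.75",  # control node
    "node02": "192.168.220.76",  # compute node
    "nas": "192.168.220.80",     # QNAP NAS (key name is a placeholder)
}

# Assumption: the internal network is a /24; the netmask is not documented here.
ASSUMED_SUBNET = ipaddress.ip_network("192.168.220.0/24")


def hosts_outside_subnet(inventory, subnet):
    """Return the names of hosts whose address is not inside `subnet`."""
    return [
        name
        for name, addr in inventory.items()
        if ipaddress.ip_address(addr) not in subnet
    ]
```

Calling `hosts_outside_subnet(INVENTORY, ASSUMED_SUBNET)` returns an empty list when every device falls inside the assumed subnet.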

Hardware Specifications

Control Node

Component  Specification
CPU        AMD Threadripper PRO 9985WX, 64 cores / 128 threads @ up to 5.5 GHz
RAM        256 GB (8× 32 GB DIMMs)
GPU        NVIDIA RTX PRO 6000 Blackwell, ~96 GB VRAM
Storage    Samsung 990 PRO 4 TB NVMe
OS         Ubuntu 24.04
IP         192.168.220.75

Compute Node

Component  Specification
CPU        AMD Ryzen 9 9950X3D, 16 cores / 32 threads
RAM        128 GB
GPU        NVIDIA RTX 5000 Ada Generation, ~32 GB VRAM
OS         Ubuntu 24.04
IP         192.168.220.76

NAS

Component   Specification
Model       QNAP TS-855X-8G-US
CPU         Intel Atom C5125 8-core
RAM         8 GB DDR4
Network     1× 10GbE RJ45 + 2× 2.5GbE
Drives      4× WD Red Pro 18 TB
RAID        RAID 5, ~47 TB usable
NVMe Cache  2× Samsung 970 EVO Plus 1 TB (M.2 PCIe)
OS          QuTS Hero h5.2.9 (ZFS)
IP          192.168.220.80
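
The RAID 5 figure above can be checked with a little arithmetic. RAID 5 spends one drive's worth of capacity on parity, so four 18 TB drives give 54 TB raw usable space, or about 49.1 TiB in binary units. The remaining gap to the ~47 TB shown is plausibly filesystem overhead and reserved space, but that attribution is an assumption, not something this page states. The function names below are illustrative.

```python
def raid5_usable_tb(drive_count, drive_tb):
    """Usable RAID 5 capacity in decimal TB: one drive's worth goes to parity."""
    if drive_count < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return (drive_count - 1) * drive_tb


def tb_to_tib(tb):
    """Convert decimal terabytes (10**12 bytes) to tebibytes (2**40 bytes)."""
    return tb * 10**12 / 2**40


usable_tb = raid5_usable_tb(4, 18)   # 54 TB before any filesystem overhead
usable_tib = tb_to_tib(usable_tb)    # roughly 49.1 TiB
```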

Storage Layout

Each node has its own local partitions, with /home shared from the NAS via NFS.

Partition  Control Node          Compute Node
/          1 TB NVMe (local)     1 TB NVMe (local)
/scratch   2.6 TB NVMe (local)   2.6 TB NVMe (local)
/home      NFS, 47 TB from NAS   NFS, 47 TB from NAS
Swap       10 GB                 10 GB
Tip: For deep learning training workloads, copy datasets to /scratch/local before training. Local NVMe (~3–5 GB/s) is significantly faster than NFS (~1.2 GB/s).
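
One way to follow this advice is to stage the dataset at the start of a job. The sketch below is a minimal illustration, not a site-provided tool: `stage_dataset` and `transfer_seconds` are hypothetical helper names, and the default destination simply mirrors the /scratch/local path mentioned above.

```python
import shutil
from pathlib import Path


def stage_dataset(src, scratch_root="/scratch/local"):
    """Copy the dataset directory `src` under `scratch_root` and return the local path.

    If the destination already exists it is reused rather than copied again.
    """
    src = Path(src)
    dest = Path(scratch_root) / src.name
    if not dest.exists():
        # copytree creates `dest` (and missing parents) before copying.
        shutil.copytree(src, dest)
    return dest


def transfer_seconds(size_gb, rate_gb_per_s):
    """Idealized time to read `size_gb` at a sustained `rate_gb_per_s`."""
    return size_gb / rate_gb_per_s
```

As a rough guide using the rates quoted above, reading a 100 GB dataset takes about 83 seconds per pass at 1.2 GB/s over NFS, versus about 25 seconds at 4 GB/s from local NVMe. Note that the existence check does not detect an interrupted copy; remove a partial destination directory before staging again.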


Services

Service  Control Node  Compute Node
slurmctld
slurmd
munge
prometheus
grafana
node_exporter
nvidia_gpu_exporter
slurm_exporter
apache2
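
To confirm which of these services are actually running on a given node, a short check against systemd can help. This is a sketch under assumptions: it presumes each service is managed as a systemd unit and that unit names match the service names listed above, which may not hold on every install (Grafana's unit, for example, is often named `grafana-server`). The helper names are illustrative.

```python
import subprocess


def is_active(unit, run=subprocess.run):
    """Return True if `systemctl is-active` reports the unit as active.

    `run` is injectable so the function can be exercised without systemd.
    """
    result = run(["systemctl", "is-active", "--quiet", unit], check=False)
    return result.returncode == 0


# Assumption: unit names equal the service names in the list above.
UNITS = [
    "slurmctld", "slurmd", "munge", "prometheus", "grafana",
    "node_exporter", "nvidia_gpu_exporter", "slurm_exporter", "apache2",
]


def inactive_units(units, run=subprocess.run):
    """Return the subset of `units` that systemd does not report as active."""
    return [unit for unit in units if not is_active(unit, run)]
```

Running `inactive_units(UNITS)` on each node lists the units that are not active there. Not every service is expected on every node, so interpret the result per node.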