Dual AMD Radeon 9700 AI PRO: Building a 64GB LLM/AI Server with Llama.cpp
Last weekend I built a 64GB VRAM AI workstation using two new AMD Radeon AI PRO R9700 GPUs to test their performance for local LLMs. In this video, I walk through the specific hardware requirements for this dual-GPU setup, including airflow, PCIe lane splitting, and power, and show you how to configure the software stack on Linux with ROCm/Vulkan and Llama.cpp. If you are looking for a high-memory alternative to the Nvidia RTX 5080, this deep dive covers exactly what the R9700 hardware can do.

Timestamps:
00:00 Dual GPU Workstation Intro
02:23 Is this a Good GPU for AI?
05:30 Hardware Component List
06:54 Managing Airflow and Heat
07:49 Understanding PCIe Lane Splitting
10:31 Power Supply Requirements
11:35 OS and ROCm Installation
13:54 Installing GPU Monitoring Tools
16:08 Llama.cpp Toolboxes Overview
19:36 Vulkan vs ROCm
22:56 Setting Up the Toolboxes
25:56 Downloading Models from HF
29:29 Running LLMs via llama-cli and llama-server
34:37 Running llama-bench
37:26 Single vs Dual GPU Performance
42:21 Running Large Models on Dual GPUs
46:56 Summary and Future Plans

Links:
AMD Radeon™ AI PRO R9700: https://www.amd.com/en/products/graph...
AMD R9700 Llama.cpp Toolboxes (GitHub): https://github.com/kyuz0/amd-r9700-ai...
Live Performance Benchmarks: https://kyuz0.github.io/amd-r9700-ai-...
Llama.cpp Official Repo: https://github.com/ggerganov/llama.cpp

The Build Components
GPUs: 2x AMD Radeon AI PRO R9700 (64GB Total VRAM)
CPU: AMD Ryzen 9 9900X3D
Motherboard: ASRock X870E Taichi
RAM: 64GB Crucial Pro DDR5
Storage: Crucial T710 2TB NVMe PCIe 5.0
Power Supply: Corsair HX1200i (1200W)
Case: Fractal Design Torrent (High Airflow)


