Dual AMD Radeon 9700 AI PRO: Building a 64GB LLM/AI Server with Llama.cpp

Last weekend I built a 64GB VRAM AI workstation using two new AMD Radeon AI PRO R9700 GPUs to test their performance for local LLMs. In this video, I walk through the specific hardware requirements for this dual-GPU setup, including airflow, PCIe lane splitting, and power, and show you how to configure the software stack on Linux with ROCm/Vulkan and Llama.cpp. If you are looking for a high-memory alternative to the Nvidia RTX 5080, this deep dive covers exactly what the R9700 hardware can do.

Timestamps:
00:00 Dual GPU Workstation Intro
02:23 Is this a Good GPU for AI?
05:30 Hardware Component List
06:54 Managing Airflow and Heat
07:49 Understanding PCIe Lane Splitting
10:31 Power Supply Requirements
11:35 OS and ROCm Installation
13:54 Installing GPU Monitoring Tools
16:08 Llama.cpp Toolboxes Overview
19:36 Vulkan vs ROCm
22:56 Setting Up the Toolboxes
25:56 Downloading Models from HF
29:29 Running LLMs via llama-cli and llama-server
34:37 Running llama-bench
37:26 Single vs Dual GPU Performance
42:21 Running Large Models on Dual GPUs
46:56 Summary and Future Plans

Links:
AMD Radeon™ AI PRO R9700: https://www.amd.com/en/products/graph...
AMD R9700 Llama.cpp Toolboxes (GitHub): https://github.com/kyuz0/amd-r9700-ai...
Live Performance Benchmarks: https://kyuz0.github.io/amd-r9700-ai-...
Llama.cpp Official Repo: https://github.com/ggerganov/llama.cpp

The Build Components:
GPUs: 2x AMD Radeon AI PRO R9700 (64GB Total VRAM)
CPU: AMD Ryzen 9 9900X3D
Motherboard: ASRock X870E Taichi
RAM: 64GB Crucial Pro DDR5
Storage: Crucial T710 2TB NVMe PCIe 5.0
Power Supply: Corsair HX1200i (1200W)
Case: Fractal Design Torrent (High Airflow)
