Low-Latency Strix Halo Cluster with RDMA (RoCE/Intel E810) and vLLM, Framework Desktop Boards

In this video, I move beyond Ethernet and llama.cpp and show a 2-node Strix Halo cluster using RDMA and vLLM tensor parallelism. The setup uses two Framework Desktop motherboards with 128 GB of unified memory each, connected directly via Intel E810 cards configured for RoCE. I cover the hardware details that matter for this build: direct-attached RDMA, custom cooling for the E810, and why running a x16 NIC in a PCIe x4 slot isn’t a real problem for inference. I also compare RDMA latency to standard Ethernet and explain why low latency is the key enabler for tensor parallelism here.

On the software side, I walk through vLLM with Ray and the main blocker I hit along the way: missing RCCL support for gfx1151 in upstream ROCm. I explain what broke, how I patched RCCL to make multi-node tensor parallelism work on Strix Halo, and how to reproduce the setup using my toolboxes.

Timestamps
00:00 – Introduction
01:14 – The Hardware
02:09 – RDMA / RoCE Network Card
03:28 – Custom Cooling for Intel E810
06:22 – PCIe Lane Caveat (x16 to x4)
08:32 – ROCm / RCCL gfx1151 Support
10:31 – Configuration Tutorial
13:06 – Benchmarks
14:03 – Conclusion

Links & Resources
Strix Halo Toolboxes & Guides: https://strix-halo-toolboxes
vLLM Strix Halo Toolboxes (patched RCCL): https://github.com/kyuz0/amd-strix-ha...
vLLM Benchmarks: https://kyuz0.github.io/amd-strix-hal...
vLLM Tensor Parallelism: https://developers.redhat.com/article...
