DeepSeek V4 Flash Inference on Strix Halo: ds4, Quantizations, Distributed Inference and Benchmarks

An overview of running DeepSeek V4 Flash locally on AMD Strix Halo devices such as the Framework Desktop. It covers ds4 (DwarfStar 4), a dedicated inference engine, and the community-driven ROCm port that enables HIP support for AMD hardware. It also breaks down the challenges of fitting large weights into unified memory and shows how imatrix (importance matrix) calibration addresses the accuracy issues of 2-bit quantization. The configurations covered include single-node setups using Q2 and hybrid 4-bit layers within a 128GB memory limit, as well as multi-node cluster setups that run the full 4-bit quantization across two Strix Halo systems.

Timestamps:
00:00 - Introduction
01:37 - Initial Concerns About DS4
03:31 - The DS4 Project
04:31 - The ROCm/Strix Halo Port
08:09 - The Available Quantizations
10:34 - DS4 Benchmarks
14:00 - SWE Bench Mini
18:08 - DS4 Setup & Inference
25:24 - DS4 Multi-Node
30:48 - Conclusion

Links & Resources:
Strix Halo Toolboxes & Guides: https://strix-halo-toolboxes.com
ds4 Project Repository: https://github.com/antirez/ds4
Buy Me a Coffee: https://buymeacoffee.com/dcapitella
