GLM 5.2 on Dual Strix Halo (256GB): Worth it?

This video covers my experiments running the new GLM 5.2 on a dual Strix Halo setup (two Framework Desktops). I cover which quantizations you can run and the speeds you get for token generation and prompt processing, comparing them with DeepSeek V4 Flash. I also ran SWE-bench Verified Mini to check actual performance on coding tasks.

Timestamps:
00:00 | Introduction
01:28 | Comparison with DS4 Flash
03:30 | GLM 5.2 UD-IQ2_M Quant
05:46 | Benchmarks (pp, tg)
06:58 | SWE-bench Verified Mini
10:48 | GLM 5.2 Distributed Inference
15:29 | pi coding agent / VSCode

Model weights: https://huggingface.co/unsloth/GLM-5....
Pi coding benchmarks: https://pi-local-coding-bench.dev/

Check these links for more information on the llama.cpp toolbox:
https://strix-halo-toolboxes.com/
https://github.com/kyuz0/amd-strix-ha...
https://github.com/ggml-org/llama.cpp

Buy Me a Coffee: https://buymeacoffee.com/dcapitella
