GLM 5.2 on Dual Strix Halo (256GB): Worth it?

This video covers my experiments running the new GLM 5.2 on a dual Strix Halo setup (two Framework Desktops). I cover which quantizations you can run and the speeds you get for token generation and prompt processing, compared with DeepSeek V4 Flash. I also ran SWE Bench Verified Mini to check actual performance on coding tasks.

Timestamps:
00:00 | Introduction
01:28 | Comparison with DS4 Flash
03:30 | GLM 5.2 UD-IQ2_M Quant
05:46 | Benchmarks (pp, tg)
06:58 | SWE Bench Verified Mini
10:48 | GLM 5.2 Distributed Inference
15:29 | pi coding agent / VSCode

Model weights: https://huggingface.co/unsloth/GLM-5....

Pi coding benchmarks: https://pi-local-coding-bench.dev/

Check these links for more information on the llama.cpp toolbox:
https://strix-halo-toolboxes.com/
https://github.com/kyuz0/amd-strix-ha...
https://github.com/ggml-org/llama.cpp

Buy Me a Coffee: https://buymeacoffee.com/dcapitella