VibeVoice (Speech Generation/Voice Cloning) on Framework Desktop with Strix Halo (AMD Ryzen AI Max+)

In this video, I show how to generate natural-sounding speech locally on the Framework Desktop with AMD Ryzen AI Max “Strix Halo”, including cloning a voice from a short sample and creating multi-speaker conversations. The intro you hear at the start was entirely generated by VibeVoice, cloned from my own voice.

VibeVoice is Microsoft’s open-weight model for long-form, multi-speaker speech (released late August 2025). I’ll walk you through setup on Strix Halo using a Fedora toolbox and the Gradio UI, then demo single-speaker and multi-speaker clips, plus zero-shot voice cloning. I’ll also cover stability fixes for ROCm crashes.

Timestamps:
00:00 - AI-Generated Intro (VibeVoice)
01:47 - Setup on Strix Halo (Toolbox + Gradio)
03:28 - First Demo: Single-Speaker
05:18 - Multi-Speaker Conversations
05:42 - Clone Your Own Voice (Zero-Shot)
06:23 - Stability Fixes (librosa / numba / LLVM / ROCm)
08:26 - Generating a Full Podcast
09:33 - AI-Generated Podcast: How VibeVoice Works

Links & Resources:
GitHub repo (toolboxes, scripts, stability fixes): https://github.com/kyuz0/amd-strix-ha...
Framework Desktop (Strix Halo): https://frame.work/
Strix Halo Homelab guide + Discord (by deseven): https://strixhalo-homelab.d7.wtf/
VibeVoice (project): https://github.com/microsoft/VibeVoice
VibeVoice (project page): https://microsoft.github.io/VibeVoice/
VibeVoice models (Hugging Face): https://huggingface.co/microsoft/Vibe...
(Community mirror example for large weights): https://huggingface.co/aoi-ot/VibeVoi...
Gradio (UI framework): https://github.com/gradio-app/gradio
Librosa (audio features): https://github.com/librosa/librosa
Numba (JIT; disabled in this toolbox fix): https://github.com/numba/numba
LLVM (compiler backend): https://llvm.org/
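
For reference, here is a minimal sketch of the Numba part of the stability fix (Numba JIT disabled). It uses Numba's documented NUMBA_DISABLE_JIT environment variable. Whether the toolbox sets it exactly this way is an assumption, and the helper name disable_numba_jit is made up for illustration. The variable has to be set before numba (or librosa, which imports it) is loaded.

```python
import os


def disable_numba_jit(env=None):
    """Set NUMBA_DISABLE_JIT=1 in the given mapping (defaults to os.environ).

    Hypothetical helper for illustration; not taken from the toolbox repo.
    With this variable set, Numba's @jit decorators run the plain Python
    function instead of compiling it through LLVM.
    """
    target = os.environ if env is None else env
    target["NUMBA_DISABLE_JIT"] = "1"
    return target


# Call this before anything imports numba or librosa, because Numba reads
# its configuration from the environment at import time.
disable_numba_jit()

# import librosa  # safe to import from here on (assumes librosa is installed)
```

The same effect can be had by exporting NUMBA_DISABLE_JIT=1 in the shell before launching the Gradio UI. The trade-off is slower audio feature code, since the affected functions run as ordinary Python.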