Gemma 4 12B QAT vs non-QAT - 16GB VRAM Local LLM setup

In this video I test the QAT (quantization-aware training) version of Google's Gemma 4 12B model and compare the quality of Unsloth's QAT build (which is q4) against Unsloth's regular q4 GGUF. The model runs on a local AI PC I built with 16GB VRAM and 32GB DDR4 RAM.

I run the model through four tests:
1. Adherence
2. Agency
3. Coding
4. Memory

If you're interested in local LLMs, AI and homelabs from the perspective of a software engineer with many years of professional experience running LLMs in production, feel free to subscribe!

Models:
• QAT: https://huggingface.co/unsloth/gemma-...
• non-QAT: https://huggingface.co/unsloth/gemma-...

GitHub: https://github.com/lukesdevlab/youtube
Patreon: / lukesdevlab

#localllm #localai #homelab #llamacpp #gemma4 #quantization #qat

Chapters:
0:00 Coming up
0:08 Intro
0:55 Models
1:16 Tests
1:39 System Specs
1:50 Adherence - q4
2:53 Adherence - QAT
3:35 Agency
5:56 Coding - q4
7:55 Coding - QAT
10:55 Memory
12:40 Conclusion