Qwen 3.6 27B on a 5070 Ti: my full local AI agent build
A complete walkthrough of my personal AI assistant: the model, the agent loop, the chat interface, and the honest benchmark numbers. It runs on a single RTX 5070 Ti (16GB VRAM) on Kubuntu. Nothing leaves my network. No subscription, no API, no rate limits.

The stack:
- llama.cpp for inference
- Qwen 3.6 27B (HauhauCS uncensored fine-tune, Q3_K_P quant)
- nanobot for the agent loop
- Telegram as the chat channel
- faster-whisper for voice transcription
- SearXNG for local web search

Hardware: Ryzen 7 7800X3D, 32GB DDR5-6000, RTX 5070 Ti 16GB.

Links:
- llama.cpp: https://github.com/ggml-org/llama.cpp
- Qwen 3.6 27B (official): https://huggingface.co/Qwen/Qwen3.6-27B
- HauhauCS uncensored: https://huggingface.co/HauhauCS/Qwen3...
- nanobot: https://github.com/HKUDS/nanobot
- SearXNG: https://github.com/searxng/searxng

Benchmark numbers (Q3_K_P, flash attention, KV cache q8_0):
- Empty context: 1527 t/s prefill, 42 t/s decode
- 8K context: 1544 t/s prefill, 43 t/s decode
- 16K context: 1389 t/s prefill, 41 t/s decode
- 32K context: 1077 t/s prefill, 30 t/s decode (2-layer offload for the bench)

Build: llama.cpp 0adede866 (8925)

Note: I'm on CUDA 13.1 deliberately. CUDA 13.2 has a known bug that produces gibberish output with this model. NVIDIA has acknowledged it, but there was no fix at the time of recording. Don't update.

If you're running a similar setup with a better config, drop it in the comments.
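
To put the benchmark numbers in practical terms, here is a rough Python sketch that turns them into a wall-clock estimate for a single request. The throughput figures are copied from the description. Everything else is my assumption: the function name `estimate_seconds` is made up for illustration, "8K/16K/32K" is read as 8192/16384/32768 tokens, and the model of "time = prompt tokens / prefill rate + reply tokens / decode rate" ignores startup and sampling overhead, so treat the result as a ballpark only.

```python
# Benchmark figures from the description: context size -> (prefill t/s, decode t/s).
# Assumption: "8K", "16K", "32K" mean 8192, 16384, 32768 tokens.
BENCH = {
    0: (1527, 42),
    8192: (1544, 43),
    16384: (1389, 41),
    32768: (1077, 30),
}


def estimate_seconds(prompt_tokens, reply_tokens):
    """Ballpark time for one request: prompt / prefill rate + reply / decode rate."""
    sizes = sorted(BENCH)
    # Use the smallest benchmarked context that covers the prompt;
    # fall back to the largest row for prompts beyond 32K.
    ctx = next((s for s in sizes if s >= prompt_tokens), sizes[-1])
    prefill_tps, decode_tps = BENCH[ctx]
    return prompt_tokens / prefill_tps + reply_tokens / decode_tps


if __name__ == "__main__":
    # Example: a 16K-token prompt followed by a 400-token reply.
    print(f"{estimate_seconds(16384, 400):.1f} s")
```

Rounding the prompt up to the next benchmarked context size is deliberately pessimistic; interpolating between rows would be a reasonable alternative.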

