Qwen 3.6 27B on a 5070 Ti: my full local AI agent build

A complete walkthrough of my personal AI assistant: the model, the agent loop, the chat interface, and the honest benchmark numbers. It runs on a single RTX 5070 Ti (16GB VRAM) on Kubuntu. Nothing leaves my network. No subscription, no API, no rate limits.

The stack:
- llama.cpp for inference
- Qwen 3.6 27B (HauhauCS uncensored fine-tune, Q3_K_P quant)
- nanobot for the agent loop
- Telegram as the chat channel
- faster-whisper for voice transcription
- SearXNG for local web search

Hardware: Ryzen 7 7800X3D, 32GB DDR5-6000, RTX 5070 Ti 16GB.

Links:
- llama.cpp: https://github.com/ggml-org/llama.cpp
- Qwen 3.6 27B (official): https://huggingface.co/Qwen/Qwen3.6-27B
- HauhauCS uncensored: https://huggingface.co/HauhauCS/Qwen3...
- nanobot: https://github.com/HKUDS/nanobot
- SearXNG: https://github.com/searxng/searxng

Benchmark numbers (Q3_K_P, flash attention, KV cache q8_0):
- Empty context → 1527 t/s prefill, 42 t/s decode
- 8K context → 1544 t/s prefill, 43 t/s decode
- 16K context → 1389 t/s prefill, 41 t/s decode
- 32K context → 1077 t/s prefill, 30 t/s decode (2-layer offload for the bench)

Build: llama.cpp 0adede866 (8925)

Note: I'm on CUDA 13.1 deliberately. CUDA 13.2 has a known bug that produces gibberish output with this model. NVIDIA has acknowledged it, but there was no fix at the time of recording. Don't update.

If you're running a similar setup with a better config, drop it in the comments.
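To get a feel for what the benchmark numbers above mean in practice, here is a minimal Python sketch that turns them into a rough response-time estimate. The `BENCH` table and `estimate_seconds` helper are hypothetical names made up for this illustration. It assumes 8K/16K/32K mean 8192/16384/32768 tokens and that the whole context is prefilled from scratch with no prompt caching, so treat the result as a ballpark upper bound, not a measurement.

```python
# Throughput figures copied from the benchmark list above, in tokens per second.
# Keys are context sizes in tokens (assumption: 8K = 8192, and so on).
BENCH = {
    0: (1527, 42),      # (prefill t/s, decode t/s) at empty context
    8192: (1544, 43),
    16384: (1389, 41),
    32768: (1077, 30),
}


def estimate_seconds(context_tokens: int, reply_tokens: int) -> float:
    """Rough wall-clock estimate: prefill the full context, then decode the reply."""
    # Pick the smallest benchmarked context size that covers the request;
    # fall back to the largest benchmarked size if the request exceeds it.
    size = next((s for s in sorted(BENCH) if s >= context_tokens), max(BENCH))
    prefill_tps, decode_tps = BENCH[size]
    return context_tokens / prefill_tps + reply_tokens / decode_tps


if __name__ == "__main__":
    # Example: a full 32K context followed by a 300-token reply.
    print(f"{estimate_seconds(32768, 300):.1f} s")
```

With these figures, a full 32K context plus a 300-token reply works out to about 30 seconds of prefill and 10 seconds of decode. In an agent loop where most of the prompt is unchanged between turns, prompt caching should make the prefill part much smaller than this estimate.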
