Benchmark: Qwen 3.6 27B & 35B on the RTX 5060 Ti 16GB

Software: LM Studio

Hardware:
GPU: NVIDIA RTX 5060 Ti 16GB
CPU: AMD Ryzen 5 7600
Storage: 1TB NVMe SSD (Inland)
Motherboard: ASUS TUF Gaming B650-E WIFI
RAM: 32GB DDR5 (2x16GB) 6000MHz CL32
PSU: Mars Gaming 1000W 80+ Gold
Cooler: ARCTIC Freezer 36 A-RGB
Case: Phanteks XT Pro Ultra (White)

Optimization (tip from asahi.w):
Qwen3.5 35B-A3B and Qwen3.6 35B-A3B are MoE models, so they have a huge speed advantage: only about 3B parameters (the "A3B" in the name) are active for each token. Set GPU Offload to full (40). Then, instead of lowering GPU Offload, adjust the "Number of Layers to force the Expert Into CPU" setting. Slide it most of the way to the right so the expert weights are placed in system RAM instead of VRAM. Then slide it back to the left 4 layers at a time until you have used up the idle VRAM.

We will be testing two very different architectures: the dense 27B and the massive 35B MoE (A3B). With only 16GB of VRAM, we'll see how many layers we can offload to the GPU and whether the generation speed (tokens per second, tok/s) is good enough for a real-world workflow.

📊 Models tested in this video:
Qwen 3.6 27B (Dense): A powerhouse for logic and complex instructions. How much of it can we squeeze into 16GB?
Qwen 3.6 35B-A3B (MoE): The Mixture of Experts version. It has more total parameters, but does it run faster than the 27B?

⏱️ Timestamps:
0:00 Big thanks
0:10 Qwen 3.6 MoE, no optimization
1:18 Results
1:28 Qwen 3.6 MoE 35B, optimized
2:26 Amazing results
2:38 Qwen 3.6 27B on the 5060 Ti
3:52 Results 3
4:02 Next video

Subscribe for Ep. 4, where we take these Qwen models to the RTX 3060 12GB!

#RTX5060Ti #Qwen36 #Qwen3.6 #AI #LocalLLM #NVIDIA #Jordutech #GPUBenchmark #TechReview #OpenSourceAI #LMStudio #AlibabaCloud #ArtificialIntelligence #MachineLearning #PCGaming #VRAM #Quantization #MoE #DeepLearning #RTX3060 #SmartTech
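For anyone who wants to see the logic behind the "slide back by 4" optimization tip, here is a rough sketch of that tuning loop in Python. It is only an illustration under simple assumptions, not how LM Studio works internally. The function names, the per-layer expert size (expert_gb_per_layer), the always-on-GPU overhead (base_gb) and the VRAM budget are all hypothetical placeholders. VRAM use is modeled as a fixed base plus a constant cost for every layer whose experts stay on the GPU. In practice you read the real numbers off your VRAM monitor.

```python
# Hypothetical sketch of the "slide back by 4" tuning loop from the tip above.
# All sizes are made-up assumptions, not measurements of Qwen 3.6.


def vram_needed_gb(cpu_expert_layers: int, total_layers: int,
                   expert_gb_per_layer: float, base_gb: float) -> float:
    """Estimate VRAM use when the experts of `cpu_expert_layers` layers sit in system RAM.

    base_gb is an assumed lump sum for everything that always stays on the GPU
    (attention weights, embeddings, KV cache, runtime overhead).
    """
    gpu_expert_layers = total_layers - cpu_expert_layers
    return base_gb + gpu_expert_layers * expert_gb_per_layer


def tune_cpu_expert_layers(total_layers: int, expert_gb_per_layer: float,
                           base_gb: float, vram_budget_gb: float,
                           step: int = 4) -> int:
    """Start with every layer's experts on the CPU (slider fully right), then
    move the slider left by `step` while the next setting still fits the budget."""
    setting = total_layers
    if vram_needed_gb(setting, total_layers, expert_gb_per_layer, base_gb) > vram_budget_gb:
        raise ValueError("model does not fit even with all experts in system RAM")
    while setting - step >= 0 and vram_needed_gb(
            setting - step, total_layers, expert_gb_per_layer, base_gb) <= vram_budget_gb:
        setting -= step
    return setting


if __name__ == "__main__":
    # 40 layers (GPU Offload = 40), a hypothetical 0.5 GB of experts per layer,
    # a hypothetical 3 GB base, and a 14 GB budget to leave headroom on a 16GB card.
    print(tune_cpu_expert_layers(40, 0.5, 3.0, 14.0))  # prints 20
```

With these made-up numbers the loop stops at 20: moving to 16 would need 3 + 24 × 0.5 = 15 GB, which is over the 14 GB budget. The step of 4 mirrors the tip. A step of 1 would find a slightly tighter fit at the cost of more test loads.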