llm-d: Distributed LLM Inference on Kubernetes
Blog post: https://cefboud.com/posts/llm-d/
llm-d: https://llm-d.ai/docs/getting-started

00:00 Introduction to llm-d
00:32 Why LLM inference needs smarter load balancing
01:31 Prefill vs. decode explained
03:15 KV cache awareness and session routing
04:10 How llm-d scores model servers
06:36 llm-d router architecture
07:48 Client request flow
08:34 Envoy External Processing (ExtProc)
10:04 End-to-end request routing
12:49 Gateway API Inference Extension
15:18 Prefill/decode disaggregation
17:13 KV cache transfer with NCCL & RDMA
18:13 Plugin architecture and extensibility
19:59 Flow control, priorities & autoscaling
20:47 Final thoughts
