Best Practices for Deploying LLM Inference, RAG and Fine Tuning Pipelines on K8s - Meenakshi Kaushik & Shiva Krishna Merla, NVIDIA

In this session, we'll cover best practices for deploying, scaling, and managing LLM inference pipelines on Kubernetes (K8s). We'll explore common patterns like inference, retrieval-augmented generation (RAG), and fine-tuning.

Key challenges addressed include:
[1] Minimizing initial inference latency with model caching
[2] Optimizing GPU usage with efficient scheduling, multi-GPU/node handling, and auto-quantization
[3] Enhancing security and management with RBAC, monitoring, auto-scaling, and support for air-gapped clusters

We'll also demonstrate building customizable pipelines for inference, RAG, and fine-tuning, and managing them post-deployment. Solutions include [1] a lightweight standalone tool built using the operator pattern and [2] KServe, a robust open-source AI inference platform.

This session will equip you to effectively manage LLM inference pipelines on K8s, improving performance, efficiency, and security.
