Unlocking the Full Potential of GPUs for AI Workloads on Kubernetes - Kevin Klues, NVIDIA

Dynamic Resource Allocation (DRA) is a new Kubernetes feature that puts resource scheduling in the hands of third-party developers. It moves away from the limited "countable" interface for requesting access to resources (e.g. "nvidia.com/gpu: 2") and provides an API more akin to that of persistent volumes. In the context of GPUs, this unlocks a host of new features without the need for awkward solutions shoehorned on top of the existing device plugin API. These features include:

* Controlled GPU sharing (both within a pod and across pods)
* Multiple GPU models per node (e.g. T4 and A100)
* Specifying arbitrary constraints for a GPU (min/max memory, device model, etc.)
* Dynamic allocation of Multi-Instance GPUs (MIG)
* ...and the list goes on

In this talk, you will learn about the DRA resource driver we have built for GPUs. We walk through each of the features it provides and conclude with a series of demos showing how you can get started using it today.
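
To make the contrast between a countable request and a constraint-based request concrete, here is a minimal Python sketch. It is not the DRA API or NVIDIA's resource driver: the `Gpu` type, the node inventory, and both allocation functions are hypothetical, and in a real cluster allocation is done by Kubernetes components, not by workload code. The point is only that a bare count cannot express "an A100 with at least 80 GiB", while a request that carries constraints can.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Gpu:
    """Hypothetical description of one GPU on a node."""
    name: str
    model: str
    memory_gib: int


# Hypothetical node mixing two GPU models (e.g. T4 and A100).
NODE_GPUS = [
    Gpu("gpu-0", "T4", 16),
    Gpu("gpu-1", "T4", 16),
    Gpu("gpu-2", "A100", 40),
    Gpu("gpu-3", "A100", 80),
]


def allocate_countable(gpus: List[Gpu], count: int) -> Optional[List[Gpu]]:
    """Countable model: the request is just a number (like nvidia.com/gpu: 2),
    so any `count` devices satisfy it and the workload cannot say which kind."""
    return gpus[:count] if len(gpus) >= count else None


def allocate_with_constraints(
    gpus: List[Gpu], count: int, predicate: Callable[[Gpu], bool]
) -> Optional[List[Gpu]]:
    """Claim-style model (hypothetical): every allocated device must satisfy
    the request's constraints, so candidates are filtered before choosing."""
    matching = [g for g in gpus if predicate(g)]
    return matching[:count] if len(matching) >= count else None


if __name__ == "__main__":
    # A count alone just takes the first two devices: here, two T4s.
    print([g.name for g in allocate_countable(NODE_GPUS, 2)])  # ['gpu-0', 'gpu-1']

    # A constrained request picks only devices that match model and memory.
    claim = allocate_with_constraints(
        NODE_GPUS, 1, lambda g: g.model == "A100" and g.memory_gib >= 80
    )
    print([g.name for g in claim])  # ['gpu-3']
```

Returning `None` when nothing matches mirrors the idea that an unsatisfiable request should stay pending rather than silently receive the wrong hardware.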

Mastering GPU Management in Kubernetes Using the Operator Pattern- Shiva Krishna Merla & Kevin Klues

GPUs in Kubernetes for AI Workloads

NVIDIA Vera CPU: The Processor for Agentic AI

Which GPU Sharing Strategy Is Right for You? A Comprehensive Benchmark Study Us... K. Klues, Y. Chen

Scaling AI Workloads with Kubernetes: Sharing GPU Resources Across Multiple Containers - Jack Ong

Explain How Kubernetes Works With GPU Like I’m 5 - Carlos Santana, AWS

Multi-GPU Communication Libraries for Scaling HPC and AI Workloads | NVIDIA GTC 2025

Understanding the LLM Inference Workload - Mark Moyou, NVIDIA

Everything You Wanted to Know About RDMA But Were Too Proud to Ask

AI in Kubernetes: How to Get Started?

NVIDIA GPU Operator Overview

TSC Edgelake OpenHorizon 2026 05 05 full

Efficient LLM Deployment: A Unified Approach with Ray, VLLM, and Kubernetes - Lily (Xiaoxuan) Liu

Kubernetes Design Principles: Understand the Why - Saad Ali, Google

Comparing Sidecar-Less Service Mesh from Cilium and Istio - Christian Posta, Solo.io

Everything you Need to Know about using GPUs with Kubernetes - Rohit Agarwal, Google

NVIDIA didn't want me to do this

Understanding Kubernetes Networking in 30 Minutes - Ricardo Katz & James Strong

Mastering LLM Inference Optimization From Theory to Cost Effective Deployment: Mark Moyou

Keynote: After the AI Hype – What’s Real, and What’s Next - Richard Campbell - 2026