CNPDX May: Dynamo: Large Scale Distributed Inference

David Zeir, Director, DL System Software, NVIDIA
Neelay Shah, Distinguished Engineer, NVIDIA

This talk introduces Dynamo, NVIDIA's open-source, Kubernetes-native distributed inference platform. We'll cover the problem space, walk through Dynamo's architecture (disaggregated prefill/decode, KV-cache-aware routing, and a transport layer that moves KV blocks directly between GPUs), and dig into the Kubernetes integration for scheduling, autoscaling, and graceful failure handling. We'll close with a demo of Dynamo serving a real workload.