Nebius on Why AI Infrastructure Is More Than GPUs

In this episode of Data Insights, Allyson Klein and Jeniece Wnorowski welcome Hitesh Kumar, GPU Cluster Design Architect at Nebius, for a discussion on the realities of designing and deploying AI infrastructure at scale. The conversation explores how AI infrastructure has evolved beyond GPUs into a full-system challenge involving power, cooling, networking, storage, and interconnect technologies. Hitesh explains how increasing compute density is reshaping data center design, why storage and networking are becoming critical components of AI performance, and how cluster architectures are evolving to support the next generation of training and inference workloads. The episode also examines operational challenges such as component failures, fault tolerance, and infrastructure scalability.

Check out more from Nebius: https://nebius.com

Hosts:
Allyson Klein, TechArena
Jeniece Wnorowski, Solidigm

Guest:
Hitesh Kumar, GPU Cluster Design Architect, Nebius

Chapters:
00:04 – Introduction and guest welcome
01:38 – Nebius and the evolution of AI infrastructure
03:14 – Rising GPU density and deployment challenges
05:19 – Why AI infrastructure is more than GPUs
08:01 – Storage demands for AI training and inference
10:08 – Interconnect technologies and network evolution
12:07 – Common misconceptions about scaling GPU clusters
15:59 – AI-native cloud versus traditional cloud architectures
18:14 – Build, buy, or partner: infrastructure decisions
20:07 – The future of AI cluster design
23:14 – Learning resources and industry insights
24:00 – Closing thoughts and key takeaways

#gpu #datastorage #networking #cloudarchitecture #datainsights #nebius #techarena #solidigm