Keynote: AI Data Center Networking: Lessons from Meta's Evolution
Abstract

Meta has iterated through multiple generations of networking architectures to support increasingly demanding machine learning workloads. This keynote outlines the journey through successive AI network deployments, examining how each iteration informed the next and sharing hard-won lessons from production operations.

This keynote will provide a comprehensive view of the AI networking evolution through the lens of complementary technology layers: PyTorch and AI frameworks, xPU characteristics and their traffic patterns, NIC selection and capabilities, and the resulting network implications. The talk will explore how decisions at each layer cascade through the stack: how framework behavior influences hardware selection, how accelerator characteristics drive network topology choices, and how NIC capabilities enable or constrain operational approaches.

The discussion will cover practical experiences with network automation at scale, infrastructure density challenges (power, cooling, space), telemetry approaches for AI workload visibility, and operational strategies for managing rapid technology transitions while maintaining production stability. Attendees will gain insight into the architectural decisions, false starts, and breakthrough solutions that emerged from deploying and operating multiple generations of AI clusters in production.

Omar Baldonado: Omar Baldonado leads the groups that develop/operate Meta's global data center networks. These networks support all of Meta's AI models and the Meta family of apps (Meta AI, Facebook, Instagram, WhatsApp, Messenger). These groups have developed some of the largest AI clusters in the world (with gigawatt-scale clusters on the way), and they continually share their work through open-source libraries (e.g., TorchComms for PyTorch, FBOSS for switches) and in communities like the Open Compute Project. Omar has been in networking since the early 1990s.

https://nanog.org/events/nanog-96/con...
