How xAI Scales Image & Video Processing with Ray | Ray Summit 2025
At Ray Summit 2025, Zhibei Ma and Kai-Hsun Chen from xAI share how the company is building a high-performance data processing stack to power some of the world’s most advanced multimodal AI models. They explain why multimodal data is central to xAI’s mission and how meeting the extreme demands of large-scale training led them to develop a distributed data pipeline built on Ray Core and KubeRay. This system enables efficient processing of massive image and video datasets with linear scalability and robust fault tolerance in production environments. In this talk, they present the architecture of xAI’s Ray-based data pipeline and the strategies used to achieve high availability and operational simplicity at supercluster scale. If you’re working on multimodal AI, large-scale data pipelines, or distributed training infrastructure, this session offers deep technical insight from real-world deployment.


