Introduction to Distributed ML Workloads with Ray on Kubernetes - Mofi Rahman & Abdel Sghiouar
Mofi Rahman & Abdel Sghiouar, Google

The rapidly evolving landscape of Machine Learning and Large Language Models demands efficient, scalable ways to run distributed workloads to train, fine-tune, and serve models. Ray is an Open Source framework that simplifies distributed machine learning, and Kubernetes streamlines deployment. In this introductory talk, we'll uncover how to combine Ray and Kubernetes for your ML projects. You will learn about:

- Basic Ray concepts (actors, tasks) and their relevance to ML
- Setting up a simple Ray cluster within Kubernetes
- Running your first distributed ML training job

Ray + Kubernetes: The Distributed OS for AI/ML | Ray on the Road – NYC 2025

Best Practices for Deploying LLM Inference, RAG and Fine Tuning Pipelines... M. Kaushik, S.K. Merla

Optimizing Load Balancing and Autoscaling for Large Language Model (LLM) Inference on Kub... D. Gray

Understanding Kubernetes Networking in 30 Minutes - Ricardo Katz & James Strong

Efficient LLM Deployment: A Unified Approach with Ray, VLLM, and Kubernetes - Lily (Xiaoxuan) Liu

Introduction to Distributed Computing with the Ray Framework

Training and Serving LLM’s on Kubernetes: A beginner’s guide - Abdel Sghiouar

KubeRay: A Ray cluster management solution on Kubernetes

From Spark to Ray: An Exabyte-Scale Production Migration Case Study

Introduction to Large Language Models (LLM) on Kubernetes - Alexander Schaber

Run A Local LLM Across Multiple Computers! (vLLM Distributed Inference)

Andrej Karpathy: From Vibe Coding to Agentic Engineering w/ Stephanie Zhan

The open source AI compute tech stack: Kubernetes + Ray + PyTorch + vLLM

Pinterest's ML Evolution: Distributed Training with Ray | Ray Summit 2024

Distributed ML Talk @ UC Berkeley

Democratizing AI Model Training on Kubernetes with Kubeflow TrainJob and... A. Velichkevich, Y. Iwai

Beginner's Guide to Ray! Ray Explained

Kubernetes Zero to Hero: The Complete Beginner’s Guide (2025 Edition)

Best Practices for Productionizing Distributed Training with Ray Train
