Introduction to Distributed ML Workloads with Ray on Kubernetes - Mofi Rahman & Abdel Sghiouar

Mofi Rahman & Abdel Sghiouar, Google

The rapidly evolving landscape of Machine Learning and Large Language Models demands efficient, scalable ways to run distributed workloads to train, fine-tune, and serve models. Ray is an open source framework that simplifies distributed machine learning, and Kubernetes streamlines deployment. In this introductory talk, we'll uncover how to combine Ray and Kubernetes for your ML projects.

You will learn about:
- Basic Ray concepts (actors, tasks) and their relevance to ML
- Setting up a simple Ray cluster within Kubernetes
- Running your first distributed ML training job

Ray + Kubernetes: The Distributed OS for AI/ML | Ray on the Road – NYC 2025

Best Practices for Deploying LLM Inference, RAG and Fine Tuning Pipelines... M. Kaushik, S.K. Merla

Optimizing Load Balancing and Autoscaling for Large Language Model (LLM) Inference on Kub... D. Gray

Understanding Kubernetes Networking in 30 Minutes - Ricardo Katz & James Strong

Efficient LLM Deployment: A Unified Approach with Ray, VLLM, and Kubernetes - Lily (Xiaoxuan) Liu

Introduction to Distributed Computing with the Ray Framework

Training and Serving LLM’s on Kubernetes: A beginner’s guide - Abdel Sghiouar

KubeRay: A Ray cluster management solution on Kubernetes

From Spark to Ray: An Exabyte-Scale Production Migration Case Study

Introduction to Large Language Models (LLM) on Kubernetes - Alexander Schaber

Run A Local LLM Across Multiple Computers! (vLLM Distributed Inference)

Andrej Karpathy: From Vibe Coding to Agentic Engineering w/ Stephanie Zhan

The open source AI compute tech stack: Kubernetes + Ray + PyTorch + vLLM

Pinterest's ML Evolution: Distributed Training with Ray | Ray Summit 2024

Distributed ML Talk @ UC Berkeley

Democratizing AI Model Training on Kubernetes with Kubeflow TrainJob and... A. Velichkevich, Y. Iwai

Beginner's Guide to Ray! Ray Explained

Kubernetes Zero to Hero: The Complete Beginner’s Guide (2025 Edition)

Best Practices for Productionizing Distributed Training with Ray Train

Building Massive-Scale Generative AI Services with Kubernetes and Open Source - John McBride