A friendly introduction to distributed training (ML Tech Talks)
Google Cloud Developer Advocate Nikita Namjoshi introduces how distributed training can dramatically reduce machine learning training times, explains how to make use of multiple GPUs with Data Parallelism vs Model Parallelism, and explores Synchronous vs Asynchronous Data Parallelism.

Mesh TensorFlow → https://goo.gle/3sFPrHw
Distributed Training with Keras tutorial → https://goo.gle/3FE6QEa
GCP Reduction Server Blog → https://goo.gle/3EEznYB
Multi Worker Mirrored Strategy tutorial → https://goo.gle/3JkQT7Y
Parameter Server Strategy tutorial → https://goo.gle/2Zz3UrW
Distributed training on GCP Demo → https://goo.gle/3pABNDE

Chapters:
0:00 - Introduction
0:17 - Agenda
0:37 - Why distributed training?
1:49 - Data Parallelism vs Model Parallelism
6:05 - Synchronous Data Parallelism
18:20 - Asynchronous Data Parallelism
23:41 - Thank you for watching

Watch more ML Tech Talks → https://goo.gle/ml-tech-talks
Subscribe to TensorFlow → https://goo.gle/TensorFlow

#TensorFlow #MachineLearning #ML