CMU Advanced NLP Spring 2025 (16): Parallelism and Scaling

This lecture (by Sean Welleck) for CMU CS 11-711, Advanced NLP, covers:
- Basics of training on one GPU
- Parallelization on multiple GPUs (e.g., data, tensor, and pipeline parallelism)
- Combining and comparing strategies

Content (including figures) is based on The Ultra-Scale Playbook: https://huggingface.co/spaces/nanotro...