CMU Advanced NLP 2024 (5): Transformers

This lecture (by Graham Neubig) for CMU CS 11-711, Advanced NLP (Spring 2024) covers: Transformer Architecture, Multi-Head Attention, Positional Encodings, Layer Normalization, Optimizers and Training, and the LLaMA Architecture. Class site: https://phontron.com/class/anlp2024/

CMU Advanced NLP 2024 (6): Generation Algorithms

CMU Advanced NLP 2024 (7): Prompting

Visualizing transformers and attention | Talk for TNG Big Tech Day '24

MIT 6.S191 (2023): Recurrent Neural Networks, Transformers, and Attention

[Full Workshop] Reinforcement Learning, Kernels, Reasoning, Quantization & Agents — Daniel Han

CMU Advanced NLP Fall 2024 (6): Instruction Tuning

NLP Demystified 15: Transformers From Scratch + Pre-training and Transfer Learning With BERT/GPT

Lecture 3.1 - Multimodal Representation Fusion (CMU Multimodal Machine Learning, Fall 2023)

CMU LLM Inference (10): Incorporating Tools

CMU Advanced NLP Fall 2024 (14): Ensembling and Mixture of Experts

Stanford CME295 Transformers & LLMs | Autumn 2025 | Lecture 1 - Transformer

CMU Advanced NLP Fall 2024 (7): Prompting and Complex Reasoning

CMU Advanced NLP Spring 2025 (16): Parallelism and Scaling

RI Seminar: Max Simchowitz: Generative Control, Action Chunking, and Moravec’s Paradox

Let's build the GPT Tokenizer

CMU Advanced NLP Fall 2024 (8): Reinforcement Learning and Human Feedback

Attention in transformers, step-by-step | Deep Learning Chapter 6

Power BI DAX Tutorial for Beginners (2025): Master DAX in ONE Course!

Geoffrey Hinton | Will digital intelligence replace biological intelligence?

CMU Advanced NLP Fall 2024 (20): Multilingual NLP
