ML Foundations (prerequisites) for Post-Training | RLHF Book Course, Lecture 0

In this video I try to cover a bunch of math, LLM training fundamentals, and probability concepts that come up again and again in post-training content (and in this book). We cover things like the role of mid-training, definitions of KL, entropy & cross-entropy, getting LM probabilities from a sequence, etc.

Thanks to everyone who nudged me to make this video. The slides were a fun experiment with GLM-5.2 (more on that model here: https://www.interconnects.ai/p/glm-52...)

Extra learning resources: https://rlhfbook.com/course#extra-res...

"Lecture 0" was added after the course was well underway :)

Welcome to The RLHF Book & Post-Training Course with Nathan Lambert. Ask questions and I'll answer them in the next roundup video!

Slides for this lecture are here: https://rlhfbook.com/teach/course/lec...

Chapters:
00:00 Introduction & Course Prerequisites
01:37 Language Models Overview
02:47 The LM Head
04:29 Softmax & Log-Probabilities
06:13 Anatomy of an LM Training Example
06:37 Computing LLM Probabilities (+Phoebe the Dog)
09:52 Three Common Masks in Post-Training
11:03 A Small Decoding Review
12:14 Training an LM: Cross-Entropy
13:23 Optimization & Fine-Tuning
13:55 Pretraining to Midtraining to SFT Pipeline
15:25 Probability Essentials: KL Divergence & Entropy
19:36 Sigmoid & Pairwise Likelihood
20:29 Reinforcement Learning Framing (MDP)
22:28 Transitioning Tools into Post-Training
23:12 Recommended Resources & Wrap-Up

All resources will be available at https://rlhfbook.com/
Order a copy of the book (physical recommended) on Manning.com: https://hubs.la/Q03Tc3dc0
Order a copy on Amazon: https://amzn.to/4cwCDJQ
Specific course resources (recording links, slides in PDF and native form, etc.) are at https://rlhfbook.com/course
Code is at https://rlhfbook.com/code
Get more information on Nathan at http://natolambert.com/ and stay up to date with his work on Interconnects: https://www.interconnects.ai/
Course YouTube playlist: Welcome to The RLHF Book & Post-Training C...
Join the book's Discord Community.

Nathan is on:
X: natolambert
LinkedIn: natolambert
GitHub: https://github.com/natolambert
BlueSky: https://bsky.app/profile/natolambert....
Threads: https://www.threads.com/@natolambert
Substack: https://substack.com/@natolambert

Slides are built with Colloquium: https://github.com/natolambert/colloq...

Thank you to my many collaborators who helped me learn this information I get to share with the world!
