The Rise of Reasoning Models | RLHF & Post-training Course Lecture 5

Welcome to The RLHF Book & Post-Training Course with Nathan Lambert. This lecture covers Chapter 7 on Reasoning & Inference-time Scaling. Ask questions and I'll answer them in the next roundup video!

Slides for this lecture: https://rlhfbook.com/teach/course/lec...

00:00 - Introduction & Core Concepts
10:41 - The 2025 Cambrian Explosion of Reasoning Models
34:54 - Common Implementation Patterns
42:42 - Open Questions & Conclusion

All resources will be available at https://rlhfbook.com/
Order a copy of the book (physical recommended) on Manning.com: https://hubs.la/Q03Tc3dc0
Order a copy on Amazon: https://amzn.to/4cwCDJQ
Specific course resources (recording links, slides in PDF and native form, etc.): https://rlhfbook.com/course
Code: https://rlhfbook.com/code

Get more information on Nathan at http://natolambert.com/ and stay up to date with his work on Interconnects: https://www.interconnects.ai/

Course YouTube playlist: • Welcome to The RLHF Book & Post-Training C...
Join the book's Discord Community: / discord

Nathan is on…
X: / natolambert
LinkedIn: / natolambert
GitHub: https://github.com/natolambert
BlueSky: https://bsky.app/profile/natolambert....
Threads: https://www.threads.com/@natolambert
Substack: https://substack.com/@natolambert

Slides are built with Colloquium: https://github.com/natolambert/colloq...

Thank you to my many collaborators who helped me learn this information I get to share with the world!
