Science of Misalignment

If a future model were to be dangerously misaligned, could we tell?

If this kind of research sounds interesting to you, apply to do research with me in MATS! Due 23 Dec: tinyurl.com/neel-mats-app

00:00:00 The Problem with Viral Demos
00:06:49 Hunting for "Eval Awareness"
00:17:00 Debunking the Shutdown Demo
00:24:00 Why Do Models Blackmail
00:31:33 A New Tool: The Resilience Score
00:32:30 The Science of Misalignment
00:35:45 How to Convince Skeptics?
00:47:00 The Future of AI Psychology

How Reasoning Models Break Mechanistic Interpretability Techniques

How Will Mech Interp Help Make AGI Safe?

Andrej Karpathy: From Vibe Coding to Agentic Engineering w/ Stephanie Zhan

Yann LeCun: World Models: Enabling the next AI revolution

Introduction to Mechanistic Interpretability with David Bau

Can Interpretability Control Model Training?

What Happened With Sparse Autoencoders?

The Strange Math That Predicts (Almost) Anything

Building the PERFECT Linux PC with Linus Torvalds

How To Interpret Chain Of Thought: A Walkthrough

How AI Cracked the Protein Folding Code and Won a Nobel Prize

What Matters Right Now In Mechanistic Interpretability?

Something is jamming GPS over Europe. Here's what we found

Train Your Brain to Never Forget (5 Feynman Habits)

Training Sand to Think: Artificial General Intelligence & Future of Physics

The Story of Mech Interp

How To Think About Thinking Models

How AI agents & Claude skills work (Clearly Explained)

Creating Models Worth Interpreting

What do tech pioneers think about the AI revolution? - The Engineers, BBC World Service
