NEURAL NETWORKS ARE WEIRD! - Neel Nanda (DeepMind)

Neel Nanda leads the mechanistic interpretability team at Google DeepMind. At 26, he's become one of the most prominent researchers working on the question of what's actually going on inside neural networks -- systems that can win IMO medals and write complex software, but which nobody actually designed or understands.

This nearly four-hour conversation is a deep technical dive into the field. Nanda explains why machine learning is fundamentally weird: we produce artifacts that do impressive things, but unlike conventional software, no one wrote the code or planned the architecture. His team's goal is to reverse-engineer these systems by finding the internal structures and algorithms that emerge during training.

The discussion covers the mechanics of sparse autoencoders at length -- how they decompose model activations into interpretable feature vectors, the mathematical foundations (ReLU vs TopK activation functions), scaling laws for feature learning, and the engineering challenges of running them at the scale of frontier models. Nanda walks through the Golden Gate Claude experiment (amplifying a single feature to make Claude obsessed with the Golden Gate Bridge), induction heads (the circuits responsible for in-context learning), and activation patching as a causal intervention technique.

On AI safety, Nanda is pragmatic. He argues that mechanistic interpretability gives us genuine empirical evidence about questions that are otherwise stuck in philosophical debate -- do models have goals? Do they deceive? He also discusses the limitations: sparse autoencoders haven't yet demonstrated capabilities beyond what fine-tuning already achieves, and at sufficient model complexity, models could potentially present a misleading facade to interpretability measurements.
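As a rough illustration of the sparse autoencoder ideas mentioned above, here is a minimal NumPy sketch of the two encoder variants (ReLU and TopK), the linear decoder, and feature amplification in the style of the Golden Gate Claude experiment. It is not taken from the conversation or from any library: every function name, shape, and weight here is a hypothetical stand-in, the weights are random rather than trained, and clamping the TopK output at zero is a simplifying assumption.

```python
import numpy as np

def relu_encode(x, W_enc, b_enc):
    """ReLU variant: every feature with a positive pre-activation fires.
    (In practice sparsity is encouraged by a penalty during training,
    which this sketch does not include.)"""
    return np.maximum(x @ W_enc + b_enc, 0.0)

def topk_encode(x, W_enc, b_enc, k):
    """TopK variant: keep only the k largest pre-activations per example."""
    pre = x @ W_enc + b_enc
    # Indices of the k largest entries in each row (order within them is arbitrary).
    idx = np.argpartition(pre, -k, axis=-1)[..., -k:]
    out = np.zeros_like(pre)
    np.put_along_axis(out, idx, np.take_along_axis(pre, idx, axis=-1), axis=-1)
    # Assumption: clamp at zero so feature activations are non-negative.
    return np.maximum(out, 0.0)

def decode(f, W_dec, b_dec):
    """Reconstruct the activation as a weighted sum of feature directions."""
    return f @ W_dec + b_dec

def steer(x, W_dec, feature, scale):
    """Amplify one feature by adding its decoder direction to the activation."""
    return x + scale * W_dec[feature]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_model, n_features = 8, 32          # toy sizes, chosen arbitrarily
    x = rng.normal(size=(4, d_model))    # stand-in for model activations
    W_enc = rng.normal(size=(d_model, n_features))
    W_dec = rng.normal(size=(n_features, d_model))
    b_enc, b_dec = np.zeros(n_features), np.zeros(d_model)

    f = topk_encode(x, W_enc, b_enc, k=4)
    print("active features per example:", (f != 0).sum(axis=-1))
    print("reconstruction shape:", decode(f, W_dec, b_dec).shape)
```

The practical difference between the two encoders is how sparsity is obtained: the ReLU variant leaves the number of active features to training, while the TopK variant caps it at k by construction.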
The conversation covers his path from pure maths at Cambridge through Anthropic to DeepMind, and why he thinks hands-on coding matters more than reading papers for new researchers entering the field.

---
REFERENCES:

person:
[00:00:00] Neel Nanda - Personal Website
https://www.neelnanda.io/

tool:
[00:35:00] TransformerLens
https://github.com/TransformerLensOrg...

paper:
[01:00:31] A Mathematical Framework for Transformer Circuits
https://transformer-circuits.pub/2021...
[01:01:40] In-context Learning and Induction Heads
https://transformer-circuits.pub/2022...
[01:21:06] Scaling Monosemanticity
https://transformer-circuits.pub/2024...
[01:33:27] Refusal in Language Models Is Mediated by a Single Direction
https://arxiv.org/abs/2406.11717

---
LINKS:
Full Transcript: https://app.rescript.info/share/acb41...
Download PDF transcript: https://app.rescript.info/api/public/...

NEEL NANDA:
https://www.neelnanda.io/
https://scholar.google.com/citations?...
https://x.com/NeelNanda5
