An Introduction to Mechanistic Interpretability – Neel Nanda | IASEAI 2025

How can we reverse engineer what a neural network is doing? In this IASEAI ’25 session, An Introduction to Mechanistic Interpretability, Neel Nanda (Senior Research Scientist at Google DeepMind, formerly at Anthropic) provides an accessible overview of mechanistic interpretability, the study of how to understand the inner workings of neural networks. Nanda explores the progress so far, the limits of current approaches, and the field’s potential for improving AGI safety. He also examines how better interpretability tools could help evaluate the safety of current frontier systems and ensure transparency in future AI development.

About IASEAI: https://www.iaseai.org
Neel Nanda: https://www.neelnanda.io/

#NeelNanda #AISafety #Interpretability #MechanisticInterpretability #IASEAI