Hacking LLMs: An Introduction to Mechanistic Interpretability — Jenny Vega

[EuroPython 2025, South Hall 2B, on 2025-07-17]

🎤 Hacking LLMs: An Introduction to Mechanistic Interpretability by Jenny Vega

🔗 https://ep2025.europython.eu/session/...

📝 Abstract:

Large Language Models (LLMs) have become transformative tools, reshaping industries and research alike. Yet, while their outputs can feel like magic, their inner workings remain opaque to most users. How do these models "think"? Can we untangle the layers of their reasoning processes?

Step into the cutting-edge field of Mechanistic Interpretability, where we aim to decode the black box of LLMs into understandable, human-readable components. In this session, we will explore how researchers and practitioners dissect neural networks, uncovering the mechanisms behind their behavior. We will start with the foundational concepts, what Mechanistic Interpretability is and why it matters, before diving into practical tools and techniques. We will emphasize why this field is essential: from ensuring models behave safely and ethically to optimizing their performance and fostering trust in AI systems.

Attendees will leave with a conceptual toolkit for interpreting LLMs and practical takeaways on how to start applying these insights in their own work using Python libraries like PyTorch, Transformers, and interpretability-specific tools.

This talk assumes familiarity with AI fundamentals but introduces advanced concepts with approachable explanations. Whether you're a researcher, developer, or curious enthusiast, you'll gain actionable insights and inspiration to engage with one of the most exciting frontiers in AI. No specialized hardware or prerequisites are required, just bring your curiosity!

---

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License: https://creativecommons.org/licenses/...
