Doğaç Eldenk - Attention Drift – What speculative decoding models learn

00:00 Seminar Welcome
00:53 Talk Overview
01:22 Why Inference Is Hard
02:55 Speculative Decoding Basics
06:58 EAGLETree and MTP
08:59 Attention Sinks Primer
10:08 Attention Drift Discovery
15:14 Magnitude Mismatch Clues
17:51 Post Norm Fix
20:48 Training Time Tests
24:02 Gated Attention Experiments
29:19 Architectural Improvements
31:25 Q and A Practical Serving
34:29 How We Found It
35:52 Templates and Prompt Length
39:24 Long Context Sliding Window
44:56 Production Impact
45:58 Open Questions
47:29 Key Takeaways

Speculative decoding speeds up LLM inference by drafting tokens with a small model, but drafters degrade sharply under template perturbation and long contexts. We identify a new phenomenon, attention drift: as the drafter generates within a speculation chain, its attention shifts away from the prompt onto its own recent tokens. We trace this to hidden-state magnitude accumulation across drafting steps and fix it with a post-norm architecture, EAGLE 3.1, that improves resilience and performance.

Bio: Doğaç is a Master's student in Northwestern University's Computer Science program, joining Fal as a Machine Learning Engineer. His work focuses on inference acceleration, from speculative decoding to agentic GPU kernel optimization and discovery.

This session is brought to you by the Cohere Labs Open Science Community, a space where ML researchers, engineers, linguists, social scientists, and lifelong learners connect and collaborate with each other. We'd like to extend a special thank you to Harsha Nelaturu and Andrej Jovanović, Leads of our ML Systems and Theory group, for their dedication in organizing this event.

If you’re interested in sharing your work, we welcome you to join us! Simply fill out the form at https://forms.gle/ALND9i6KouEEpCnz6 to express your interest in becoming a speaker.

Join the Cohere Labs Open Science Community to see a full list of upcoming events (https://tinyurl.com/CohereLabsCommuni...).
