The AI Frontier: from Gemini 3 Deep Think distilling to Flash — Jeff Dean

From rewriting Google’s search stack in the early 2000s to reviving sparse trillion-parameter models and co-designing TPUs with frontier ML research, Jeff Dean has quietly shaped nearly every layer of the modern AI stack. As Chief AI Scientist at Google and a driving force behind Gemini, Jeff has lived through multiple scaling revolutions from CPUs and sharded indices to multimodal models that reason across text, video, and code. Jeff joins us to unpack what it really means to “own the Pareto frontier,” why distillation is the engine behind every Flash model breakthrough, how energy (in picojoules) not FLOPs is becoming the true bottleneck, what it was like leading the charge to unify all of Google’s AI teams, and why the next leap won’t come from bigger context windows alone, but from systems that give the illusion of attending to trillions of tokens. We discuss: • Jeff’s early neural net thesis in 1990: parallel training before it was cool, why he believed scaling would win decades early, and the “bigger model, more data, better results” mantra that held for 15 years • The evolution of Google Search: sharding, moving the entire index into memory in 2001, softening query semantics pre-LLMs, and why retrieval pipelines already resemble modern LLM systems • Pareto frontier strategy: why you need both frontier “Pro” models and low-latency “Flash” models, and how distillation lets smaller models surpass prior generations • Distillation deep dive: ensembles → compression → logits as soft supervision, and why you need the biggest model to make the smallest one good • Latency as a first-class objective: why 10–50x lower latency changes UX entirely, and how future reasoning workloads will demand 10,000 tokens/sec • Energy-based thinking: picojoules per bit, why moving data costs 1000x more than a multiply, batching through the lens of energy, and speculative decoding as amortization • TPU co-design: predicting ML workloads 2–6 years out, speculative hardware features, precision reduction, sparsity, and the constant feedback loop between model architecture and silicon • Sparse models and “outrageously large” networks: trillions of parameters with 1–5% activation, and why sparsity was always the right abstraction • Unified vs. specialized models: abandoning symbolic systems, why general multimodal models tend to dominate vertical silos, and when vertical fine-tuning still makes sense • Long context and the illusion of scale: beyond needle-in-a-haystack benchmarks toward systems that narrow trillions of tokens to 117 relevant documents • Personalized AI: attending to your emails, photos, and documents (with permission), and why retrieval + reasoning will unlock deeply personal assistants • Coding agents: 50 AI interns, crisp specifications as a new core skill, and how ultra-low latency will reshape human–agent collaboration • Why ideas still matter: transformers, sparsity, RL, hardware, systems — scaling wasn’t blind; the pieces had to multiply together Substack Article w/Show Notes: https://www.latent.space/p/jeffdean — Jeff Dean • LinkedIn: / jeff-dean-8b212555 • X: https://x.com/jeffdean Google • https://google.com • https://deepmind.google 00:00:00 Intro 00:01:31 Frontier vs Flash & Distillation Strategy 00:05:09 Distillation, RL & Flash Economic Advantage 00:07:35 Flash in Products + Importance of Latency 00:11:11 Benchmarks, Long Context & Real Use Cases 00:15:01 Attending to Trillions of Tokens & Multimodality 00:20:11 LLM Search & Google Search Evolution 00:24:09 Systems Design Principles + Latency Numbers 00:32:09 Energy, Batching & TPU Co-Design 00:42:21 Research Frontiers: Reliability & RL Challenges 00:46:27 Unified Models vs Symbolic Systems (IMO) 00:50:38 Knowledge vs Reasoning + Vertical/Modular Models 00:55:58 Multilingual + Low-Resource Language Insights 00:57:58 Vision-Language Representations Example 01:07:15 Gemini Origin Story + Organizational Memo 01:09:27 Coding with AI & Agent Interaction Style 01:14:26 Prompting Skills & Spec Design 01:19:54 Latency Predictions & Tokens/sec Vision 01:21:29 Future Predictions: Personal Models & Hardware 01:23:11 Closing

Demis Hassabis: Why AGI is Bigger than the Industrial Revolution & Where Are The Bottlenecks in AI

Demis Hassabis: Why AGI is Bigger than the Industrial Revolution & Where Are The Bottlenecks in AI

Is AI Hiding Its Full Power? With Geoffrey Hinton

Is AI Hiding Its Full Power? With Geoffrey Hinton

Decoding Google Gemini | Jeff Dean

Decoding Google Gemini | Jeff Dean

How Google DeepMind is researching the next Frontier of AI for Gemini — Raia Hadsell, VP of Research

How Google DeepMind is researching the next Frontier of AI for Gemini — Raia Hadsell, VP of Research

Do LLMs Understand? AI Pioneer Yann LeCun Spars with DeepMind’s Adam Brown.

Do LLMs Understand? AI Pioneer Yann LeCun Spars with DeepMind’s Adam Brown.

The Friendship That Made Google $1.8 Trillion

The Friendship That Made Google $1.8 Trillion

Marc Andreessen introspects on Death of the Browser, Pi + OpenClaw, and Why "This Time Is Different"

Marc Andreessen introspects on Death of the Browser, Pi + OpenClaw, and Why "This Time Is Different"

Demis Hassabis: Agents, AGI & The Next Big Scientific Breakthrough

Demis Hassabis: Agents, AGI & The Next Big Scientific Breakthrough

Inside Anthropic, the $965 Billion AI Juggernaut | The Circuit

Inside Anthropic, the $965 Billion AI Juggernaut | The Circuit

Everything I Learned Training Frontier Small Models — Maxime Labonne, Liquid AI

Everything I Learned Training Frontier Small Models — Maxime Labonne, Liquid AI

The Collaboration that Built Modern AI: Geoff Hinton & Jeff Dean in Conversation with Jordan Jacobs

The Collaboration that Built Modern AI: Geoff Hinton & Jeff Dean in Conversation with Jordan Jacobs

This is not the AI we were promised | The Royal Society

This is not the AI we were promised | The Royal Society

The Hardest Problem AI Ever Solved, with Google DeepMind CEO

The Hardest Problem AI Ever Solved, with Google DeepMind CEO

Ilya Sutskever – We're moving from the age of scaling to the age of research

Ilya Sutskever – We're moving from the age of scaling to the age of research

The Dangerous Illusion of AI Coding? - Jeremy Howard

The Dangerous Illusion of AI Coding? - Jeremy Howard

Big ideas begin here: Sergey Brin at Stanford

Big ideas begin here: Sergey Brin at Stanford

Distinguished Colloquium: Jeff Dean, February 10, 2026

Distinguished Colloquium: Jeff Dean, February 10, 2026

Andrej Karpathy: From Vibe Coding to Agentic Engineering w/ Stephanie Zhan

Andrej Karpathy: From Vibe Coding to Agentic Engineering w/ Stephanie Zhan

Stanford AI Club: Jeff Dean on Important AI Trends

Stanford AI Club: Jeff Dean on Important AI Trends

Gemini co-leads on project origins and what's next

Gemini co-leads on project origins and what's next