Gemma 4 Deep Dive — Cassidy Hardin, Researcher, Google DeepMind
Open models are getting smaller, faster, and far more capable. In this talk, Cassidy Hardin walks through the latest advances in the Gemma family, with a focus on Gemma 4 and what it enables for developers building on-device and open-weight AI systems. She covers the architecture behind Gemma’s dense, effective, and mixture-of-experts models, including improvements to attention, multimodal support for text, vision, and audio, and the design decisions that make strong reasoning, coding, and agentic workflows possible at practical sizes.

Speaker info: / cassidyhardin

Timestamps:
00:00:28 - Introduction to the Gemma 4 model family and its four size categories
00:01:54 - Shift to Apache 2.0 licensing for developer accessibility
00:02:25 - Deep dive into the 31B dense reasoning and 26B mixture-of-experts (MoE) models
00:03:30 - Overview of on-device effective models (2B and 4B) with multimodal support
00:04:21 - Architectural updates: interleaved local/global attention and grouped query attention
00:06:51 - Explanation of the new MoE architecture (128 experts, 8 active)
00:07:44 - Implementation of Per Layer Embeddings (PLE) to optimize on-device memory
00:11:06 - Multimodal advances: variable aspect ratios and resolutions for vision encoders
00:16:31 - Audio processing enhancements via conformer architecture and audio tokenizers
00:18:07 - Getting started: self-hosting (Hugging Face, Ollama) and cloud deployment (Vertex AI)


