DeepSeek Gave LLMs a Real Memory (It's Not RAG)

DeepSeek's Engram introduces a new way to retrieve knowledge through scalable lookups. This boosts LLMs across all tasks (including reasoning tasks) by freeing attention and MoE layers from having to reconstruct facts stored as static patterns. In this video, let's explore how and why Engram works.

00:00 Attention
01:56 How facts are stored in LLMs (FFN/MoE)
06:27 Retrieving knowledge via lookup
07:32 Hashing
10:47 Multi-head hashing
11:56 Context-aware gating
16:06 Multi-branch architecture (mHC)
16:53 Integrating Engram into a Transformer
18:01 Sparsity allocation (Engram vs MoE)
20:16 Performance on benchmark tasks
22:08 Why does Engram improve LLM reasoning?
23:45 Where should we place the Engram?
25:41 Does the Engram model really make the model deeper?
27:05 Embedding scaling and the future of LLMs

References:
[Engram] https://arxiv.org/abs/2601.07372
[Layer Embeddings] https://developers.googleblog.com/en/...
[DeepEmbed] https://www.rwkv.com/
[SuperBPE] https://arxiv.org/abs/2503.13423
[SCONE] https://arxiv.org/abs/2502.01637
[OverEncoding] https://arxiv.org/abs/2501.16975
[Byte Latent Transformer] https://arxiv.org/abs/2412.09871
[LongCat-Flash-Lite] https://arxiv.org/abs/2601.21204
[Large Lookup Layers] https://arxiv.org/abs/2601.21461

Video made with manim: https://www.manim.community/

Note: I caught a cold while making this video 🤒, so part of the voiceover is generated by my cloned voice. Sorry if the voiceover felt a bit unnatural.
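
For readers who want something concrete before watching, here is a rough sketch of the three ideas named in the chapter list: hashing an n-gram of token ids into a table slot, using several hash heads, and gating the retrieved vector with the current hidden state. It is an illustration only, not the Engram implementation. The table size, embedding width, hash multipliers, the sigmoid gate formula, and all function names (hash_ngram, lookup, gated_memory) are assumptions made for this example.

```python
import math
import random

DIM = 4            # embedding width per head (illustrative assumption)
TABLE_SIZE = 1021  # slots per hash head (illustrative assumption)
NUM_HEADS = 2      # number of independent hash functions

rng = random.Random(0)
# One embedding table per hash head, filled with small random values.
# In a real model these would be learned parameters.
tables = [
    [[rng.uniform(-0.1, 0.1) for _ in range(DIM)] for _ in range(TABLE_SIZE)]
    for _ in range(NUM_HEADS)
]
# Hypothetical per-head multipliers so that each head hashes differently.
MULTIPLIERS = [1_000_003, 998_244_353]


def hash_ngram(token_ids, head):
    """Map an n-gram of token ids to a slot index for one hash head."""
    h = 0
    for t in token_ids:
        h = (h * MULTIPLIERS[head] + t + 1) % (2**61 - 1)
    return h % TABLE_SIZE


def lookup(token_ids):
    """Concatenate the embeddings retrieved by every hash head."""
    out = []
    for head in range(NUM_HEADS):
        out.extend(tables[head][hash_ngram(token_ids, head)])
    return out


def gated_memory(hidden, token_ids):
    """Scale the retrieved vector by a sigmoid gate computed from the hidden state.

    `hidden` is expected to have DIM * NUM_HEADS entries.
    """
    mem = lookup(token_ids)
    score = sum(h * m for h, m in zip(hidden, mem)) / math.sqrt(len(mem))
    gate = 1.0 / (1.0 + math.exp(-score))
    return [gate * m for m in mem]
```

The lookup costs the same no matter how large the tables are, which is the sense in which such a memory is "scalable". Two different n-grams can collide in one head's table; using several heads with different hash functions makes it less likely that they collide in all of them, and the gate lets the model down-weight a retrieved vector that does not fit the current context.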
