How does DeepSeek actually work? | Full technical review

In this video, we dive into the technical innovations behind DeepSeek-R1: scaling with compute (Reasoning-Oriented Reinforcement Learning, Chain-of-Thought, GRPO, Distillation). While knowledge about AI is helpful, general software engineers should still get great value out of it.

More LLM tech deep dives:
• Discover How LLMs Work by Dissecting Llama

Blogpost version of this video: https://juliaturc.substack.com/p/deep...

00:00 Intro
00:57 Scaling using compute instead of data
02:18 Overview of LLM training
03:20 Training DeepSeek
04:09 Reasoning-Oriented Reinforcement Learning
06:45 DeepSeek-R1-Zero
08:00 Back to training DeepSeek
10:29 Chain-of-Thought
12:19 GRPO
13:46 Distillation
14:45 Outro

Correction: The cited $6M cost was incurred by DeepSeek-V3, not DeepSeek-R1. The cost for the latter is unknown (Source: https://www.reuters.com/technology/ar...)