How AI Taught Itself to See [DINOv3]

How can we train a general-purpose vision model to perceive our visual world? This video dives into the fascinating idea of self-supervised learning. We will discuss the basic concepts of transfer learning, contrastive language-image pretraining (CLIP), and self-supervised learning methods, including masked autoencoder, contrastive methods like SimCLR, and self-distillation methods like DINOv1, v2, and v3. I hope you enjoy the video!

00:00 Introduction
00:33 Why do features matter?
01:11 Learning features using classification
02:14 Learning features using language (CLIP)
04:09 Learning features using pretask (Self-supervised learning)
05:20 Learning features using contrast (SimCLR)
06:36 Learning features using self-distillation (DINOv1)
12:18 DINOv2
13:54 DINOv3

References:

Language-image pretraining
[CLIP] https://openai.com/index/clip/

Self-supervised learning (pretask)
[Context encoder] https://arxiv.org/abs/1604.07379
[Colorization] https://arxiv.org/abs/1611.09842
[Rotation prediction] https://arxiv.org/abs/1803.07728
[Jigsaw puzzle] https://arxiv.org/abs/1603.09246
[Temporal order shuffling] https://arxiv.org/abs/1708.01246

Contrastive learning
[SimCLR] https://arxiv.org/abs/2002.05709

Inpainting
[MAE] https://arxiv.org/abs/2111.06377
[iBOT] https://arxiv.org/abs/2111.07832

Self-distillation
[DINOv1] https://arxiv.org/abs/2104.14294
[DINOv2] https://arxiv.org/abs/2304.07193
[DINOv3] https://arxiv.org/abs/2508.10104

Self-supervised learning
[Cookbook] https://arxiv.org/abs/2304.12210

Video made with Manim: https://www.manim.community/

This Simple Optimizer Is Revolutionizing How We Train AI [Muon]

How Small Models Learn to Think Like Giants

Yann LeCun | Self-Supervised Learning, JEPA, World Models, and the future of AI

How AI Learned to Teach Itself [JEPA]

Text Diffusion — Brendan O’Donoghue, Google DeepMind

Transformers: Attention Is Just Weighted Dot Products | The Math Behind AI

DINO: Emerging Properties in Self-Supervised Vision Transformers (Facebook AI Research Explained)

LLMs Don't Need More Parameters. They Need Loops.

The 60-Year Hunt for AI's Most Important Function

Flow-Matching vs Diffusion Models explained side by side

Yann LeCun's $1B Bet Against LLMs [Part 1]

The Tiny Idea That Lets Anyone Fine-Tune AI

The moment we stopped understanding AI [AlexNet]

How DINO learns to see the world - Paper Explained

DINO: Self-Supervised Vision Transformers

The Most Underrated Layer Inside Every AI Model

Yann LeCun: World Models: Enabling the next AI revolution

DINOv3 Paper Explained: The Computer Vision Foundation Model

DINOv3: One backbone, multiple image/video tasks

How DeepSeek Rewrote the Transformer [MLA]
