How AI Taught Itself to See [DINOv3]
How can we train a general-purpose vision model to perceive our visual world? This video dives into the fascinating idea of self-supervised learning. We will discuss the basic concepts of transfer learning, contrastive language-image pretraining (CLIP), and self-supervised learning methods, including masked autoencoders, contrastive methods like SimCLR, and self-distillation methods like DINOv1, v2, and v3. I hope you enjoy the video!

00:00 Introduction
00:33 Why do features matter?
01:11 Learning features using classification
02:14 Learning features using language (CLIP)
04:09 Learning features using pretask (Self-supervised learning)
05:20 Learning features using contrast (SimCLR)
06:36 Learning features using self-distillation (DINOv1)
12:18 DINOv2
13:54 DINOv3

References:

Language-image pretraining
[CLIP] https://openai.com/index/clip/

Self-supervised learning (pretask)
[Context encoder] https://arxiv.org/abs/1604.07379
[Colorization] https://arxiv.org/abs/1611.09842
[Rotation prediction] https://arxiv.org/abs/1803.07728
[Jigsaw puzzle] https://arxiv.org/abs/1603.09246
[Temporal order shuffling] https://arxiv.org/abs/1708.01246

Contrastive learning
[SimCLR] https://arxiv.org/abs/2002.05709

Inpainting
[MAE] https://arxiv.org/abs/2111.06377
[iBOT] https://arxiv.org/abs/2111.07832

Self-distillation
[DINOv1] https://arxiv.org/abs/2104.14294
[DINOv2] https://arxiv.org/abs/2304.07193
[DINOv3] https://arxiv.org/abs/2508.10104

Self-supervised learning
[Cookbook] https://arxiv.org/abs/2304.12210

Video made with Manim: https://www.manim.community/


