Diffusion Language Models: The Next Big Shift in GenAI

Most Large Language Models (LLMs) today are autoregressive: they predict text one token at a time, in left-to-right order. Diffusion models instead offer iterative refinement, flexible control, and potentially faster sampling. In this video, we explore several ideas for applying diffusion models to language modeling.

00:00 Autoregressive LLMs
00:13 Limitations of autoregressive models
00:56 How diffusion models work for images
01:26 Diffusion-LM: Apply diffusion to word embeddings
02:46 Latent diffusion models: Apply diffusion to paragraph embeddings
03:37 Masked diffusion models
07:41 Scaling laws of diffusion models
08:53 Comparing AR and diffusion models in data-constrained settings

References:

Continuous diffusion on word/paragraph embeddings:
Diffusion-LM: https://arxiv.org/abs/2205.14217
Latent Diffusion for Language Generation: https://arxiv.org/abs/2212.09462
PLANNER: https://arxiv.org/abs/2306.02531

Discrete diffusion:
D3PM: https://arxiv.org/abs/2107.03006
SEDD: https://arxiv.org/pdf/2310.16834

Masked diffusion:
https://arxiv.org/abs/2406.07524v2
https://arxiv.org/abs/2406.04329
https://arxiv.org/abs/2406.03736

Large diffusion LLMs:
LLaDA: https://arxiv.org/abs/2502.09992
Dream 7B: https://hkunlp.github.io/blog/2025/dr...

Scaling:
https://arxiv.org/abs/2410.18514
https://arxiv.org/abs/2507.15857

Mercury (the fastest commercial-grade diffusion LLM): https://chat.inceptionlabs.ai/

Blog (a nice overview of diffusion LLMs): https://spacehunterinf.github.io/blog...

Video made with Manim: https://www.manim.community/
