Diffusion Language Models: The Next Big Shift in GenAI

Most large language models (LLMs) today are autoregressive: they generate text one token at a time, from left to right. Diffusion models offer an alternative, with iterative refinement, flexible control, and the potential for faster sampling. In this video, we explore several ideas for applying diffusion models to language modeling.

Chapters:
00:00 Autoregressive LLMs
00:13 Limitations of autoregressive models
00:56 How diffusion models work for images
01:26 Diffusion-LM: applying diffusion to word embeddings
02:46 Latent diffusion models: applying diffusion to paragraph embeddings
03:37 Masked diffusion models
07:41 Scaling laws of diffusion models
08:53 Comparing AR and diffusion models in data-constrained settings

References:

Continuous diffusion on word/paragraph embeddings
Diffusion-LM: https://arxiv.org/abs/2205.14217
Latent Diffusion for Language Generation: https://arxiv.org/abs/2212.09462
PLANNER: https://arxiv.org/abs/2306.02531

Discrete diffusion
D3PM: https://arxiv.org/abs/2107.03006
SEDD: https://arxiv.org/pdf/2310.16834

Masked diffusion
https://arxiv.org/abs/2406.07524v2
https://arxiv.org/abs/2406.04329
https://arxiv.org/abs/2406.03736

Large diffusion LLMs
LLaDA: https://arxiv.org/abs/2502.09992
Dream 7B: https://hkunlp.github.io/blog/2025/dr...

Scaling
https://arxiv.org/abs/2410.18514
https://arxiv.org/abs/2507.15857

Mercury, described by Inception Labs as the fastest commercial-grade diffusion LLM: https://chat.inceptionlabs.ai/

Blog: a nice overview of diffusion LLMs: https://spacehunterinf.github.io/blog...

Video made with Manim: https://www.manim.community/
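As a toy illustration of the contrast in the opening paragraph (left-to-right generation vs. iterative refinement), here is a small Python sketch that counts model calls for both decoding styles. It is not the actual sampler of LLaDA, Dream, or Mercury. `toy_model` is a hypothetical stand-in for a trained network that always predicts a memorized sentence and assigns random confidences, and committing the most confident predictions at each step is one simple unmasking schedule assumed here for illustration.

```python
import math
import random

MASK = "[MASK]"
# Hypothetical 7-token target that the toy "model" below always predicts.
TARGET = "diffusion models refine all tokens in parallel".split()


def toy_model(tokens, rng):
    """Hypothetical stand-in for a trained network.

    Returns {position: (predicted_token, confidence)} for every masked position,
    all in one call. Predictions are always correct here; confidences are random.
    """
    return {i: (TARGET[i], rng.random()) for i, tok in enumerate(tokens) if tok == MASK}


def autoregressive_decode(n, seed=0):
    """Left-to-right decoding: one model call per token, always the leftmost gap."""
    rng = random.Random(seed)
    tokens = [MASK] * n
    steps = 0
    for i in range(n):
        preds = toy_model(tokens, rng)
        tokens[i] = preds[i][0]  # only the next position is committed
        steps += 1
    return tokens, steps


def masked_diffusion_decode(n, num_steps, seed=0):
    """Iterative unmasking: start fully masked, commit the most confident
    predictions at each step (at any position), and leave the rest masked."""
    rng = random.Random(seed)
    tokens = [MASK] * n
    per_step = math.ceil(n / num_steps)  # tokens committed per model call
    steps = 0
    while MASK in tokens:
        preds = toy_model(tokens, rng)
        ranked = sorted(preds, key=lambda i: preds[i][1], reverse=True)
        for i in ranked[:per_step]:
            tokens[i] = preds[i][0]
        steps += 1
        print(f"step {steps}: {' '.join(tokens)}")  # show the partially unmasked sequence
    return tokens, steps


ar_tokens, ar_steps = autoregressive_decode(len(TARGET))
md_tokens, md_steps = masked_diffusion_decode(len(TARGET), num_steps=3)
print(ar_steps, md_steps)  # 7 3
```

With `num_steps=3`, the toy diffusion decoder fills the 7 positions in 3 model calls instead of 7, in an order set by confidence rather than by position. Fewer calls do not automatically mean faster wall-clock generation: each diffusion step typically runs bidirectional attention over the whole sequence, and committing several tokens at once can hurt quality when those tokens depend on each other.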