Discrete generative modeling with masked diffusions (Jiaxin Shi, Google DeepMind)

Date: Oct 11, 2024

Abstract: Modern generative AI has developed along two distinct paths: autoregressive models for discrete data (such as text) and diffusion models for continuous data (such as images). Adapting diffusion models to discrete data is a compelling way to bridge this divide and unify the two approaches. However, existing work in this area has been hindered by unnecessarily complex model formulations and unclear relationships between different perspectives, leading to suboptimal parameterizations and training objectives, and to ad hoc adjustments that counteract these issues. In this talk, I will introduce masked diffusion models, a simple and general framework that unlocks the full potential of diffusion models for discrete data. We show that the continuous-time variational objective of such models is a simple weighted integral of cross-entropy losses. Our framework also enables training generalized masked diffusion models with state-dependent masking schedules. When evaluated by perplexity, our models trained on OpenWebText surpass prior diffusion language models at GPT-2 scale and demonstrate superior performance on 4 out of 5 zero-shot language modeling tasks. Furthermore, our models vastly outperform previous discrete diffusion models on pixel-level image modeling, achieving 2.75 (CIFAR-10) and 3.40 (ImageNet 64×64) bits per dimension, which is better than autoregressive models of similar size.

Bio: Jiaxin Shi is a research scientist at Google DeepMind. Previously, he was a postdoctoral researcher at Stanford and Microsoft Research New England. He obtained his Ph.D. from Tsinghua University. His research interests broadly involve probabilistic and algorithmic models for learning, as well as the interface between them. Jiaxin has served as an area chair for NeurIPS and AISTATS. He is a recipient of the Microsoft Research PhD Fellowship. His first-author paper received a NeurIPS 2022 Outstanding Paper Award.
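The abstract's statement that the continuous-time objective is "a simple weighted integral of cross-entropy losses" can be illustrated with a small sketch. This is not the speaker's implementation. It assumes a linear masking schedule in which each token is masked independently with probability t, so that the per-time weight on the cross-entropy becomes 1/t. The names `masked_diffusion_loss`, `predict_logits`, and `mask_id` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def masked_diffusion_loss(x0, predict_logits, mask_id, rng, eps=1e-3):
    """One-sample Monte Carlo estimate of a weighted cross-entropy objective.

    Assumption (not from the talk description): linear schedule, i.e. at
    time t each token is masked independently with probability t, which
    makes the weight on the cross-entropy term equal to 1 / t.
    """
    # Draw a time in [eps, 1]; eps avoids dividing by zero and slightly
    # truncates the integral over [0, 1].
    t = rng.uniform(eps, 1.0)

    # Forward process: replace each token with the mask id with probability t.
    masked = rng.random(x0.shape) < t
    xt = np.where(masked, mask_id, x0)

    # Hypothetical model call: returns logits of shape (seq_len, vocab_size).
    logits = predict_logits(xt, t)

    # Numerically stable log-softmax over the vocabulary axis.
    logits = logits - logits.max(axis=-1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))

    # Cross-entropy against the clean tokens, counted only at masked positions.
    token_ce = -log_probs[np.arange(x0.shape[0]), x0]
    return (token_ce * masked).sum() / t
```

Averaging this quantity over many sampled times and sequences approximates the integral. Only masked positions contribute, which is why the objective reduces to cross-entropy terms on the tokens the model has to fill in.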
