One-step Language Modeling via Continuous Denoising | Nicholas Boffi

https://hannes-stark.com/starkly-spea...

Paper: One-step Language Modeling via Continuous Denoising
https://arxiv.org/abs/2602.16813

Abstract: Language models based on discrete diffusion have attracted widespread interest for their potential to provide faster generation than autoregressive models. Despite their promise, these models typically produce samples whose quality sharply degrades in the few-step regime, preventing a dramatic speedup in practice. Here, we show that language models based on continuous flows over one-hot token embeddings can outperform discrete diffusion in both quality and speed. Importantly, our continuous formulation defines a unique flow map that can be learned directly for efficient few-step inference, a structure we show is unavailable to discrete methods. In this setting, we show that both the flow and its associated flow map can be learned with simple cross-entropy objectives that respect the simplex geometry of the data, and we identify three distinct choices for flow map distillation whose performance we compare in practice.

Using these insights, we build a flow language model (FLM), a continuous flow that matches state-of-the-art discrete diffusion baselines on the One Billion Words (LM1B) and OpenWebText (OWT) datasets. We then distill FLM into a flow map language model (FMLM), whose one-step generation exceeds the 8-step quality of recent few-step discrete diffusion language models. Our work challenges the widely-held hypothesis that discrete noising processes are necessary for generative modeling over discrete modalities and paves the way toward accelerated language modeling at scale. Code is available at this https URL.
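To make the abstract's central idea more concrete, here is a minimal toy sketch of a continuous flow over one-hot token embeddings trained with a cross-entropy loss. It is not the paper's implementation. The linear noise-to-data path, the Gaussian noise source, the zero-logit stand-in for a trained network, and every function name below are assumptions chosen for illustration. Under the assumed linear path, a denoiser that predicts token probabilities on the simplex also determines a velocity field, which is what the last helper computes.

```python
import numpy as np

def one_hot(tokens, vocab_size):
    """Map integer token ids of shape (L,) to one-hot rows of shape (L, V)."""
    return np.eye(vocab_size)[tokens]

def interpolate(x0, x1, t):
    """Assumed linear path: noise x0 at t=0, clean one-hot data x1 at t=1."""
    return (1.0 - t) * x0 + t * x1

def log_softmax(logits):
    """Numerically stable log-softmax over the vocabulary axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def cross_entropy(logits, tokens):
    """Mean negative log-probability assigned to the true token at each position."""
    log_probs = log_softmax(logits)
    return -log_probs[np.arange(len(tokens)), tokens].mean()

def velocity_from_denoiser(probs, xt, t):
    """Velocity implied by a clean-data estimate under the assumed linear path.

    For x_t = (1 - t) * x0 + t * x1, the velocity is (x1_hat - x_t) / (1 - t),
    where x1_hat is the denoiser's estimate of the clean one-hot sequence.
    Only valid for t < 1.
    """
    return (probs - xt) / (1.0 - t)

# Toy setup (hypothetical sizes): vocabulary of 4, sequence of 3 tokens.
rng = np.random.default_rng(0)
vocab_size = 4
tokens = np.array([2, 0, 1])

x1 = one_hot(tokens, vocab_size)       # clean data on the simplex vertices
x0 = rng.standard_normal(x1.shape)     # assumed Gaussian noise source
t = 0.3
xt = interpolate(x0, x1, t)            # noisy continuous input to the network

# Stand-in for a trained network: all-zero logits give a uniform prediction.
logits = np.zeros_like(xt)
loss = cross_entropy(logits, tokens)
velocity = velocity_from_denoiser(np.exp(log_softmax(logits)), xt, t)
```

In a real model the logits would come from a network that takes `xt` and `t` as input. The sketch only shows how a cross-entropy target on token identities can stand in for a regression target on the velocity.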

How to build a consistency model: Learning flow maps via self-distillation | Nicholas Boffi

Text Diffusion — Brendan O’Donoghue, Google DeepMind

Reinventing Entropy | Compression is Intelligence Part 1

Yann LeCun: World Models: Enabling the next AI revolution

S18 | Language Modeling with Spherical Geometry

Meta Flow Maps enable scalable reward alignment | Peter Potaptchik

Yann LeCun's $1B Bet Against LLMs [Part 1]

Scaling Atomistic Protein Binder Design with Generative Pretraining and Test-time Compute | NVIDIA

Adaptive Protein Tokenization - Rohit Dilip

How AI Learned to Teach Itself [JEPA]

How to Start Coding | Programming for Beginners | Learn Coding | Intellipaat

The Man Who Went From Working At A Subway, To Solving An "Impossible" Math Problem

RL for Agents Workshop - Deep Dive on Training Agents with RL and Open Source

Generative Modeling via Drifting | MingYang Deng

Hasan Piker & Yanis Varoufakis | Banned for Insufficient Support of Genocide

Andrej Karpathy: From Vibe Coding to Agentic Engineering w/ Stephanie Zhan

Stanford CS25: Transformers United V6 I From Language Models to Native Multimodal Intelligence

Latent Causal Diffusions for Single-Cell Perturbation Modeling | Lars Lorch

BiomniBench: Evaluating AI Agents in Biology | Yunhao Qu

Naomi presents: FALCUN: A Simple and Efficient Deep Active Learning Strategy
