Language Generation in the Limit

Jon Kleinberg (Cornell University)
https://simons.berkeley.edu/talks/jon...
Transformers as a Computational Model

Although current large language models are complex, the most basic specifications of the underlying language generation problem itself are simple to state: given a finite set of training samples from an unknown language, produce valid new strings from the language that don't already appear in the training data. Here we ask what we can conclude about language generation using only this specification, without any further properties or distributional assumptions. In particular, we consider models in which an adversary enumerates the strings of an unknown target language that is known only to come from a possibly infinite list of candidates, and we show that it is possible to give certain non-trivial guarantees for language generation in this setting. The resulting guarantees contrast dramatically with negative results due to Gold and Angluin in a well-studied model of language learning where the goal is to identify an unknown language from samples; the difference between these results suggests that identifying a language is a fundamentally different problem than generating from it. (This is joint work with Sendhil Mullainathan.)
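
To make the setting in the abstract concrete, here is a minimal Python sketch of the game on a toy candidate list. Everything in it is an assumption chosen for illustration and not the construction from the talk: strings are modeled as positive integers, the candidate languages are L_k = {k, 2k, 3k, ...} for k = 1, 2, 3, ..., and the function name generate_in_the_limit is hypothetical. After each sample from the adversary, the generator outputs a number it has not seen that lies in every candidate still consistent with the samples, so it is guaranteed to lie in the unknown target as well.

```python
from math import gcd
from typing import Iterable, Iterator, Set


def generate_in_the_limit(stream: Iterable[int]) -> Iterator[int]:
    """After each adversarial sample, yield one unseen element of the target.

    Toy assumption (not from the talk): the candidates are
    L_k = {k, 2k, 3k, ...} for k = 1, 2, 3, ..., the target is some
    unknown L_K, and every sample is a positive integer.
    """
    seen: Set[int] = set()
    g = 0  # gcd of the samples so far; gcd(0, x) == x
    for x in stream:
        seen.add(x)
        g = gcd(g, x)
        # L_k is consistent with the samples iff k divides g, so L_g is
        # contained in every consistent candidate, including the target.
        candidate = g
        while candidate in seen:
            candidate += g  # next multiple of g not yet shown
        yield candidate


# Hypothetical run: the adversary enumerates part of L_6.
outputs = list(generate_in_the_limit([12, 18, 6, 24]))
print(outputs)  # [24, 6, 24, 30]
```

In this run the first output is produced while the generator's guess (multiples of 12) is still wrong as an identification of the target, yet the output is already a valid unseen member of L_6. This toy family is simple enough that the target can also be identified eventually, so it only illustrates the interface of the game; the guarantees described in the abstract concern general candidate lists, where identification can be impossible.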
