Aviral Kumar - The Importance of Exploration for Test-Time Scaling

Title: The Importance of Exploration for Test-Time Scaling

Abstract: RL has enabled language models to optimize long chains of thought (CoTs), yet the field still lacks clarity on what makes these approaches succeed. Conflicting empirical results across papers often stem from differences in setting rather than principle. In this talk, I will share our perspective: effective test-time scaling hinges on in-context exploration, the ability of a model to internally experiment and infer generalizable algorithmic procedures using additional compute at inference.

I will describe two RL-based approaches for training models to perform such exploration. First, I will present e3, a curriculum-based recipe that teaches models to chain together skills already present in the base model, yielding the state-of-the-art language model under 2B parameters for math reasoning. Second, I will discuss cases where chaining alone is insufficient. There, we guide exploration by conditioning the model's CoT on concise, self-generated natural language abstractions: short procedural summaries produced before launching into long reasoning traces. These abstractions help steer test-time search more effectively. Across tasks, conditioning RL on abstractions significantly improves in-context exploration and yields sustained performance gains even when conventional pass@k scaling plateaus. I will also talk briefly about some ongoing work that builds on these ideas to improve exploration for test-time scaling.

To check out other talks in our full NLP Seminar Series, please visit: UCLA NLP Seminar Series
