Long Horizon Reasoning | Sumeet Motwani, PhD Student at Oxford

Sumeet Motwani is a PhD student at Oxford, where his research is funded by Eric Schmidt and advised by Philip Torr. His work focuses on RL post-training, multi-agent systems, and AI security. Sumeet has spent time at Microsoft Research and Google, and his previous work includes MALT, Secret Collusion, Agent Q, and h1.

In this Frontier Research Club talk, Sumeet presents Bootstrapping Long-Horizon LLM Reasoning with RL, exploring how reinforcement learning can help language models reason across longer chains of dependent tasks. The talk looks at a key limitation in current reasoning systems: models may perform well on short problems, but still struggle when tasks require many connected steps, long outputs, backtracking, and consistency over time.

Sumeet introduces h1, a method for building longer reasoning tasks from smaller components and training models with curriculum RL over increasing horizon lengths. The goal is to give models useful learning signal as tasks become longer, instead of relying only on final-answer rewards. The talk also discusses how to benchmark long-horizon reasoning more seriously, including compositional tasks, DAG-structured dependencies, verification, and results across frontier models.

Topics include:
• Long-horizon LLM reasoning
• RL post-training
• h1
• Curriculum RL
• Compositional reasoning
• DAG-based task composition
• Horizon length
• Long chain-of-thought
• Outcome-based rewards
• Verification and backtracking
• Long-horizon reasoning benchmarks
• Benchmarking frontier models
• RL for reasoning models

Presented at Frontier Research Club by Sumeet Motwani. Recorded on February 4, 2026, at AngelList.

Frontier Research Club is a curated forum for rigorous discussion on how AI is reshaping the scientific research process. We convene researchers, computational scientists, and research engineers to examine concrete work across literature synthesis, hypothesis generation, experimental design, simulation, analysis, safety, and reproducibility.

Upcoming events: https://luma.com/frontiersyndicate

Subscribe for more research talks, technical discussions, and frontier AI presentations.
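
For readers who want a concrete picture of "building longer reasoning tasks from smaller components" with a curriculum over horizon length, here is a minimal toy sketch. It is not the h1 implementation: the arithmetic sub-tasks, the names `AtomicTask`, `compose_chain`, and `curriculum`, and the simple linear chaining (rather than a general DAG) are all assumptions made for illustration. It only shows the general shape of the idea: short verifiable problems are chained so each step depends on the previous result, and the training data is ordered from short to long horizons.

```python
import random
from dataclasses import dataclass
from typing import Callable, Iterator, Tuple


@dataclass(frozen=True)
class AtomicTask:
    """A short, verifiable sub-problem (hypothetical toy stand-in)."""
    template: str                # prompt text with an {x} placeholder
    solve: Callable[[int], int]  # ground-truth function used for verification


# Assumed toy components; a real setup would draw from a dataset of short problems.
ATOMIC_TASKS = [
    AtomicTask("add 7 to {x}", lambda x: x + 7),
    AtomicTask("double {x}", lambda x: x * 2),
    AtomicTask("subtract 3 from {x}", lambda x: x - 3),
]


def compose_chain(horizon: int, start: int, rng: random.Random) -> Tuple[str, int]:
    """Chain `horizon` atomic tasks so each step consumes the previous result."""
    steps = []
    value = start
    for i in range(horizon):
        task = rng.choice(ATOMIC_TASKS)
        # The first step sees the literal start value; later steps refer back.
        operand = str(start) if i == 0 else f"the result of step {i}"
        steps.append(f"Step {i + 1}: {task.template.format(x=operand)}")
        value = task.solve(value)
    # The final value is the verifiable answer for the whole chain.
    return "\n".join(steps), value


def curriculum(max_horizon: int, per_stage: int, seed: int = 0) -> Iterator[Tuple[int, str, int]]:
    """Yield (horizon, prompt, answer) triples, shortest horizons first."""
    rng = random.Random(seed)
    for horizon in range(1, max_horizon + 1):
        for _ in range(per_stage):
            prompt, answer = compose_chain(horizon, rng.randint(1, 9), rng)
            yield horizon, prompt, answer


if __name__ == "__main__":
    for horizon, prompt, answer in curriculum(max_horizon=3, per_stage=1):
        print(f"[horizon={horizon}] answer={answer}\n{prompt}\n")
```

In an RL loop, each yielded prompt would be given to the model and the stored answer used to score its final output; that training step is omitted here because the description does not specify it.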
