Which Experiences Are Influential For RL Agents
Which experiences in a replay buffer actually help an RL agent, and which ones hurt it? In this video, we break down the 2025 Reinforcement Learning Journal paper “Which Experiences Are Influential for RL Agents? Efficiently Estimating the Influence of Experiences” by Hiraoka, Wang, Onishi, and Tsuruoka.

The paper introduces PIToD (Policy Iteration with Turn-over Dropout), a method for estimating the influence of experience data in reinforcement learning without expensive Leave-One-Out (LOO) retraining. We cover why influence estimation matters, why the classic LOO approach is computationally infeasible, how PIToD uses masks and flipped masks to isolate experience influence, and how the method can even improve underperforming agents by disabling harmful experience groups.

Topics covered:
• Experience replay in off-policy reinforcement learning
• Policy evaluation and policy improvement
• Why Leave-One-Out influence estimation is too slow
• PIToD: Policy Iteration with Turn-over Dropout
• Binary masks and flipped masks
• Influence estimation without retraining
• Self-influence and primacy bias
• Fixing underperforming RL agents through amendment
• Open questions for scaling PIToD to larger networks and multi-agent settings

Chapters:
00:00 Introduction
01:30 Experience Replay Foundations
04:00 Why Leave-One-Out Is Too Slow
06:30 PIToD: Policy Iteration with Turn-over Dropout
11:30 Theoretical Foundation
13:30 Evaluation Results
17:30 Fixing Underperforming Agents
19:30 Outro

Paper: https://rlj.cs.umass.edu/2025/papers/...

If you found this useful, like the video, leave a comment, and subscribe for more deep dives into reinforcement learning, AI research, and machine learning papers.
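For readers who want a feel for the mask and flipped-mask idea before watching, here is a minimal toy sketch in Python. It is not the paper's implementation: the network shape, the names `mask_for`, `flipped`, `q_value`, and `influence`, the hidden width, and the sign convention of the influence score are all assumptions made for illustration. The idea it illustrates is that each experience gets its own fixed binary mask, training on that experience would only touch the units the mask keeps, and the flipped mask selects the units that never saw it, so comparing the two gives an influence estimate without retraining.

```python
import numpy as np

HIDDEN = 8  # hypothetical hidden width, chosen only for this toy example


def mask_for(experience_id: int, hidden: int = HIDDEN) -> np.ndarray:
    """Deterministic binary mask tied to one experience (assumed scheme)."""
    rng = np.random.default_rng(experience_id)
    return rng.integers(0, 2, size=hidden).astype(np.float64)


def flipped(mask: np.ndarray) -> np.ndarray:
    """Flipped mask: keeps exactly the units the original mask dropped."""
    return 1.0 - mask


def q_value(params, x: np.ndarray, mask: np.ndarray) -> float:
    """Toy one-hidden-layer Q estimate with the mask applied to hidden units."""
    w1, w2 = params
    h = np.maximum(0.0, x @ w1) * mask
    return float(h @ w2)


def influence(params, experience_id: int, x_eval: np.ndarray, target: float) -> float:
    """Squared error under the flipped mask minus squared error under the mask.

    Sign convention is an assumption of this sketch: a positive value means
    the units that saw the experience fit the target better.
    """
    m = mask_for(experience_id)
    err_with = (q_value(params, x_eval, m) - target) ** 2
    err_without = (q_value(params, x_eval, flipped(m)) - target) ** 2
    return err_without - err_with
```

In this sketch a single trained set of parameters is evaluated twice, once per mask, which is what avoids the retrain-per-experience cost of Leave-One-Out.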

