Which Experiences Are Influential For RL Agents

Which experiences in a replay buffer actually help an RL agent, and which ones hurt it? In this video, we break down the 2025 Reinforcement Learning Journal paper “Which Experiences Are Influential for RL Agents? Efficiently Estimating the Influence of Experiences” by Hiraoka, Wang, Onishi, and Tsuruoka.

The paper introduces PIToD (Policy Iteration with Turn-over Dropout), a method for estimating the influence of experience data in reinforcement learning without expensive Leave-One-Out (LOO) retraining. We cover:
• why influence estimation matters
• why the classic LOO approach, which retrains the agent once for each removed experience, is computationally infeasible
• how PIToD uses masks and flipped masks to isolate the influence of individual experiences
• how the method can even improve underperforming agents by disabling harmful groups of experiences

Topics covered:
• Experience replay in off-policy reinforcement learning
• Policy evaluation and policy improvement
• Why Leave-One-Out influence estimation is too slow
• PIToD: Policy Iteration with Turn-over Dropout
• Binary masks and flipped masks
• Influence estimation without retraining
• Self-influence and primacy bias
• Fixing underperforming RL agents through amendment
• Open questions for scaling PIToD to larger networks and multi-agent settings

Chapters:
00:00 Introduction
01:30 Experience Replay Foundations
04:00 Why Leave-One-Out Is Too Slow
06:30 PIToD: Policy Iteration with Turn-over Dropout
11:30 Theoretical Foundation
13:30 Evaluation Results
17:30 Fixing Underperforming Agents
19:30 Outro

Paper: https://rlj.cs.umass.edu/2025/papers/...

If you found this useful, like the video, leave a comment, and subscribe for more deep dives into reinforcement learning, AI research, and machine learning papers.