Q-learning with Flow-Matching Policies

Expressive policies such as diffusion and flow-matching policies have recently driven progress in robotic manipulation because they can model complex action distributions and generalize from just a handful of demonstrations. But most are still trained purely with supervised imitation learning. Optimizing them with off-policy reinforcement learning remains challenging, which limits their real-world applicability for tasks that require online self-improvement and adaptation. In this talk, I will discuss approaches for making off-policy RL work with flow-matching policies.

Speaker Bio: Qiyang (Colin) Li is a PhD student at UC Berkeley advised by Prof. Sergey Levine. His research interests include reinforcement learning and robot learning, with a focus on leveraging offline prior experience for online exploration. Before that, he was an undergraduate student at the University of Toronto advised by Prof. Roger Grosse.

Find seminar details and upcoming talks: https://www.microsoft.com/en-us/resea...