Q-learning with Flow-Matching Policies

Expressive policies such as diffusion and flow-matching policies have recently driven progress in robotic manipulation because they can model complex action distributions and generalize from just a handful of demonstrations. However, most are still trained purely with supervised imitation learning. Optimizing them with off-policy reinforcement learning remains challenging, which limits their real-world applicability to tasks that require online self-improvement and adaptation. In this talk, I will discuss approaches for making off-policy RL work with flow-matching policies.

Speaker Bio: Qiyang (Colin) Li is a PhD student at UC Berkeley advised by Prof. Sergey Levine. His research interests include reinforcement learning and robot learning, with a focus on leveraging offline prior experience for online exploration. Previously, he was an undergraduate student at the University of Toronto advised by Prof. Roger Grosse.

Find seminar details and upcoming talks: https://www.microsoft.com/en-us/resea...
