
CS885 Lecture 7b: Actor Critic

CS885 Lecture 9: Model-based RL

CS885 Lecture 11b: Partially Observable RL

Policy Gradient Theorem Explained - Reinforcement Learning

CS885 Lecture 14c: Trust Region Methods

RL Course by David Silver - Lecture 7: Policy Gradient Methods

An introduction to Policy Gradient methods - Deep Reinforcement Learning

CS885 Lecture 10: Bayesian RL

Reinventing Entropy | Compression is Intelligence Part 1

CS885 Lecture 8a: Multi-armed bandits

A visual guide to Bayesian thinking

L3 Policy Gradients and Advantage Estimation (Foundations of Deep RL Series)

Deep RL Bootcamp Lecture 4A: Policy Gradients

Proximal Policy Optimization (PPO) is Easy With PyTorch | Full PPO Tutorial
DeepMind x UCL RL Lecture Series - Policy-Gradient and Actor-Critic methods [9/13]

CS885 Lecture17c: Inverse Reinforcement Learning

REINFORCE: Reinforcement Learning Most Fundamental Algorithm

Richard Feynman. Why.

