Proximal Policy Optimization Explained

Ever wondered "what is proximal policy optimization?" Well, this is the video for you. Proximal Policy Optimization (PPO) is a reinforcement learning training method. It falls into the category of policy gradient methods, where a predictor (the policy) is trained on a gradient derived directly from a reward function. PPO is sample efficient and very stable, which makes it great for RL control problems like robotics, as well as many other tasks.

RL theory series: Reinforcement Learning Made Simple
(Watch the series above if you were confused.)

PPO paper: https://arxiv.org/abs/1707.06347
TRPO paper: https://arxiv.org/abs/1502.05477
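
For a rough idea of what PPO optimizes, here is a minimal sketch of the clipped surrogate objective described in the PPO paper. It is an illustration under assumptions, not the code from the video: the function name `ppo_clipped_objective`, its arguments, and the default `clip_eps=0.2` are chosen here for the example, and the advantages are assumed to be computed elsewhere.

```python
import math


def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch (a value to maximize).

    Hypothetical helper for illustration. Inputs are equal-length lists:
    log-probabilities of the taken actions under the new and old policy,
    and an advantage estimate for each action.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the new and old policy for this action.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)


# Two sampled actions: one became more likely, one became less likely.
old_lp = [math.log(0.5), math.log(0.5)]
new_lp = [math.log(0.75), math.log(0.25)]
adv = [1.0, -1.0]
objective = ppo_clipped_objective(new_lp, old_lp, adv)
```

The clipping is what keeps each update "proximal": once the new policy's probability for an action moves more than `clip_eps` away from the old one in the direction the advantage favors, the objective stops rewarding further movement.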