Policy Gradient Methods: from REINFORCE to PPO

How do you do gradient ascent on a reward you can only sample, a black box you can't differentiate? A silent, animated explainer on policy-gradient methods in reinforcement learning.

Covered:
• The puzzle: optimizing an objective you can't differentiate
• REINFORCE and the score-function estimator
• The variance problem, and baselines as the cure
• Actor-critic methods
• Trust regions and PPO's clipped objective
• Continuous control

Built with Manim. No narration or music; everything is explained on screen.
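For readers who want something concrete before watching, here is a minimal sketch of the score-function (REINFORCE) estimator named in the topics above. It is not taken from the video: the three-armed bandit, the reward table, and every function name are assumptions made up for illustration. The estimator only samples the reward, never differentiates it, and the optional constant baseline is subtracted from the reward without changing the expected gradient.

```python
import math
import random

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def black_box_reward(action):
    # Hypothetical reward table (an assumption); the estimator only samples it.
    return [1.0, 2.0, 5.0][action]

def score_function_gradient(logits, n_samples, rng, baseline=0.0):
    # Monte Carlo estimate of grad_theta E[reward] for a softmax policy:
    # average of (reward - baseline) * grad_theta log pi(action).
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for _ in range(n_samples):
        action = rng.choices(range(len(probs)), weights=probs)[0]
        advantage = black_box_reward(action) - baseline
        for k in range(len(logits)):
            # d/d theta_k of log softmax(theta)[action]
            score = (1.0 if k == action else 0.0) - probs[k]
            grad[k] += advantage * score / n_samples
    return grad

def exact_gradient(logits):
    # Closed form for this toy problem, used only to check the estimate:
    # dJ/d theta_k = pi_k * (reward_k - expected reward).
    probs = softmax(logits)
    expected = sum(p * black_box_reward(a) for a, p in enumerate(probs))
    return [p * (black_box_reward(k) - expected) for k, p in enumerate(probs)]

if __name__ == "__main__":
    rng = random.Random(0)
    logits = [0.0, 0.0, 0.0]
    print("estimate:", score_function_gradient(logits, 20000, rng))
    print("exact:   ", exact_gradient(logits))
```

Passing a baseline close to the expected reward (here 8/3 under the uniform policy) should leave the estimate centered on the same exact gradient while reducing its spread across runs.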
