Exploration in Reinforcement Learning: Bandits, UCB & Thompson Sampling

When should an agent try something new instead of cashing in on what it already knows? A silent, animated explainer on exploration in reinforcement learning.

Covered:
• The explore/exploit dilemma
• Multi-armed and contextual bandits as the simplest RL setting
• Epsilon-greedy
• Optimism in the face of uncertainty: UCB
• Thompson sampling (posterior sampling)
• Curiosity and intrinsic rewards: count-based, prediction error, RND, empowerment, information gain

Built with Manim. No narration or music; everything is explained on screen.
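
To experiment with the three bandit strategies listed above, here is a minimal Python sketch. It is not taken from the video: the function name `run_bandit`, the Bernoulli (0/1) reward model, the UCB1 bonus `sqrt(2 ln t / n)`, the uniform Beta(1, 1) prior for Thompson sampling, and all default parameters are assumptions chosen for illustration.

```python
import math
import random


def run_bandit(strategy, probs, steps=5000, epsilon=0.1, seed=0):
    """Play a Bernoulli bandit and return how often each arm was pulled.

    Hypothetical helper for illustration. `probs` holds each arm's
    (hidden) probability of paying reward 1.
    """
    rng = random.Random(seed)
    k = len(probs)
    pulls = [0] * k  # times each arm was chosen
    wins = [0] * k   # reward-1 outcomes observed per arm

    def mean(a):
        # Empirical mean reward; untried arms count as 0.0 (assumption).
        return wins[a] / pulls[a] if pulls[a] else 0.0

    for t in range(1, steps + 1):
        if strategy == "epsilon_greedy":
            if rng.random() < epsilon:
                arm = rng.randrange(k)        # explore: uniform random arm
            else:
                arm = max(range(k), key=mean)  # exploit: best empirical mean
        elif strategy == "ucb1":
            if t <= k:
                arm = t - 1                    # pull every arm once first
            else:
                # Optimism: empirical mean plus an uncertainty bonus that
                # shrinks as an arm is pulled more often.
                arm = max(
                    range(k),
                    key=lambda a: mean(a) + math.sqrt(2 * math.log(t) / pulls[a]),
                )
        elif strategy == "thompson":
            # Sample a plausible success rate per arm from its Beta posterior
            # (uniform prior assumed) and act greedily on the samples.
            arm = max(
                range(k),
                key=lambda a: rng.betavariate(1 + wins[a], 1 + pulls[a] - wins[a]),
            )
        else:
            raise ValueError(f"unknown strategy: {strategy}")

        reward = 1 if rng.random() < probs[arm] else 0
        pulls[arm] += 1
        wins[arm] += reward

    return pulls
```

Calling, for example, `run_bandit("ucb1", [0.2, 0.5, 0.8])` returns the pull count per arm. With enough steps, all three strategies should typically concentrate most pulls on the arm with the highest payout probability, though they get there at different speeds.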