AlphaZero Explained: How it Learns [Convolutional Neural Network]

How does AlphaZero use neural networks to play games? How does it learn strategies from self-play? I explain the details using an interactive Connect 4 AI that I built with the Python package Marimo.

Link to the interactive notebook for the CNN explorer: https://molab.marimo.io/notebooks/nb_...

Previous two videos in my AlphaZero Explained series:
Part 1, where the UCB algorithm is explained in more detail: AlphaZero Explained 1: How it Solves "Expl...
Marimo UCB notebook: https://molab.marimo.io/notebooks/nb_...
Part 2, where we build up trees to look several moves ahead [Monte Carlo Tree Search]: AlphaZero Explained 2: How it Looks Into t...
Marimo MCTS notebook: https://molab.marimo.io/notebooks/nb_...

Chapters:
0:00 Introduction to AlphaZero and Connect 4
0:47 How AlphaZero uses neural networks
2:09 The training process: Reinforcement Learning vs. Supervised Learning
4:28 The bootstrapped training loop
5:51 Monte Carlo Tree Search (MCTS) and the expert function
8:33 The PUCT (Predictor/Polynomial Upper Confidence Tree) formula
9:45 Interactive Python/Marimo demo: MCTS vs. Neural Network
12:39 Training results and performance metrics
15:44 Visualizing the Convolutional Neural Network (CNN)
17:13 Feature planes and 3D data representation
20:13 How convolutions work: Parameters and filters
24:04 Interpreting weights and neuron activations
28:18 ReLU Non-linearity explained
30:08 Rollout upgrades for world-class performance