Function Approximation | Reinforcement Learning Part 5

The machine learning consultancy: https://truetheta.io
Join my email list to get educational and useful articles (and nothing else!): https://mailchi.mp/truetheta/true-the...
Want to work together? See here: https://truetheta.io/about/#want-to-w...

Here, we learn about Function Approximation. This is a broad class of methods for learning within state spaces that are far too large for our previous methods to work. This is part five of a six-part series on Reinforcement Learning.

SOCIAL MEDIA
LinkedIn: / dj-rich-90b91753
Twitter: / duanejrich
Github: https://github.com/Duane321

Enjoy learning this way? Want me to make more videos? Consider supporting me on Patreon: / mutualinformation

SOURCES
[1] R. Sutton and A. Barto. Reinforcement Learning: An Introduction (2nd Ed). MIT Press, 2018.
[2] H. Hasselt, et al. RL Lecture Series, DeepMind and UCL, 2021, • DeepMind x UCL | Deep Learning Lecture Ser...

SOURCE NOTES
This video covers topics from chapters 9, 10 and 11 of [1], with only a light covering of chapter 11. [2] includes a lecture on Function Approximation, which was a helpful secondary source.

TIMESTAMP
0:00 Intro
0:25 Large State Spaces and Generalization
1:55 On Policy Evaluation
4:31 How do we select w?
6:46 How do we choose our target U?
9:27 A Linear Value Function
10:34 1000-State Random Walk
12:51 On Policy Control with FA
14:26 The Mountain Car Task
19:30 Off-Policy Methods with FA

LINKS
1000-State Random Walk Problem: https://github.com/Duane321/mutual_in...
Mountain Car Task: https://github.com/Duane321/mutual_in...

NOTES
[1] In the Mountain Car Task, I left out a hyperparameter to tune: Lambda. This controls how far away the evenly spaced proto-points are considered to be from any given evaluation point. If lambda is very high, the prototypical points are considered very close together, and they won't do a good job discriminating different values over the state space. But if lambda is too low, then the prototypical points won't share any information beyond a tiny region surrounding each point.
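
To make the note on lambda concrete, here is a minimal sketch of how such a length scale can change the features. It is not the code from the video: it assumes a Gaussian similarity between an evaluation point and each proto-point, and the names `proto_features`, `protos` and `lam`, the 5x5 grid, and the unit-square state space are all hypothetical choices for illustration.

```python
import numpy as np

def proto_features(x, protos, lam):
    """Similarity of state x to each proto-point (assumed Gaussian kernel).

    lam is the length scale: larger lam makes every proto-point look closer.
    """
    sq_dist = np.sum((protos - x) ** 2, axis=1)
    return np.exp(-sq_dist / (2.0 * lam ** 2))

# Hypothetical evenly spaced proto-points on a 2-D unit square (5x5 grid).
grid = np.linspace(0.0, 1.0, 5)
protos = np.array([[a, b] for a in grid for b in grid])

# An evaluation point that does not sit exactly on a proto-point.
x = np.array([0.4, 0.6])

for lam in (0.01, 0.2, 10.0):
    phi = proto_features(x, protos, lam)
    # Print the smallest and largest feature value for this length scale.
    print(f"lam={lam}: min={phi.min():.3g}, max={phi.max():.3g}")
```

Under these assumptions, a very large lambda yields features that are almost all equal to 1, so different states look nearly identical. A very small lambda yields features that are almost all 0 unless the state sits practically on top of a proto-point, so nothing is shared between neighboring regions.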
