Function Approximation | Reinforcement Learning Part 5
The machine learning consultancy: https://truetheta.io
Join my email list to get educational and useful articles (and nothing else!): https://mailchi.mp/truetheta/true-the...
Want to work together? See here: https://truetheta.io/about/#want-to-w...

Here, we learn about Function Approximation. This is a broad class of methods for learning within state spaces that are far too large for our previous methods to work. This is part five of a six part series on Reinforcement Learning.

SOCIAL MEDIA
LinkedIn : / dj-rich-90b91753
Twitter : / duanejrich
Github: https://github.com/Duane321

Enjoy learning this way? Want me to make more videos? Consider supporting me on Patreon: / mutualinformation

SOURCES
[1] R. Sutton and A. Barto. Reinforcement learning: An Introduction (2nd Ed). MIT Press, 2018.
[2] H. Hasselt, et al. RL Lecture Series, Deepmind and UCL, 2021, • DeepMind x UCL | Deep Learning Lecture Ser...

SOURCE NOTES
This video covers topics from chapters 9, 10 and 11 from [1], with only a light covering of chapter 11. [2] includes a lecture on Function Approximation, which was a helpful secondary source.

TIMESTAMP
0:00 Intro
0:25 Large State Spaces and Generalization
1:55 On Policy Evaluation
4:31 How do we select w?
6:46 How do we choose our target U?
9:27 A Linear Value Function
10:34 1000-State Random Walk
12:51 On Policy Control with FA
14:26 The Mountain Car Task
19:30 Off-Policy Methods with FA

LINKS
1000-State Random Walk Problem: https://github.com/Duane321/mutual_in...
Mountain Car Task: https://github.com/Duane321/mutual_in...

NOTES
[1] In the Mountain Car Task, I left out a hyperparameter to tune: Lambda. This controls how far away the evenly spaced proto-points are from any given evaluation point. If lambda is very high, the prototypical points are considered very close together, and they won't do a good job discriminating different values over the state space. But if lambda is too low, then the prototypical points won't share any information beyond a tiny region surrounding each point.
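The effect of lambda described in the note above can be illustrated with a small sketch. This is not the code from the linked repository: the Gaussian similarity `exp(-distance² / lam)`, the 5×5 grid on a normalized unit square, and the names `prototype_features`, `prototypes`, and `lam` are all assumptions chosen for illustration. The real Mountain Car features may use a different kernel and scaling.

```python
import numpy as np

def prototype_features(state, prototypes, lam):
    """Similarity of one state to each prototype point.

    Assumed form: exp(-squared_distance / lam). A larger lam makes every
    prototype look "closer" to the state; a smaller lam makes the
    similarity fall off faster with distance.
    """
    state = np.asarray(state, dtype=float)
    sq_dist = np.sum((prototypes - state) ** 2, axis=1)
    return np.exp(-sq_dist / lam)

# Evenly spaced prototype points on a hypothetical normalized
# (position, velocity) square [0, 1] x [0, 1].
grid = np.linspace(0.0, 1.0, 5)
prototypes = np.array([[p, v] for p in grid for v in grid])  # shape (25, 2)

state = np.array([0.5, 0.5])
for lam in (100.0, 0.05, 1e-4):
    x = prototype_features(state, prototypes, lam)
    active = int((x > 0.01).sum())  # prototypes with non-negligible similarity
    print(f"lam={lam:g}: min={x.min():.3f} max={x.max():.3f} active={active}")
```

Under these assumptions, a very high `lam` gives nearly identical feature values for every prototype, so a linear value function on top of them cannot tell states apart. A very low `lam` leaves only the nearest prototype active, so nothing learned at one point carries over to its neighbors.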


