Stochastic Approximation and Reinforcement Learning: Hidden Theory and New Super-Fast Algorithms

Stochastic approximation algorithms are used to approximate solutions to fixed-point equations that involve expectations of functions with respect to possibly unknown distributions. Among the many machine learning algorithms built on stochastic approximation, the reinforcement learning algorithms TD-learning and Q-learning are two of its most famous applications. This talk will provide an overview of stochastic approximation, with a focus on optimizing the rate of convergence. Based on this general theory, the well-known slow convergence of Q-learning is explained: the asymptotic variance of the algorithm is typically infinite.

Three new Q-learning algorithms are introduced to dramatically improve performance:
(i) The Zap Q-learning algorithm, which has provably optimal asymptotic variance and resembles the Newton-Raphson method in a deterministic setting.
(ii) The PolSA algorithm, which is based on Polyak's momentum technique, but with a specialized matrix momentum.
(iii) The NeSA algorithm, which is based on Nesterov's acceleration technique.

Analysis of (ii) and (iii) requires entirely new analytic techniques. One approach is via coupling: conditions are established under which the parameter estimates obtained using the PolSA algorithm couple with those obtained using the Newton-Raphson-based algorithm. Numerical examples confirm this behavior, and the remarkable performance of these algorithms. See more at https://www.microsoft.com/en-us/resea...
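To make the contrast between a scalar gain and a Newton-Raphson-style matrix gain concrete, here is a minimal Python sketch on a toy linear root-finding problem. It is not the Zap Q-learning algorithm from the talk: the problem (solve E[A_n] theta = E[b_n] from noisy samples), the matrix `A_mean`, the noise levels, and the function name `run_sa` are all assumptions chosen for illustration. The mean matrix has an eigenvalue of 0.1, below the 1/2 threshold that standard stochastic approximation theory requires for a finite asymptotic variance with step size 1/n, so the scalar-gain recursion is expected to converge very slowly.

```python
import numpy as np


def run_sa(num_steps=5000, seed=0):
    """Compare scalar-gain and matrix-gain stochastic approximation.

    Toy problem (an assumption, not from the talk): find theta solving
    A_mean @ theta = b_mean, given only noisy samples (A_n, b_n).
    Returns the final Euclidean errors (err_scalar, err_newton).
    """
    rng = np.random.default_rng(seed)
    A_mean = np.diag([0.1, 1.0])  # eigenvalue 0.1 < 1/2: slow for gain 1/n
    b_mean = np.array([1.0, 1.0])
    theta_star = np.linalg.solve(A_mean, b_mean)

    theta_scalar = np.zeros(2)  # estimate using the scalar gain 1/n
    theta_newton = np.zeros(2)  # estimate using the matrix gain
    A_hat = np.eye(2)           # running estimate of A_mean

    for n in range(1, num_steps + 1):
        # Noisy observations of the unknown mean quantities.
        A_n = A_mean + 0.05 * rng.standard_normal((2, 2))
        b_n = b_mean + 0.05 * rng.standard_normal(2)
        step = 1.0 / n

        # Scalar-gain recursion (Robbins-Monro style).
        theta_scalar = theta_scalar + step * (b_n - A_n @ theta_scalar)

        # Matrix-gain recursion: average the observed matrices, then
        # premultiply the update by the inverse of that average.
        A_hat = A_hat + (A_n - A_hat) / (n + 1)
        direction = np.linalg.solve(A_hat, b_n - A_n @ theta_newton)
        theta_newton = theta_newton + step * direction

    err_scalar = np.linalg.norm(theta_scalar - theta_star)
    err_newton = np.linalg.norm(theta_newton - theta_star)
    return err_scalar, err_newton


if __name__ == "__main__":
    err_scalar, err_newton = run_sa()
    print(f"scalar gain error: {err_scalar:.4f}")
    print(f"matrix gain error: {err_newton:.4f}")
```

In this sketch the matrix gain plays the role that the Newton-Raphson step plays in a deterministic setting: premultiplying by the inverse of the estimated mean matrix rescales the slow direction so that every component contracts at roughly the same rate. Starting `A_hat` at the identity and averaging with weight 1/(n+1) is a design choice that keeps the estimate well conditioned in the first few iterations.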
