Pascal Bianchi: A dynamical system viewpoint on stochastic approximation methods

The celebrated Stochastic Gradient Descent and its recent variants, such as ADAM, are particular cases of stochastic approximation methods (see Robbins & Monro, 1951). One way of establishing the convergence of these methods is to prove that the iterates shadow the behavior of an Ordinary Differential Equation (ODE). This is the so-called ODE method, whose principle will be introduced in this talk. As an application, we will establish the (weak) convergence of the ADAM iterates to the solution of a non-autonomous ODE, which will be characterized. The case of non-differentiable functions will be addressed if time permits. This talk was part of the Workshop on Fundamentals of Machine Learning Over Networks (MLoNs) and the KTH course EP3260 Fundamentals of MLoNs. Course website: https://sites.google.com/view/mlons/c... Workshop website: https://sites.google.com/view/mlon201...
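
To make the idea of iterates "shadowing" an ODE concrete, here is a minimal Python sketch. It is not taken from the talk. It assumes a toy objective f(x) = x²/2, Gaussian gradient noise, and a step-size schedule γ_n = 0.5/(n+1)^0.75, all chosen only for illustration; the function name `simulate` and its parameters are hypothetical. The noisy gradient iterates are compared with the exact solution x(t) = x₀·exp(−t) of the ODE x' = −x, evaluated at the accumulated time t_n = γ_0 + ... + γ_n.

```python
import math
import random


def simulate(n_steps=5000, x0=2.0, noise_std=1.0, seed=0):
    """Run noisy gradient steps on f(x) = x**2 / 2 and track the ODE x' = -x.

    Returns a list of (t_n, x_n, ode_x_n) tuples, where t_n is the sum of the
    step sizes used so far and ode_x_n is the exact ODE solution at time t_n.
    """
    rng = random.Random(seed)
    x, t = x0, 0.0
    trajectory = []
    for n in range(n_steps):
        # Assumed schedule: steps sum to infinity, squared steps are summable.
        gamma = 0.5 / (n + 1) ** 0.75
        # Gradient of f at x is x; add zero-mean noise to mimic a stochastic gradient.
        noisy_grad = x + rng.gauss(0.0, noise_std)
        x -= gamma * noisy_grad
        # "ODE time" advances by the step size.
        t += gamma
        # Exact solution of x' = -x started from x0.
        ode_x = x0 * math.exp(-t)
        trajectory.append((t, x, ode_x))
    return trajectory


if __name__ == "__main__":
    # Print a few samples along the trajectory to compare iterate and ODE solution.
    for t, x, ode_x in simulate()[::1000]:
        print(f"t={t:6.2f}  iterate={x:+.4f}  ode={ode_x:+.4f}")
```

In this sketch, as the step sizes shrink the noise averages out and the iterates stay close to the ODE trajectory. The general ODE method makes this kind of statement rigorous under suitable assumptions on the step sizes and the noise.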