Top Optimizers for Neural Networks

In this video, I cover 16 of the most popular optimizers used for training neural networks, from basic Gradient Descent (GD) to more recent ones such as Adam, AdamW, and Lookahead.

#deeplearning #artificialintelligence #neuralnetworks #computerscience

References
Nesterov: https://proceedings.mlr.press/v28/sut...
AdaGrad: https://www.jmlr.org/papers/volume12/...
AdaDelta: https://arxiv.org/abs/1212.5701
Adam & AdaMax: https://arxiv.org/abs/1412.6980
AMSGrad: https://arxiv.org/abs/1904.03590
AdaBound: https://arxiv.org/abs/1902.09843
AdamW: https://arxiv.org/pdf/1711.05101v2.pdf
Yogi: https://proceedings.neurips.cc/paper_...
Nadam: https://openreview.net/pdf/OM0jvwB8jI...
Lookahead: https://arxiv.org/abs/1907.08610