Top Optimizers for Neural Networks

In this video, I cover 16 of the most popular optimizers used for training neural networks, from basic Gradient Descent (GD) to more recent ones such as Adam, AdamW, and Lookahead.

#deeplearning #artificialintelligence #neuralnetworks #computerscience

References
Nesterov: https://proceedings.mlr.press/v28/sut...
AdaGrad: https://www.jmlr.org/papers/volume12/...
AdaDelta: https://arxiv.org/abs/1212.5701
Adam & AdaMax: https://arxiv.org/abs/1412.6980
AMSGrad: https://arxiv.org/abs/1904.03590
AdaBound: https://arxiv.org/abs/1902.09843
AdamW: https://arxiv.org/pdf/1711.05101v2.pdf
Yogi: https://proceedings.neurips.cc/paper_...
Nadam: https://openreview.net/pdf/OM0jvwB8jI...
Lookahead: https://arxiv.org/abs/1907.08610
