Top Optimizers for Neural Networks
In this video, I cover 16 of the most popular optimizers used for training neural networks, from basic Gradient Descent (GD) to more recent ones such as Adam, AdamW, and Lookahead.

References
Nesterov: https://proceedings.mlr.press/v28/sut...
AdaGrad: https://www.jmlr.org/papers/volume12/...
AdaDelta: https://arxiv.org/abs/1212.5701
Adam & AdaMax: https://arxiv.org/abs/1412.6980
AMSGrad: https://arxiv.org/abs/1904.03590
AdaBound: https://arxiv.org/abs/1902.09843
AdamW: https://arxiv.org/pdf/1711.05101v2.pdf
Yogi: https://proceedings.neurips.cc/paper_...
Nadam: https://openreview.net/pdf/OM0jvwB8jI...
Lookahead: https://arxiv.org/abs/1907.08610
