L35: Nesterov accelerated gradient descent

Welcome to Lecture 35 of the course "Deep Learning" by Prof. Mitesh M. Khapra. Full course: https://study.iitm.ac.in/ds/course_pa...

Video Overview

Nesterov Accelerated Gradient Descent (NAG) is an optimization technique that extends the standard momentum method with a look-ahead step. Introduced by Yurii Nesterov in 1983, NAG anticipates the direction of the upcoming gradient, so each update is better informed and convergence is faster. This reduces oscillation and overshooting compared with plain momentum, leading to more stable and efficient training of machine learning models.

In NAG, the update rule computes the gradient at the estimated future position, that is, the current position shifted by the momentum term, rather than at the current position itself. This foresight lets the algorithm correct its trajectory before it overshoots, especially in regions with steep gradients. As a result, NAG often outperforms standard gradient descent and momentum-based gradient descent, particularly on convex optimization problems.

About IIT Madras' online Bachelor of Science programme

IIT Madras offers four-year BS programmes that aim to provide quality education to all, irrespective of age, educational background, or location. The BS programme has multiple levels, and students can exit at any of them. Depending on the courses completed and credits earned, the learner can receive a Foundation Certificate from IITM CODE (Centre for Outreach and Digital Education), Diploma(s) from IIT Madras, or BSc/BS Degrees from IIT Madras. For more details, visit: https://www.iitm.ac.in/academics/stud...
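The look-ahead update described in the overview can be sketched in a few lines of Python. This is a minimal illustration and not code from the lecture: the quadratic loss, the function names `grad`, `momentum_gd`, and `nag`, and the values of `lr` and `gamma` are all assumptions chosen for the example. The only difference between the two optimizers is where the gradient is evaluated.

```python
import numpy as np

def grad(w):
    # Gradient of a hypothetical loss f(w) = 0.5 * (w[0]**2 + 10 * w[1]**2),
    # chosen only because it is convex and badly scaled.
    return np.array([1.0, 10.0]) * w

def momentum_gd(w0, lr=0.05, gamma=0.9, steps=200):
    """Plain momentum: the gradient is taken at the current position w."""
    w = np.asarray(w0, dtype=float)
    u = np.zeros_like(w)  # running update (the momentum term)
    for _ in range(steps):
        u = gamma * u + lr * grad(w)
        w = w - u
    return w

def nag(w0, lr=0.05, gamma=0.9, steps=200):
    """Nesterov accelerated gradient: the gradient is taken at the look-ahead point."""
    w = np.asarray(w0, dtype=float)
    u = np.zeros_like(w)
    for _ in range(steps):
        w_look_ahead = w - gamma * u  # where the momentum term alone would move w
        u = gamma * u + lr * grad(w_look_ahead)
        w = w - u
    return w

if __name__ == "__main__":
    start = [1.0, 1.0]
    print("momentum:", momentum_gd(start))
    print("NAG:     ", nag(start))
```

Both functions move toward the minimizer at the origin on this toy problem. How the two compare on other losses depends on the learning rate, the momentum coefficient, and the shape of the loss.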
