Adagrad Algorithm Explained and Implemented from Scratch in Python

👨‍💻 To get started with AI engineering, check out this Scrimba course: https://scrimba.com/the-ai-engineer-p...

Adagrad is a widely used extension of stochastic gradient descent that works well for sparse parameter spaces, such as those that come up with text or images. In this video I'll explain it and show you how to implement it!

Credit to Max Olson for the picture in the thumbnail; sorry to have cut the watermark out of the picture. The faint background music is from YouTube Music!

GitHub: https://github.com/yacineMahdid/artif...

The implementation is very straightforward once the cumulative sum of squared gradients is understood, since everything else is plain stochastic gradient descent. You can check out the Jupyter notebook with the code over here: https://github.com/yacineMahdid/artif...

Also, if you want a text tutorial, you can check out this one, it's very good: https://ruder.io/optimizing-gradient-...

Here is a definition of AdaGrad from Wikipedia:
"AdaGrad (for adaptive gradient algorithm) is a modified stochastic gradient descent algorithm with per-parameter learning rate, first published in 2011. Informally, this increases the learning rate for sparser parameters and decreases the learning rate for ones that are less sparse. This strategy often improves convergence performance over standard stochastic gradient descent in settings where data is sparse and sparse parameters are more informative. Examples of such applications include natural language processing and image recognition. It still has a base learning rate η, but this is multiplied with the elements of a vector {G_{j,j}} which is the diagonal of the outer product matrix."

----
Join the Discord for general discussion: / discord

----
Follow Me Online Here:
Twitter: / codethiscodeth1
GitHub: https://github.com/yacineMahdid
LinkedIn: / yacine-mahdid-809425163
Instagram: / yacine_mahdid
___
Have a great week! 👋
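
For reference, here is a minimal sketch of the cumulative-sum update the description talks about, written in plain Python. It is not the code from the linked notebook: the function names (adagrad, quadratic_grad), the toy objective f(x, y) = x^2 + 10 * y^2, and the values of lr, eps and steps are all assumptions picked for illustration. Each parameter keeps its own running sum of squared gradients, and the base learning rate is divided by the square root of that sum.

```python
import math


def adagrad(grad_fn, params, lr=0.5, eps=1e-8, steps=500):
    """Run Adagrad for a fixed number of steps and return the final parameters.

    grad_fn takes a list of parameters and returns a list of gradients.
    lr, eps and steps are illustrative defaults, not tuned values.
    """
    params = list(params)
    # One running sum of squared gradients per parameter.
    grad_sq_sum = [0.0] * len(params)
    for _ in range(steps):
        grads = grad_fn(params)
        for j, g in enumerate(grads):
            grad_sq_sum[j] += g * g
            # Per-parameter step: base rate divided by the root of the running sum.
            # eps only guards against division by zero.
            params[j] -= lr / (math.sqrt(grad_sq_sum[j]) + eps) * g
    return params


# Hypothetical test problem: f(x, y) = x^2 + 10 * y^2, with its minimum at (0, 0).
def quadratic_grad(p):
    x, y = p
    return [2.0 * x, 20.0 * y]


result = adagrad(quadratic_grad, [3.0, -2.0])
print(result)
```

Because the running sum only ever grows, the effective step size for each parameter can only shrink over time, which is the main trade-off to keep in mind with this method.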