Gradient Descent vs Newton’s Method: Which ML Optimization To Use and Why

How do you train a model with millions of parameters? We decode the fundamental trade-off between Gradient Descent and Newton’s Method. Understand why deep learning relies on inexpensive first-order gradients, and why the quadratic convergence of second-order Newton methods becomes computationally impractical in high dimensions, where storing a dense n × n Hessian takes on the order of n² memory and solving with it takes on the order of n³ time.

References:
1. Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
2. Numerical Optimization by Jorge Nocedal and Stephen J. Wright
3. Scientific Computing with MATLAB and Octave by Alfio Quarteroni, Fausto Saleri, and Paola Gervasio