BFGS & L-BFGS: The Algorithms Behind Modern Machine Learning | Machine Learning | Optimisation
In this video, we dive deep into BFGS and L-BFGS - the quasi-Newton optimization algorithms that power most modern machine learning libraries. If you've ever wondered why PyTorch and SciPy use L-BFGS instead of pure gradient descent or Newton's method, this is your answer. WHAT YOU'LL LEARN: Why gradient descent fails for ill-conditioned problems How second-order methods use curvature information The computational bottleneck of Newton's method (O(n³) complexity!) How BFGS approximates the inverse Hessian without computing it Why L-BFGS only needs O(mn) memory instead of O(n²) The elegant two-loop recursion algorithm When to use GD vs BFGS vs L-BFGS in practice TECHNICAL DETAILS: BFGS (Broyden-Fletcher-Goldfarb-Shanno) builds an approximation of the inverse Hessian matrix iteratively using only gradient information. This gives you superlinear convergence without the O(n³) cost of computing and inverting the full Hessian. L-BFGS extends this by storing only the last m vector pairs (typically m=3-20), reducing memory from O(n²) to O(mn). For a neural network with 1 million parameters, this is the difference between 8 terabytes and 80 megabytes. The two-loop recursion is an elegant algorithm that computes the search direction in O(mn) time using only stored history - no matrix storage required. KEY CONCEPTS COVERED: Ill-conditioned optimization problems Hessian matrix and curvature Quasi-Newton methods Secant equation and rank-two updates Limited-memory approximations Computational complexity analysis WHERE YOU'LL SEE THESE ALGORITHMS: scipy.optimize.minimize(method='L-BFGS-B') PyTorch: torch.optim.LBFGS TensorFlow's L-BFGS optimizer scikit-learn's LogisticRegression (solver='lbfgs') Maximum likelihood estimation in statistics Parameter fitting in scientific computing RELATED TOPICS: If you enjoyed this video, check out my series on optimization algorithms: Part 1: Gradient Descent Deep Dive Part 2: Newton's Method and Second-Order Optimization Part 3: BFGS and L-BFGS (this video) Part 4: Conjugate Gradient Methods (coming soon) WHO IS THIS FOR? This video is designed for: Machine learning engineers wondering what's under the hood PhD students in computational science Anyone who's seen "method='LBFGS'" and thought "what is that?" Optimization enthusiasts who want to understand modern algorithms You should have basic calculus knowledge (gradients, derivatives) and some familiarity with optimization. If you know what gradient descent is, you're ready for this video. PERFORMANCE COMPARISON: For a typical ML problem with n=100,000 parameters: Gradient Descent: ~50,000 iterations, O(n) memory BFGS: ~500 iterations, O(n²) memory = 80GB (impractical!) L-BFGS: ~500 iterations, O(mn) memory = 40MB (perfect!) WHY THIS MATTERS: L-BFGS is the secret weapon of computational science. When you fit a logistic regression model in scikit-learn, train certain neural networks, or run maximum likelihood estimation, there's a good chance L-BFGS is doing the heavy lifting. Understanding how it works makes you a better ML engineer and computational scientist. CONNECT WITH ME: 🔗 GitHub: https://github.com/psychedelic2007 🔗 LinkedIn: / satyam-sangeet-a8a604119 🔗 Twitter: https://x.com/satyam_san20 💬 Personal Page: https://psychedelic2007.github.io/ TOOLS: Animations created with Manim Community Edition Simulations in Python (NumPy, Matplotlib, SciPy) If you found this helpful, please like, subscribe, and share with anyone learning optimization or ML! Questions? Drop them in the comments - I read and respond to every one. #MachineLearning #Optimization #BFGS #LBFGS #DeepLearning #ComputationalScience #Algorithm #Python #DataScience #AI #NumericalMethods #QuasiNewton #GradientDescent #PyTorch #TensorFlow #ScikitLearn

Visually Explained: Newton's Method in Optimization

Optimization for Deep Learning (Momentum, RMSprop, AdaGrad, Adam)

Numerical Methods | Gauss Seidel Iteration | VTU Model Question Paper

AlphaFold - The Most Useful Thing AI Has Ever Done

Understanding scipy.minimize part 1: The BFGS algorithm

Lecture 8: Quasi Newton method, BFGS and DFP method

The Strange Math That Predicts (Almost) Anything

Ich habe 100 Tage ARK Valguero gespielt und das ist passiert...

God Says:"I JUST CONFIRMED — ONLY YOU CAN SEE THIS LETTER"/God Message Now/God Message

Birds Singing in a Tranquil Forest 🌳 Nature Sounds for Deep Sleep and Calm Mind

Machine Learning Optimization Algorithms

ASMR Addictive Fast Tapping Collection For Deep Sleep & Anxiety Relief (No Talking) — 2.5 Hours

BFGS, LBFGS & Other Advanced Optimization

After My Wife Passed Away, My Daughter-in-Law Smiled At The Inheritance Meeting!! | Calm Dad Stories

What's NEW at✨SAM'S CLUB✨ + June 2026 INSTANT SAVING!!

8.1 Quasi Newton Methods Part I

How To Think SO CLEARLY People Assume You're A Genius

I Think They Are Lying To You

Tensors are TOO intuitive

