Lecture 7: Explaining Neural Scaling Laws
Presented by: Jaehoon Lee (Google Brain)

Abstract: For a large variety of models and datasets, neural network performance has been empirically observed to scale as a power-law with model size and dataset size. We would like to understand why these power laws emerge, and what features of the data and models determine the values of the power-law exponents. Since these exponents determine how quickly performance improves with more data and larger models, they are of great importance when considering whether to scale up existing models.

In this talk, we’ll survey some of the well-known power-law scaling behavior observed in deep neural networks. Drawing intuition from statistical physics, we observe that a simplifying limit arises as one scales up deep learning models. We’ll talk about a theoretical framework that explains and connects various scaling laws. We identify variance-limited and resolution-limited scaling behavior for both dataset and model size, for a total of four scaling regimes.
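As a rough illustration of what a power-law exponent means in practice, the sketch below fits loss ≈ a · D^(−alpha) to a handful of (dataset size, loss) points by linear regression in log-log space. The function name `fit_power_law`, the synthetic data, and the values a = 5.0 and alpha = 0.3 are assumptions made up for this example; they are not taken from the talk, and the talk's framework is not reproduced here.

```python
import numpy as np

def fit_power_law(sizes, losses):
    """Fit losses ≈ a * sizes**(-alpha) by least squares in log-log space.

    Returns (alpha, a). Hypothetical helper for illustration only.
    """
    log_x = np.log(np.asarray(sizes, dtype=float))
    log_y = np.log(np.asarray(losses, dtype=float))
    # A power law is a straight line in log-log coordinates:
    # log(loss) = log(a) - alpha * log(size)
    slope, intercept = np.polyfit(log_x, log_y, 1)
    return -slope, np.exp(intercept)

# Synthetic, noise-free data generated from an assumed power law.
sizes = np.array([1e3, 1e4, 1e5, 1e6])
losses = 5.0 * sizes ** -0.3

alpha, a = fit_power_law(sizes, losses)
print(f"estimated exponent alpha = {alpha:.3f}, prefactor a = {a:.3f}")
```

A larger fitted alpha means the loss falls faster as the dataset (or model) grows, which is why the exponent matters when deciding whether scaling up is worthwhile. Real measurements are noisy and often flatten at an irreducible loss, so a plain log-log fit like this one is only a first approximation.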

