Lecture 7: Explaining Neural Scaling Laws

Presented by: Jaehoon Lee (Google Brain)

Abstract: For a large variety of models and datasets, neural network performance has been empirically observed to scale as a power law in model size and dataset size. We would like to understand why these power laws emerge, and what features of the data and models determine the values of the power-law exponents. Since these exponents determine how quickly performance improves with more data and larger models, they are of great importance when considering whether to scale up existing models. In this talk, we’ll survey some of the well-known power-law scaling behaviors observed in deep neural networks. Drawing intuition from statistical physics, we observe that a simplifying limit arises as one scales up deep learning models. We’ll then present a theoretical framework that explains and connects various scaling laws. We identify variance-limited and resolution-limited scaling behavior for both dataset size and model size, for a total of four scaling regimes.
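
To make "power-law exponent" concrete, here is a minimal sketch of how such an exponent might be estimated from measurements. It assumes the loss follows loss ≈ c * N^(-alpha) with no irreducible offset, where N stands for either dataset size or model size. The function name fit_power_law, the symbols c and alpha, and the data are illustrative assumptions, not taken from the talk; the data is synthetic and noise-free.

```python
import numpy as np

def fit_power_law(sizes, losses):
    """Estimate (alpha, c) in loss ~ c * size**(-alpha).

    A power law is a straight line in log-log space:
        log(loss) = log(c) - alpha * log(size)
    so an ordinary least-squares line fit recovers both parameters.
    """
    log_size = np.log(np.asarray(sizes, dtype=float))
    log_loss = np.log(np.asarray(losses, dtype=float))
    slope, intercept = np.polyfit(log_size, log_loss, 1)
    return -slope, np.exp(intercept)

# Synthetic data generated from an assumed law (not real measurements).
sizes = np.array([1e3, 1e4, 1e5, 1e6])
losses = 5.0 * sizes ** -0.5

alpha, c = fit_power_law(sizes, losses)
print(f"alpha = {alpha:.3f}, c = {c:.3f}")
```

With real measurements the fit is only meaningful over the range where the curve is actually straight on a log-log plot; a loss that plateaus at a nonzero value would need an extra offset term, which this sketch deliberately omits.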