1W-MINDS: Yi Ma, April 1, 2021, Deep Networks from First Principles

In this talk, we offer an entirely “white box” interpretation of deep (convolutional) networks from the perspective of data compression (and group invariance). In particular, we show how modern deep layered architectures, linear (convolution) operators, nonlinear activations, and even all parameters can be derived from the principle of maximizing rate reduction (with group invariance). All layers, operators, and parameters of the network are explicitly constructed via forward propagation instead of being learned via back propagation. All components of the so-obtained network, called ReduNet, have precise optimization, geometric, and statistical interpretations. This principled approach also yields several nice surprises. It reveals a fundamental tradeoff between invariance and sparsity for class separability. It reveals a fundamental connection between deep networks and the Fourier transform for group invariance, namely the computational advantage of working in the spectral domain (why spiking neurons?). It also clarifies the mathematical roles of forward propagation (optimization) and backward propagation (variation). Moreover, the so-obtained ReduNet is amenable to fine-tuning via both forward and backward (stochastic) propagation, and both optimize the same objective. This is joint work with students Yaodong Yu, Ryan Chan, and Haozhi Qi of Berkeley, Dr. Chong You, now at Google Research, and Professor John Wright of Columbia University.
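The abstract treats "maximizing rate reduction" as the organizing principle, but it does not give a formula. The sketch below illustrates only the objective and does not construct ReduNet. It assumes the coding-rate definitions from the closely related maximal coding rate reduction (MCR²) work by the same group. For m features of dimension d, grouped by class labels, with distortion eps, the coding rate is R(Z, eps) = 1/2 log det(I + d/(m eps²) Z Zᵀ). The rate reduction is ΔR = R(Z, eps) − Σ_j (m_j/m) R(Z_j, eps), where Z_j holds the m_j features of class j. The function names are hypothetical, and the code uses only the standard library (a small Cholesky log-determinant) so that it is self-contained.

```python
import math
from typing import Dict, Hashable, List, Sequence

Matrix = List[List[float]]


def logdet_spd(a: Matrix) -> float:
    """Log-determinant of a symmetric positive-definite matrix via Cholesky (a = L L^T)."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                l[i][i] = math.sqrt(s)
            else:
                l[i][j] = s / l[j][j]
    # det(a) = prod(diag(L))^2, so log det(a) = 2 * sum(log diag(L)).
    return 2.0 * sum(math.log(l[i][i]) for i in range(n))


def coding_rate(z: Sequence[Sequence[float]], eps: float) -> float:
    """R(Z, eps) = 1/2 logdet(I + d/(m eps^2) Z Z^T); here each row of z is one feature vector."""
    m, d = len(z), len(z[0])
    alpha = d / (m * eps ** 2)
    # Z Z^T (d x d) is the sum of outer products of the m feature vectors.
    a = [[(1.0 if i == j else 0.0) + alpha * sum(row[i] * row[j] for row in z)
          for j in range(d)] for i in range(d)]
    return 0.5 * logdet_spd(a)


def rate_reduction(z: Sequence[Sequence[float]], labels: Sequence[Hashable], eps: float) -> float:
    """Delta R = R(Z) - sum_j (m_j / m) R(Z_j): whole-set rate minus class-weighted per-class rates."""
    m = len(z)
    groups: Dict[Hashable, List[Sequence[float]]] = {}
    for row, y in zip(z, labels):
        groups.setdefault(y, []).append(row)
    rc = sum(len(zj) / m * coding_rate(zj, eps) for zj in groups.values())
    return coding_rate(z, eps) - rc


if __name__ == "__main__":
    # Both classes collapsed onto the same direction: no rate reduction.
    print(rate_reduction([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0]], [0, 0, 1, 1], eps=1.0))  # ~0.0
    # Classes on orthogonal directions: Delta R = 1/2 log(4/3), about 0.1438.
    print(rate_reduction([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]], [0, 0, 1, 1], eps=1.0))
```

Because log det is concave, Jensen's inequality with weights m_j/m shows that ΔR is never negative. In the first call ΔR is zero because every class lies in the same one-dimensional subspace. In the second call ΔR is positive because the two classes span orthogonal directions. This is the "diverse and discriminative" behaviour the objective is designed to reward.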