Neural Networks Are Elastic Origami! [Prof. Randall Balestriero]

Professor Randall Balestriero joins us to discuss neural network geometry, spline theory, and emerging phenomena in deep learning, based on research presented at ICML. Topics include the delayed emergence of adversarial robustness in neural networks ("grokking"), geometric interpretations of neural networks via spline theory, and challenges in reconstruction learning. We also cover geometric analysis of Large Language Models (LLMs) for toxicity detection and the relationship between intrinsic dimensionality and model control in RLHF.

SPONSOR MESSAGES:
***
CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.
https://centml.ai/pricing/

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events?
Go to https://tufalabs.ai/
***

Show notes and transcript:
https://www.dropbox.com/scl/fi/3lufge...

TOC:
[00:00:00] Introduction

1. Neural Network Geometry and Spline Theory
[00:01:41] 1.1 Neural Network Geometry and Spline Theory
[00:07:41] 1.2 Deep Networks Always Grok
[00:11:39] 1.3 Grokking and Adversarial Robustness
[00:16:09] 1.4 Double Descent and Catastrophic Forgetting

2. Reconstruction Learning
[00:18:49] 2.1 Reconstruction Learning
[00:24:15] 2.2 Frequency Bias in Neural Networks

3. Geometric Analysis of Neural Networks
[00:29:02] 3.1 Geometric Analysis of Neural Networks
[00:34:41] 3.2 Adversarial Examples and Region Concentration

4. LLM Safety and Geometric Analysis
[00:40:05] 4.1 LLM Safety and Geometric Analysis
[00:46:11] 4.2 Toxicity Detection in LLMs
[00:52:24] 4.3 Intrinsic Dimensionality and Model Control
[00:58:07] 4.4 RLHF and High-Dimensional Spaces

5. Conclusion
[01:02:13] 5.1 Neural Tangent Kernel
[01:08:07] 5.2 Conclusion

REFS:
[00:01:35] Balestriero/Humayun – Deep network geometry & input space partitioning
https://arxiv.org/html/2408.04809v1

[00:03:55] Balestriero & Paris – Linking deep networks to adaptive spline operators
https://proceedings.mlr.press/v80/bal...

[00:13:55] Song et al. – Gradient-based white-box adversarial attacks
https://arxiv.org/abs/2012.14965

[00:16:05] Humayun, Balestriero & Baraniuk – Grokking phenomenon & emergent robustness
https://arxiv.org/abs/2402.15555

[00:18:25] Humayun – Training dynamics & double descent via linear region evolution
https://arxiv.org/abs/2310.12977

[00:20:15] Balestriero – Power diagram partitions in DNN decision boundaries
https://arxiv.org/abs/1905.08443

[00:23:00] Frankle & Carbin – Lottery Ticket Hypothesis for network pruning
https://arxiv.org/abs/1803.03635

[00:24:00] Belkin et al. – Double descent phenomenon in modern ML
https://arxiv.org/abs/1812.11118

[00:25:55] Balestriero et al. – Batch normalization’s regularization effects
https://arxiv.org/pdf/2209.14778

[00:29:35] EU – EU AI Act 2024 with compute restrictions
https://www.lw.com/admin/upload/SiteA...

[00:39:30] Humayun, Balestriero & Baraniuk – SplineCam: Visualizing deep network geometry
https://openaccess.thecvf.com/content...

[00:40:40] Carlini – Trade-offs between adversarial robustness and accuracy
https://arxiv.org/abs/1902.06705

[00:44:55] Balestriero & LeCun – Limitations of reconstruction-based learning methods
https://raw.githubusercontent.com/mlr...

[00:47:20] Balestriero & LeCun – Spectral analysis of neural network learning
https://proceedings.neurips.cc/paper_...

[00:49:45] He et al. – MAE: Masked Autoencoders for self-supervised learning
https://arxiv.org/abs/2111.06377

[00:54:50] Balestriero et al. – Geometric analysis of LLM layers for toxicity detection
https://arxiv.org/abs/2309.12312

[00:59:35] Balestriero et al. – Superior toxicity detection via geometric features
https://arxiv.org/html/2312.01648v2

[01:04:45] UofT ML – Self-attention control & context length effects
https://arxiv.org/abs/2310.04444

[01:11:55] Roberts – Foundations of deep learning theory
https://arxiv.org/abs/2106.10165

[01:15:40] Balestriero & Cha – Kolmogorov GAM Networks via spline partition theory
https://arxiv.org/pdf/2501.00704

[01:16:40] Various – Graph Kolmogorov-Arnold Networks (GKAN) extension
https://www.nature.com/articles/s4159...
