When BERT Plays the Lottery, All Tickets Are Winning (Paper Explained)
BERT is a giant model. Turns out you can prune away many of its components and it still works. This paper analyzes BERT pruning in light of the Lottery Ticket Hypothesis and finds that even the "bad" lottery tickets can be fine-tuned to good accuracy.

OUTLINE:
0:00 - Overview
1:20 - BERT
3:20 - Lottery Ticket Hypothesis
13:00 - Paper Abstract
18:00 - Pruning BERT
23:00 - Experiments
50:00 - Conclusion

https://arxiv.org/abs/2005.00561

ML Street Talk Channel: / @machinelearningstreettalk

Abstract:
Much of the recent success in NLP is due to the large Transformer-based models such as BERT (Devlin et al, 2019). However, these models have been shown to be reducible to a smaller number of self-attention heads and layers. We consider this phenomenon from the perspective of the lottery ticket hypothesis. For fine-tuned BERT, we show that (a) it is possible to find a subnetwork of elements that achieves performance comparable with that of the full model, and (b) similarly-sized subnetworks sampled from the rest of the model perform worse. However, the "bad" subnetworks can be fine-tuned separately to achieve only slightly worse performance than the "good" ones, indicating that most weights in the pre-trained BERT are potentially useful. We also show that the "good" subnetworks vary considerably across GLUE tasks, opening up the possibilities to learn what knowledge BERT actually uses at inference time.

Authors: Sai Prasanna, Anna Rogers, Anna Rumshisky

Links:
YouTube: / yannickilcher
Twitter: / ykilcher
BitChute: https://www.bitchute.com/channel/yann...
Minds: https://www.minds.com/ykilcher


