Upcycling LLMs into MoE with Nvidia Researcher, Ethan He

Nvidia researcher Ethan He joins the Oxen Herd to give a deep dive into his co-authored paper, Upcycling Large Language Models into Mixture of Experts.

--

Use Oxen AI 🐂 https://oxen.ai/

Oxen AI makes versioning your datasets as easy as versioning your code! Even with millions of unstructured images, the tool quickly handles any type of data so you can build cutting-edge AI.

--

Paper 📜 https://arxiv.org/abs/2410.07524
Links + Notes 📝 https://www.oxen.ai/blog/how-upcyclin...
Join Arxiv Dives 🤿 https://oxen.ai/community
Discord 🗿 / discord

--

Chapters
0:00 Who is Ethan He
2:16 Ethan He Presents Upcycling LLMs
2:30 What is MoE
5:27 How Does the MoE Layer Work
8:52 How the Router Works
13:32 The auxiliary loss: Switch Transformers
14:55 Mixtral vs. Switch Transformer
18:50 The Takeaway
23:38 Plain Upcycling
28:44 Weight Scaling
32:55 Fine-Grained MoE
37:28 Fine-Grained MoE Upcycling
43:01 Experiments
43:50 The Importance of Learning Rate
45:48 Analysis of the Wave Similarity
49:43 Number of Experts
50:14 Large Scale Upcycling
55:10 Questions
