SLMs - When and When NOT to use them (+ Mistral 3.1 & Gemma-3 Bakeoff)

Datasets & Slides 📝
https://www.oxen.ai/ox/Arxiv-Dive-Smo...
https://www.oxen.ai/ox/SmolLMs
https://www.oxen.ai/ox/mbrp
https://www.oxen.ai/ox/SimpleQA

Join Arxiv Dives 🤿
https://oxen.ai/community

Discord 🗿
/ discord

Use Oxen AI 🐂
https://oxen.ai/

Oxen AI makes versioning your datasets as easy as versioning your code! Even if it's millions of unstructured images, the tool quickly handles any type of data so you can build cutting-edge AI.

--

Chapters
0:00 Welcome to Arxiv Dive
1:12 $ whois
1:59 $ whoami
3:18 What is Oxen.ai
4:24 Let's Talk Smol LMs
4:35 Benefits of Smol LMs
6:33 When Not to Use Smol LMs
7:47 What is a Data Flywheel
9:01 Why Smol LMs Are Important Now
13:42 Did I Use a Framework for SFT or RL
14:09 Only Your Data and Criteria Matter
16:18 Gemma-3 vs. Mistral-3.1 Evals
16:41 How to Evaluate a Model
26:49 o3-mini, Mistral Small-3.1, and Gemma-3 on SimpleQA
28:17 Training a Model to Program in Rust
34:45 o3-mini, Mistral Small-3.1, and Gemma-3’s Eval on Rust
38:17 Questions
43:36 What About Smol Multimodal Models?
48:56 Test a Homemade Phi-4 Multimodal Chatbot
58:45 QR Code for Free Compute Credits
