Trinity: Training a 400B MoE from Scratch Without Losing Your Mind

[2026 - Day 3 - Model Systems]

Training sparse Mixture-of-Experts (MoE) models at scale is notoriously unstable. Experts collapse, routers drift, and loss spikes appear out of nowhere. This talk covers how we built Trinity Large, a 400B-parameter MoE (13B active), trained on 17 trillion tokens with zero loss spikes.

We'll walk through the decisions that actually mattered: why we replaced standard aux-loss-free balancing with a momentum-based approach (SMEBU), how interleaved local/global attention made context extension surprisingly smooth, and what broke when we first tried running Muon at scale. I'll also cover the less glamorous stuff: our Random Sequential Document Buffer to reduce batch heterogeneity, recovering from B300 GPU faults on brand-new hardware, and the six changes we shipped at once when routing started collapsing mid-run.

Practical lessons for teams training their own MoEs or scaling up sparse architectures.

SPEAKER: Lucas Atkins - CTO, Arcee AI

ABOUT AI COUNCIL: AI Council brings together the brightest minds in data to share industry knowledge, technical architectures, and best practices in building cutting-edge data & AI systems and tools.

FIND US:
Website: https://aicouncil.com/
LinkedIn: aicouncilconf
X: https://x.com/aicouncilconf
