Why Your Coding Agents Need Evals | Assembled's CTO

Stephen Poletto talks with John Wang, co-founder and CTO of Assembled, about what it takes to build with AI inside a high-reliability engineering org. John's team runs production AI agents for companies like DoorDash, Stripe, and Robinhood, so the bar for shipping quality is high. He gets candid on why tokenmaxxing is a dead end, how @assembledhq enables even non-engineers to ship real code without breaking things, the internal agent platform his team built, and why the teams winning with AI optimize for quality at pace instead of raw output.

Chapters:
0:00 What is Assembled?
2:07 How customer support changed from pre-LLM to the AI era
6:46 Bringing AI into the engineering build process
9:45 Letting non-engineers ship code: builder vs engineer roles
12:54 Why they pulled out the MCP and gave agents a CLI instead
13:51 The stack behind their internal agent platform
15:54 Team-wide automations and a self-improving AGENTS.md loop
18:10 Why "token maxing is stupid"
22:47 Picking the right model for the job, and the eval trap
24:01 Slop cannons: speed without a quality bar
25:23 Activity vs progress, and hiring for product judgment
28:08 Running evals on your coding agents
30:05 What's next: delightful agent experience

🔗 See Span in action: https://www.span.app/
🔗 Connect with Span: / getspan
🔗 Connect with Stephen: / spoletto
🔗 Connect with Assembled: / assembledhq
