Why Your Coding Agents Need Evals | Assembled's CTO

Stephen Poletto talks with John Wang, co-founder and CTO of Assembled, about what it takes to build with AI inside a high-reliability engineering org. John's team runs production AI agents for companies like DoorDash, Stripe, and Robinhood, so the bar for shipping quality is high. He gets candid on why tokenmaxxing is a dead end, how @assembledhq enables even non-engineers to ship real code without breaking things, the internal agent platform his team built, and why the teams winning with AI optimize for quality at pace instead of raw output.

Chapters:
0:00 What is Assembled?
2:07 How customer support changed from pre-LLM to the AI era
6:46 Bringing AI into the engineering build process
9:45 Letting non-engineers ship code: builder vs engineer roles
12:54 Why they pulled out the MCP and gave agents a CLI instead
13:51 The stack behind their internal agent platform
15:54 Team-wide automations and a self-improving AGENTS.md loop
18:10 Why "token maxing is stupid"
22:47 Picking the right model for the job, and the eval trap
24:01 Slop cannons: speed without a quality bar
25:23 Activity vs progress, and hiring for product judgment
28:08 Running evals on your coding agents
30:05 What's next: delightful agent experience

🔗 See Span in action: https://www.span.app/
🔗 Connect with Span: / getspan
🔗 Connect with Stephen: / spoletto
🔗 Connect with Assembled: / assembledhq