How to build agents when the smartest AI isn't smart enough

Nick Larus-Stone is the Head of AI at Benchling, the R&D data platform that life science companies use to store and manage their experiments, samples, instruments, and analysis. Benchling has been around since 2012. In October 2025, it launched Benchling AI, an intelligence layer with a chat interface, backed by an agent, that helps scientists find data, design experiments, and write reports. Nick came to Benchling through its acquisition of Sphinx Bio, the analysis startup he founded.

In this conversation, Nick walks through what it takes to build agents for scientific work, and where the playbook from coding agents holds up and where it breaks down.

We also discuss:
• Why Benchling invests so heavily in getting clean data upfront
• How they cross-check answers between models to get more out of each one
• Why and how Benchling leans on production traces
• Where AI actually helps science today, and where it still gets stuck
• Why understanding LLMs is closer to biology than software engineering

Timestamps:
00:00 Intro
01:22 What Benchling AI is, and the 14-year data platform underneath it
04:36 Why a decade of structured data is a core advantage
05:57 The architecture under the hood
08:28 Similarities and differences compared to a coding harness
11:14 Benchling’s multi-agent architectures
14:36 Dealing with verifiable vs non-verifiable tasks
16:19 Doing evals when clean benchmarks aren’t possible
18:13 Context engineering: SQL vs. file-based harnesses
22:11 Memory: agents that create and update their own skills
25:30 What user education for scientists looks like
30:33 Why understanding LLMs is closer to biology than software
33:28 When will agents discover a novel cure for disease?
44:58 The future of harnesses in science
48:13 Why fine-tuning on biology hasn't beaten frontier models

References:
• Agent Skills (Claude Docs): https://docs.claude.com/en/docs/agent...
• Benchling’s Deep Research Agent: https://www.benchling.com/blog/comple...
• Claude (Anthropic): https://www.anthropic.com/claude
• Design of experiments (DOE): https://en.wikipedia.org/wiki/Design_...
• FDA Investigational New Drug (IND) application: https://www.fda.gov/drugs/types-appli...
• Gemini (Google): https://gemini.google.com/
• Google AI co-scientist: https://research.google/blog/accelera...
• LangSmith: https://www.langchain.com/langsmith
• Model Context Protocol (MCP): https://modelcontextprotocol.io/
• The Ralph (Wiggum) Loop (Geoffrey Huntley): https://ghuntley.com/ralph/
• Sphinx Bio: https://www.benchling.com/blog/resync...

Where to find Nick:
• Benchling: https://www.benchling.com/
• LinkedIn: / nlarusstone
• Twitter/X: https://x.com/nlarusstone

Where to find Harrison:
• LinkedIn: / harrison-chase-961287118
• Twitter/X: https://x.com/hwchase17

Where to find LangChain:
• Website: http://langchain.com
• Docs: https://docs.langchain.com/

Send feedback or questions to [email protected]