How to build agents when the smartest AI isn't smart enough

Nick Larus-Stone is the Head of AI at Benchling, the R&D data platform that life science companies use to store and manage their experiments, samples, instruments, and analysis. Benchling has been around since 2012. In October 2025, it launched Benchling AI, an intelligence layer with a chat interface, backed by an agent, that helps scientists find data, design experiments, and write reports. Nick came to Benchling through its acquisition of Sphinx Bio, the analysis startup he founded.

In this conversation, Nick walks through what it takes to build agents for scientific work, and where the playbook from coding agents holds up and where it breaks down.

We also discuss:
• Why Benchling invests so heavily in getting clean data upfront
• How they cross-check answers between models to get more out of each one
• Why and how Benchling leans on production traces
• Where AI actually helps science today, and where it still gets stuck
• Why understanding LLMs is closer to biology than software engineering

Timestamps:
00:00 Intro
01:22 What Benchling AI is, and the 14-year data platform underneath it
04:36 Why a decade of structured data is a core advantage
05:57 The architecture under the hood
08:28 Similarities and differences compared to a coding harness
11:14 Benchling’s multi-agent architectures
14:36 Dealing with verifiable vs non-verifiable tasks
16:19 Doing evals when clean benchmarks aren’t possible
18:13 Context engineering: SQL vs. file-based harnesses
22:11 Memory: agents that create and update their own skills
25:30 What user education for scientists looks like
30:33 Why understanding LLMs is closer to biology than software
33:28 When will agents discover a novel cure for disease?
44:58 The future of harnesses in science
48:13 Why fine-tuning on biology hasn't beaten frontier models

References:
• Agent Skills (Claude Docs): https://docs.claude.com/en/docs/agent...
• Benchling’s Deep Research Agent: https://www.benchling.com/blog/comple...
• Claude (Anthropic): https://www.anthropic.com/claude
• Design of experiments (DOE): https://en.wikipedia.org/wiki/Design_...
• FDA Investigational New Drug (IND) application: https://www.fda.gov/drugs/types-appli...
• Gemini (Google): https://gemini.google.com/
• Google AI co-scientist: https://research.google/blog/accelera...
• LangSmith: https://www.langchain.com/langsmith
• Model Context Protocol (MCP): https://modelcontextprotocol.io/
• The Ralph (Wiggum) Loop (Geoffrey Huntley): https://ghuntley.com/ralph/
• Sphinx Bio: https://www.benchling.com/blog/resync...

Where to find Nick:
• Benchling: https://www.benchling.com/
• LinkedIn: / nlarusstone
• Twitter/X: https://x.com/nlarusstone

Where to find Harrison:
• LinkedIn: / harrison-chase-961287118
• Twitter/X: https://x.com/hwchase17

Where to find LangChain:
• Website: http://langchain.com
• Docs: https://docs.langchain.com/

Send feedback or questions to [email protected]
