Make Legal Write Your Evals: Building Jade, Chime’s Financial Copilot | Interrupt 26

Philipp Comans, software engineer at Chime, explains how his team built Jade — Chime's always-on financial co-pilot — and solved one of the hardest problems in production AI: getting non-engineers to meaningfully participate in evals. By structuring compliance risk as a taxonomy, using Giskard to generate adversarial test cases, and closing the loop with LangSmith, Chime turned their legal and compliance partners into active co-authors of their eval pipeline. The result: faster releases, fewer surprises at the release gate, and compliance trust built continuously rather than at the end. Chapters: 0:00 Intro and what Chime's Jade actually does 1:04 Why "oops-driven development" breaks trust (and invites regulators) 1:51 The core question: how do you know your agent is compliant? 2:09 The problem with the traditional compliance model 2:52 What the better model looks like 3:13 Evals as your alignment surface 3:55 The language barrier between engineers and legal 4:19 Five steps to bridge the gap 4:36 What evals actually are: LLM-as-a-judge, explained simply 5:32 Step 1: Creating structure with a risk taxonomy 6:55 Handing the taxonomy back to legal to define in their own words 7:52 From risk definition to dataset: how Giskard generates adversarial questions 9:25 "I can't give you investment advice, but Nvidia has been on a tear" 9:48 Step 2: Building the LLM-as-a-judge evaluator from the same risk doc 10:35 Running evals in LangSmith and reading the results 11:06 Making scores visible at every level: engineers, compliance, execs 11:38 Step 3: The feedback flywheel — four ways one annotation improves the system 13:07 What this bought Chime: velocity, alignment, and trust 13:44 Five takeaways Extra resources: • Everything we shipped at Interrupt: https://www.langchain.com/blog/interr... • Meet LangSmith Engine: https://www.langchain.com/blog/introd... • About LangChain: https://www.langchain.com/

Getting Evals Right for LLM Applications | Interrupt 26

Getting Evals Right for LLM Applications | Interrupt 26

How Lyft Builds Evals That Actually Matter in Production | Interrupt 26

How Lyft Builds Evals That Actually Matter in Production | Interrupt 26

How AI agents & Claude skills work (Clearly Explained)

How AI agents & Claude skills work (Clearly Explained)

Scott and Mark learn...how agents reshape software engineering | BRK247

Scott and Mark learn...how agents reshape software engineering | BRK247

LangChain Keynote 2026 in 22 Minutes: The Future of Agentic AI

LangChain Keynote 2026 in 22 Minutes: The Future of Agentic AI

MIT Just Revealed the AI Bubble's Fatal Flaw

MIT Just Revealed the AI Bubble's Fatal Flaw

The AI Skills Nobody is Teaching (And Everyone Needs) | AI Expert Ethan Mollick

The AI Skills Nobody is Teaching (And Everyone Needs) | AI Expert Ethan Mollick

The Future of AI Agents with Andrew Ng | Interrupt 26

The Future of AI Agents with Andrew Ng | Interrupt 26

Andrej Karpathy: From Vibe Coding to Agentic Engineering w/ Stephanie Zhan

Andrej Karpathy: From Vibe Coding to Agentic Engineering w/ Stephanie Zhan

The best AI agents are simpler than you think

The best AI agents are simpler than you think

The most rational take on AI you’ll hear this year

The most rational take on AI you’ll hear this year

Full Walkthrough: Workflow for AI Coding — Matt Pocock

Full Walkthrough: Workflow for AI Coding — Matt Pocock

Your Agent Keeps Making the Same Mistakes. Here's How to Fix It

Your Agent Keeps Making the Same Mistakes. Here's How to Fix It

Model Context Protocol (MCP) Explained for Beginners: AI Flight Booking Demo!

Model Context Protocol (MCP) Explained for Beginners: AI Flight Booking Demo!

AI Agents Full Course 2026: Master Agentic AI (2 Hours)

AI Agents Full Course 2026: Master Agentic AI (2 Hours)

Make your Own Agents in Copilot | Complete Tutorial

Make your Own Agents in Copilot | Complete Tutorial

£85K Burned on a Failed PoC: What Actually Gets Agents to Production — Sandipan Bhaumik, Databricks

£85K Burned on a Failed PoC: What Actually Gets Agents to Production — Sandipan Bhaumik, Databricks

The woman behind Claude Code and Cowork on why coding is solved (and comes next) | Fiona Fung

The woman behind Claude Code and Cowork on why coding is solved (and comes next) | Fiona Fung

Observing And Testing CX Agents | Interrupt 26

Observing And Testing CX Agents | Interrupt 26

The Etsy Gifting Assistant: From Prototype to Production | Interrupt 26

The Etsy Gifting Assistant: From Prototype to Production | Interrupt 26