Make Legal Write Your Evals: Building Jade, Chime’s Financial Copilot | Interrupt 26
Philipp Comans, software engineer at Chime, explains how his team built Jade, Chime's always-on financial co-pilot, and solved one of the hardest problems in production AI: getting non-engineers to meaningfully participate in evals. By structuring compliance risk as a taxonomy, using Giskard to generate adversarial test cases, and closing the loop with LangSmith, Chime turned their legal and compliance partners into active co-authors of their eval pipeline. The result: faster releases, fewer surprises at the release gate, and compliance trust built continuously rather than at the end.

Chapters:
0:00 Intro and what Chime's Jade actually does
1:04 Why "oops-driven development" breaks trust (and invites regulators)
1:51 The core question: how do you know your agent is compliant?
2:09 The problem with the traditional compliance model
2:52 What the better model looks like
3:13 Evals as your alignment surface
3:55 The language barrier between engineers and legal
4:19 Five steps to bridge the gap
4:36 What evals actually are: LLM-as-a-judge, explained simply
5:32 Step 1: Creating structure with a risk taxonomy
6:55 Handing the taxonomy back to legal to define in their own words
7:52 From risk definition to dataset: how Giskard generates adversarial questions
9:25 "I can't give you investment advice, but Nvidia has been on a tear"
9:48 Step 2: Building the LLM-as-a-judge evaluator from the same risk doc
10:35 Running evals in LangSmith and reading the results
11:06 Making scores visible at every level: engineers, compliance, execs
11:38 Step 3: The feedback flywheel (four ways one annotation improves the system)
13:07 What this bought Chime: velocity, alignment, and trust
13:44 Five takeaways

Extra resources:
• Everything we shipped at Interrupt: https://www.langchain.com/blog/interr...
• Meet LangSmith Engine: https://www.langchain.com/blog/introd...
• About LangChain: https://www.langchain.com/
