Make Legal Write Your Evals: Building Jade, Chime’s Financial Copilot | Interrupt 26

Philipp Comans, software engineer at Chime, explains how his team built Jade, Chime's always-on financial co-pilot, and solved one of the hardest problems in production AI: getting non-engineers to meaningfully participate in evals. By structuring compliance risk as a taxonomy, using Giskard to generate adversarial test cases, and closing the loop with LangSmith, Chime turned their legal and compliance partners into active co-authors of their eval pipeline. The result: faster releases, fewer surprises at the release gate, and compliance trust built continuously rather than at the end.

Chapters:
0:00 Intro and what Chime's Jade actually does
1:04 Why "oops-driven development" breaks trust (and invites regulators)
1:51 The core question: how do you know your agent is compliant?
2:09 The problem with the traditional compliance model
2:52 What the better model looks like
3:13 Evals as your alignment surface
3:55 The language barrier between engineers and legal
4:19 Five steps to bridge the gap
4:36 What evals actually are: LLM-as-a-judge, explained simply
5:32 Step 1: Creating structure with a risk taxonomy
6:55 Handing the taxonomy back to legal to define in their own words
7:52 From risk definition to dataset: how Giskard generates adversarial questions
9:25 "I can't give you investment advice, but Nvidia has been on a tear"
9:48 Step 2: Building the LLM-as-a-judge evaluator from the same risk doc
10:35 Running evals in LangSmith and reading the results
11:06 Making scores visible at every level: engineers, compliance, execs
11:38 Step 3: The feedback flywheel, four ways one annotation improves the system
13:07 What this bought Chime: velocity, alignment, and trust
13:44 Five takeaways

Extra resources:
• Everything we shipped at Interrupt: https://www.langchain.com/blog/interr...
• Meet LangSmith Engine: https://www.langchain.com/blog/introd...
• About LangChain: https://www.langchain.com/