Ship Real Agents: Hands-On Evals for Agentic Applications — Laurie Voss, Arize

Most agents get tested by running a few queries and checking whether the output looks right. Laurie calls this the vibes problem: it doesn't catch regressions, doesn't run in CI, and doesn't tell you whether a prompt fix broke three other things. This workshop builds a complete eval pipeline from scratch on a financial analysis agent: tracing with Phoenix, reading traces before writing a single eval, categorizing failures by root cause, then building code evals, built-in LLM-as-a-judge evals, and a custom rubric with labeled examples. The sharpest lesson: choosing the right eval matters more than tuning it. A correctness eval scored 0 out of 13 on the same agent that a faithfulness eval scored 13 out of 13, because the model doesn't know what year it is and can't verify forward-looking financial data. The workshop closes on the thing most eval content skips: experiments that let you prove a prompt change actually worked, rather than eyeballing it and calling it a win.

Speaker info: https://x.com/seldo https://github.com/seldo
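
To make the "code eval" and "experiment" ideas concrete, here is a minimal sketch in plain Python. It does not use the Phoenix API and is not the workshop's code. Every name in it (`Example`, `mentions_all_tickers`, `pass_rate`, `compare`) is hypothetical, and the ticker check is only an assumed example of a deterministic rule one might write for a financial analysis agent.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One recorded agent run (hypothetical shape, not a Phoenix type)."""
    query: str
    output: str
    required_tickers: tuple[str, ...]


def mentions_all_tickers(example: Example) -> bool:
    """Code eval: pass only if every requested ticker appears in the report."""
    text = example.output.upper()
    return all(ticker.upper() in text for ticker in example.required_tickers)


def pass_rate(examples: list[Example]) -> float:
    """Fraction of examples that pass the code eval (0.0 for an empty list)."""
    if not examples:
        return 0.0
    return sum(mentions_all_tickers(e) for e in examples) / len(examples)


def compare(baseline: list[Example], candidate: list[Example]) -> dict[str, float]:
    """Experiment: score two prompt versions on the same queries and report the change."""
    before, after = pass_rate(baseline), pass_rate(candidate)
    return {"baseline": before, "candidate": after, "delta": after - before}
```

Because the check is deterministic, it can run in CI on every prompt change, and `compare` turns "it looks better" into a number measured on the same set of queries before and after the change.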
