Making Agent Evals Isn’t As Hard As You Think!
Discussing the theory behind creating and using agent evals.

Resources:
Evals Field Guide - https://lucek.ai/blogs/agent-evaluations
Evaluation Concepts - https://docs.langchain.com/langsmith/...
Demystifying Evals - https://www.anthropic.com/engineering...

Chapters:
00:00 - Introduction
00:33 - Context
02:37 - What gets measured
05:08 - How it’s measured
08:20 - Unit Test Evals
11:14 - Agent Integration Evals
14:49 - Online Evals
18:32 - Benchmark Evals
23:51 - Agent Eval Loop

#ai #programming #datascience

