Making Agent Evals Isn’t As Hard As You Think!

Discussing the theory behind creating and using agent evals.

Resources:
Evals Field Guide - https://lucek.ai/blogs/agent-evaluations
Evaluation Concepts - https://docs.langchain.com/langsmith/...
Demystifying Evals - https://www.anthropic.com/engineering...

Chapters:
00:00 - Introduction
00:33 - Context
02:37 - What gets measured
05:08 - How it's measured
08:20 - Unit Test Evals
11:14 - Agent Integration Evals
14:49 - Online Evals
18:32 - Benchmark Evals
23:51 - Agent Eval Loop

#ai #programming #datascience