From Vibes to Validation: How To Evaluate LLMs and Agents

Learn about evaluating LLMs and agentic systems, with a practical end-to-end framework that shows how to combine qualitative review, structured human evaluation, and benchmarks to measure what matters in production.

Large language models and agentic systems are moving quickly from prototypes into production, but knowing how to evaluate them effectively remains one of the biggest challenges teams face. In this recording, we explore the full spectrum of LLM and agent evaluation approaches, from lightweight qualitative reviews and “gut checks” to structured human evaluations and automated benchmarks. Rather than framing these methods as tradeoffs, we’ll show how they work best together across different stages of development.

We’ll dig into where human judgment is essential: evaluating usefulness, reasoning quality, safety, and alignment with real user needs. You’ll learn why benchmarks alone often fall short, how to avoid common evaluation pitfalls, and how to incorporate human review at scale without slowing teams down.

You’ll walk away with:
• A practical framework for evaluating LLMs and agentic systems end to end
• Clear guidance on when to use benchmarks vs. human evaluation
• Strategies for scaling human review while maintaining rigor and speed
• A better understanding of how to measure what actually matters

Whether you’re building, deploying, or managing AI systems in production, this video will help you design evaluation pipelines that deliver real insight and confidence.
