Principles of Evals: The Future of GenAI Evaluation (E.43)

LLMs are optimized to sound convincing, not to know when they’re wrong. In this episode, Deanna Emery breaks down why hallucinations are fundamentally tied to how language models work, why confidence is often disconnected from correctness, and how better evaluation strategies can make AI systems more reliable in production. We also get into uncertainty, semantic reasoning, and what humans still do better than models.

00:00 - Why LLMs hallucinate confidently
09:00 - The limits of current eval systems
18:00 - Why uncertainty matters in AI
27:00 - Semantic reasoning vs memorization
38:00 - What humans still do better than models

The biggest risk in AI isn’t wrong answers. It’s wrong answers delivered with confidence.
