How AI Engineers Improve Agentic Products

Anyone can be a math and science person with Brilliant! Visit https://brilliant.org/AdamLucek/ to start learning and save 20% off an annual premium subscription.

Resources:
Content Discussed - https://lucek.ai/blogs/llm-evaluations
Evaluizer - https://github.com/ALucek/evaluizer
LLM Evals FAQ - https://hamel.dev/blog/posts/evals-faq/
Short Musings on AI Engineering and "Failed AI Projects" - https://www.sh-reya.com/blog/ai-engin...
Product Evals in Three Simple Steps - https://eugeneyan.com/writing/product...
An LLM-as-Judge Won't Save The Product—Fixing Your Process Will - https://eugeneyan.com/writing/eval-pr...
A Field Guide to Rapidly Improving AI Products - https://hamel.dev/blog/posts/field-gu...
Who Validates the Validator - https://arxiv.org/pdf/2404.12272

Chapters:
00:00 - Why do we need to improve?
05:20 - Brilliant!
07:13 - Context Continued
09:10 - What Are LLM Evals?
12:11 - Human Feedback
13:48 - Creating the Initial Feedback Set
16:15 - Annotation Part 1
19:56 - Performing Error Analysis
26:14 - LLM-As-A-Judge
27:44 - LLM Judge Pitfalls
29:46 - LLM Judge Alignment
33:11 - Function Evaluations
36:10 - Observability Platforms
39:09 - The Benefits
40:21 - Benefit: Algorithmic Optimization
42:37 - Benefit: Reinforcement Learning
44:28 - Future Checklist
47:10 - Is it Worth It?

This video is sponsored by Brilliant

#ai #coding #datascience