Groq Head of Evals: How to Actually Make RAG & Agents Fast
If you're building AI agents that take forever to respond, this talk is for you. Aarush Sah, who leads evals at Groq, breaks down practical strategies for dramatically reducing agent latency without sacrificing quality, and not just by switching to faster models. You'll learn how to measure what matters (TTFT, tokens per second, step latency, and end-to-end latency), implement parallelism for multi-step workflows, stream intermediate steps to improve the user experience, and reinvest speed gains into better reasoning. Aarush demonstrates these concepts with real examples showing how a 45-second agent workflow can be cut to seconds.

If you want to learn more about improving RAG applications, check out https://improvingrag.com/

TIME STAMPS
00:00 Introduction and Key Takeaway
02:14 Understanding Latency Metrics
05:05 Demo: Compound Beta Mini vs. Perplexity Sonar
08:02 Complex Agent Example: Travel Planning
10:15 Optimization Strategies for Reducing Latency
17:36 Q&A: Real-World Applications and Tools
26:49 Optimizing UI for Faster Responses
28:55 Outcome-Based Pricing Models
30:30 Evaluating Subjective Criteria
37:21 Optimizing Models for Specific Use Cases
41:41 Impact of Fast Inference on Evaluations
44:23 Real-Time Intelligent Co-Pilots
46:35 Latency in Production Systems
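For readers who want a concrete handle on the metrics named in the description (TTFT, tokens per second, end-to-end latency), here is a minimal Python sketch. It is not taken from the talk: the function name `measure_stream`, the returned dictionary keys, and the `fake_stream` generator are all hypothetical, and the "stream" is any iterable of tokens rather than a real model API. Tokens per second is computed here over the decode phase only (after the first token), which is one common convention among several.

```python
import time
from typing import Callable, Iterable, Iterator, Optional


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> dict:
    """Consume a token stream and report basic latency metrics (hypothetical helper)."""
    start = clock()
    first_token_at: Optional[float] = None
    count = 0
    for _ in tokens:
        now = clock()
        if first_token_at is None:
            first_token_at = now  # arrival time of the first token
        count += 1
    end = clock()

    # Time to first token: delay between the request and the first token.
    ttft = None if first_token_at is None else first_token_at - start
    # Decode throughput: tokens after the first, divided by time spent after the first.
    decode_time = None if first_token_at is None else end - first_token_at
    tps = (count - 1) / decode_time if decode_time and count > 1 else None
    return {
        "ttft_s": ttft,
        "tokens": count,
        "tokens_per_s": tps,
        "e2e_s": end - start,  # end-to-end latency for the whole response
    }


def fake_stream(n: int, first_delay: float, gap: float) -> Iterator[str]:
    """Stand-in for a streaming model response (assumption, not a real API)."""
    time.sleep(first_delay)
    for i in range(n):
        if i > 0:
            time.sleep(gap)
        yield f"tok{i}"


if __name__ == "__main__":
    print(measure_stream(fake_stream(n=5, first_delay=0.05, gap=0.01)))
```

For a multi-step agent, the same helper could be called once per step to get step latency, with the per-step end-to-end values summed (or, for parallel steps, the maximum taken) to estimate the workflow total.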

