Engineering voice agents: Latency, quality, and scale — Rishabh Bhargava, Together AI

Users notice latency above 500ms and hang up above one second. In an already optimized pipeline, 75ms of network latency from models sitting in a different data center adds 30% overhead. Colocating everything in the same building drops that to around 5ms. Rishabh Bhargava from Together AI walks through the full speech-to-text, LLM, and text-to-speech pipeline at that level of specificity. The LLM dominates the budget: the target is 200 to 300ms time to first token with a model in the 8B to 30B parameter range, since larger models blow the latency budget and smaller ones break tool calling. The speech-to-text target is P90 latency under 100ms with around 6% word error rate. One pattern for handling complex workflows without adding latency: a small thinker LLM handles conversation flow and issues a single tool call to a larger model when the request is complex, keeping the fast path fast. Speaker info: / bhargavarishabh
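The network-overhead figures above can be checked with a little arithmetic. The sketch below is illustrative only: the 250ms baseline is not stated in the description, it is simply what "75ms adds 30% overhead" implies, and the function and constant names are made up for this example.

```python
# Thresholds quoted in the description.
NOTICE_MS = 500      # users notice latency above this
HANGUP_MS = 1000     # users hang up above this

# Assumption: baseline pipeline latency implied by "75ms adds 30% overhead".
# The description does not state this number directly.
IMPLIED_BASE_MS = 75 / 0.30


def overhead_pct(network_ms: float, base_ms: float) -> float:
    """Network latency expressed as a percentage of the baseline pipeline latency."""
    return 100.0 * network_ms / base_ms


def classify(total_ms: float) -> str:
    """Map an end-to-end latency onto the user-perception thresholds above."""
    if total_ms > HANGUP_MS:
        return "hang-up risk"
    if total_ms > NOTICE_MS:
        return "noticeable"
    return "ok"


if __name__ == "__main__":
    for label, network_ms in [("cross data center", 75), ("colocated", 5)]:
        total = IMPLIED_BASE_MS + network_ms
        pct = overhead_pct(network_ms, IMPLIED_BASE_MS)
        print(f"{label}: +{network_ms}ms ({pct:.0f}% overhead), "
              f"total {total:.0f}ms, {classify(total)}")
```

Under that assumed baseline, colocation cuts the network share from 30% to about 2% of the pipeline, which leaves more headroom under the 500ms threshold.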

How I deleted 95% of my agent skills and got better results — Nick Nisi, WorkOS

Agentic Engineering: Working With AI, Not Just Using It — Brendan O'Leary

Making Your AI Reliable: Agentic Grounding and the Context Layer with Kurt Cagle

Andrej Karpathy: From Vibe Coding to Agentic Engineering w/ Stephanie Zhan

Building and evaluating AI Agents — Sayash Kapoor, AI Snake Oil

Using Large Language Models | Build Your Own LLM Workshop #1

The Future of AI Agents with Andrew Ng | Interrupt 26

Why Eval++ Is the Next Great Compute Primitive — Sunil Pai & Matt Carey, Cloudflare

Keynote: After the AI Hype – What’s Real, and What’s Next - Richard Campbell - 2026

Billionaire's WARNING: I'm SELLING. The Crash Is Already Here!

The best AI agents are simpler than you think

Zig 2026: No-AI Policy, $670K Foundation, Left GitHub & Why Zig Isn’t 1.0 - Andrew Kelley Explains

Harnesses in AI: A Deep Dive — Tejas Kumar, IBM

What AI Agent Skills Are and How They Work

Model Context Protocol (MCP) Explained for Beginners: AI Flight Booking Demo!

How AI agents & Claude skills work (Clearly Explained)

Scott and Mark learn...how agents reshape software engineering | BRK247

LLM Observability, Evaluation, Experimentation Platform — Dat Ngo, Arize

Your Attention Is the Bottleneck, Not Your Agents — Zack Proser, WorkOS

Skill Issue: Andrej Karpathy on Code Agents, AutoResearch, and the Loopy Era of AI
