Engineering voice agents: Latency, quality, and scale — Rishabh Bhargava, Together AI

Users notice latency above 500ms and hang up above one second. In an already optimized pipeline, 75ms of network latency from models hosted in a different data center adds 30% overhead; colocating everything in the same building drops that to around 5ms. Rishabh Bhargava from Together AI walks through the full speech-to-text, LLM, and text-to-speech pipeline at that level of specificity. The LLM dominates the budget: the target is 200 to 300ms time to first token with a model in the 8 to 30B parameter range, because larger models blow the latency budget and smaller ones break tool calling. The speech-to-text target is a P90 latency under 100ms with around 6% word error rate. One pattern for handling complex workflows without adding latency: a small thinker LLM handles conversation flow and issues a single tool call to a larger model when the request is complex, keeping the fast path fast.

Speaker info: / bhargavarishabh
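The numbers in the description can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the talk. It infers the pipeline baseline implied by "75ms adds 30% overhead", then estimates how much of the 500ms threshold is left after speech-to-text, LLM time to first token, and network latency. It assumes the stages run back to back so their latencies simply add, and it treats the P90 and target figures as if they were fixed costs. The function names are hypothetical.

```python
# Figures quoted in the description above.
NOTICE_MS = 500.0                    # users notice latency above this
CROSS_DC_NETWORK_MS = 75.0           # models in a different data center
COLOCATED_NETWORK_MS = 5.0           # everything in the same building
CROSS_DC_OVERHEAD = 0.30             # "adds 30% overhead"
STT_P90_MS = 100.0                   # speech-to-text P90 target (upper bound)
LLM_TTFT_RANGE_MS = (200.0, 300.0)   # LLM time-to-first-token target range


def implied_baseline_ms(network_ms: float, overhead: float) -> float:
    """Baseline pipeline latency implied by 'network_ms adds `overhead`'."""
    return network_ms / overhead


def overhead_fraction(network_ms: float, baseline_ms: float) -> float:
    """Network latency expressed as a fraction of the baseline."""
    return network_ms / baseline_ms


def remaining_ms(limit_ms: float, stt_ms: float,
                 llm_ttft_ms: float, network_ms: float) -> float:
    """Budget left under limit_ms. Assumption: stage latencies are additive."""
    return limit_ms - (stt_ms + llm_ttft_ms + network_ms)


# The baseline is inferred from the two quoted figures, not stated in the source.
baseline = implied_baseline_ms(CROSS_DC_NETWORK_MS, CROSS_DC_OVERHEAD)
print(f"implied baseline: {baseline:.0f} ms")
print(f"colocated overhead: "
      f"{overhead_fraction(COLOCATED_NETWORK_MS, baseline):.0%}")

for ttft in LLM_TTFT_RANGE_MS:
    for label, net in (("cross-DC", CROSS_DC_NETWORK_MS),
                       ("colocated", COLOCATED_NETWORK_MS)):
        left = remaining_ms(NOTICE_MS, STT_P90_MS, ttft, net)
        print(f"TTFT {ttft:.0f} ms, {label}: {left:.0f} ms left under "
              f"{NOTICE_MS:.0f} ms")
```

Under these assumptions, the quoted 30% overhead corresponds to a baseline of about 250ms, and colocation cuts the network share to about 2%. At the slow end of the LLM target (300ms), a cross-data-center deployment leaves roughly 25ms for text to speech and everything else before users notice, while a colocated one leaves roughly 95ms.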
