Building voice agents with OpenAI — Dominik Kundel, OpenAI
We'll walk through the differences between chained and speech-to-speech voice agents, how to approach them, and best practices, then transform a text-based agent into our first voice-enabled agent.

About Dominik Kundel
Dominik is a developer and product leader with a passion for Developer Experience and Generative AI. He's currently working on Developer Experience & SDKs at OpenAI. Previously he led Product & Design for Twilio's Emerging Tech & Innovation organization, where his team worked on customer-aware AI agents. Dominik loves tinkering with anything that can run JavaScript, from front-end servers to CLIs and coffee machines. You can find him tweeting @dkundel, and in his spare time he's working on cocktails, food and photography.

Recorded at the AI Engineer World's Fair in San Francisco. Stay up to date on our upcoming events and content by joining our newsletter here: https://www.ai.engineer/newsletter

Part 1: High-Level Summary and Timestamps

The video is a presentation by Dominik from OpenAI on building voice agents. The main thesis is that voice agents are the future of accessible and information-dense technology, acting as an API to the real world. The presentation introduces the new OpenAI Agents SDK for TypeScript and covers the architectures, best practices, and hands-on building of these voice agents.

[00:16] Introduction to voice agents.
[01:28] Overview of the OpenAI Agents SDK for TypeScript.
[03:27] The case for why voice agents are important.
[04:21] A look at different architectures for voice agents.
[01:00:16] Best practices for building voice agents.
[01:17:31] A hands-on guide to building a voice agent.
[38:07] Q&A session with the audience.

Part 2: Detailed Technical Summary

Introduction to Voice Agents: Dominik defines voice agents as systems that accomplish tasks independently on behalf of users and that are composed of a model, instructions, access to tools, and a runtime [01:04]. He emphasizes that these agents are designed to be autonomous and helpful.

OpenAI Agents SDK for TypeScript: A new TypeScript SDK has been launched, mirroring the Python SDK, to provide a structured way to build agents based on OpenAI's best practices [01:35]. The SDK includes handoffs, guardrails, streaming I/O, tool support, built-in tracing, human-in-the-loop support with resumability, and native voice agent support [01:58].

Why Voice Agents?: Voice agents make technology more accessible [03:34] and are more information-dense because tone and voice carry nuance that text does not [03:47]. A key advantage is their ability to act as an API to the real world, for instance by calling a business that lacks a formal API [04:02].

Voice Agent Architectures:

Chained Approach (Text-based): This architecture chains speech-to-text, a text-based agent, and text-to-speech into one pipeline [04:34]. It is easier to start with and offers more control, but it has to solve turn detection itself, adds latency at each stage, and loses audio context such as tone once speech is transcribed [05:33].

Speech-to-Speech Approach: Here the model is trained directly on audio, which gives a more seamless conversational experience and tool usage [06:30]. This approach has lower latency and a more contextual understanding of tone and voice, leading to a more natural flow [06:47]. However, it is harder to integrate with existing text-based systems and struggles with complex decision-making [07:01].

Delegation Approach: This hybrid approach uses a front-line agent for user interaction, which delegates complex tasks to more powerful reasoning models such as o4-mini or o3 via tool calls [07:45]. A demo shows the agent handling interruptions and delegating tasks such as checking the weather or processing refunds [08:00].

Best Practices for Building:

Start Small: Begin with a small, clear goal so that performance is easier to measure and iterate on [01:12:40].

Early Evaluations: Implement evaluations and guardrails early in development to ensure reliability and manage complexity as the agent grows [01:13:33].

Generative Tone: Use generative models to give your agent a specific tone and personality by prompting for emotions and roles, for example using openai.fm [01:14:14].

Descriptive Flows: Use JSON structures to guide the model through conversational flows, much like a human agent's script, so that it follows the steps more reliably [01:16:46].

Hands-on Building: The presentation includes a live coding session in which Dominik builds a voice agent from scratch [01:17:31]. He demonstrates setting up the agent, adding tools, and connecting it to a real-time browser session using Next.js and WebRTC. The demo showcases real-time interaction, interruption handling, conversation transcripts, debugging with the traces dashboard, human-in-the-loop approval of tool execution, and agent handoffs for specialized tasks [01:30:29, 01:50:00].
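
The "Descriptive Flows" practice above can be sketched in plain TypeScript. This is a minimal sketch under stated assumptions, not code from the talk: the FlowState shape, the refundFlow steps, and the helpers findBrokenTransitions and buildInstructions are hypothetical names chosen for illustration and are not part of the OpenAI Agents SDK. The idea is to keep the conversation script as structured data, check that every transition points at a real step, and embed the JSON in the agent's instructions.

```typescript
// Hypothetical shape for one step of a conversation flow. The field names are
// illustrative assumptions, not a type exported by any SDK.
interface FlowState {
  id: string;
  description: string;
  instructions: string[];
  transitions: { nextStep: string; condition: string }[];
}

// An example script for a refund call, written the way a human agent's
// playbook might be. The steps themselves are made up for this sketch.
const refundFlow: FlowState[] = [
  {
    id: "1_greeting",
    description: "Greet the caller and ask how you can help.",
    instructions: ["Keep the greeting to one short sentence."],
    transitions: [
      { nextStep: "2_collect_order", condition: "The caller asks about a refund." },
    ],
  },
  {
    id: "2_collect_order",
    description: "Collect the order number.",
    instructions: ["Repeat the order number back to confirm it."],
    transitions: [
      { nextStep: "3_confirm_refund", condition: "The order number is confirmed." },
    ],
  },
  {
    id: "3_confirm_refund",
    description: "Summarize the refund and ask for confirmation.",
    instructions: ["Do not promise a refund before the caller confirms."],
    transitions: [], // terminal step
  },
];

// Returns "from -> to" for every transition that targets a step id
// that does not exist in the flow.
function findBrokenTransitions(flow: FlowState[]): string[] {
  const ids = flow.map((state) => state.id);
  const broken: string[] = [];
  for (const state of flow) {
    for (const transition of state.transitions) {
      if (ids.indexOf(transition.nextStep) === -1) {
        broken.push(state.id + " -> " + transition.nextStep);
      }
    }
  }
  return broken;
}

// Combines a persona prompt with the JSON flow into one instruction string.
function buildInstructions(persona: string, flow: FlowState[]): string {
  return [persona, "# Conversation flow", JSON.stringify(flow, null, 2)].join("\n\n");
}

const instructions = buildInstructions(
  "You are a friendly, concise support agent.",
  refundFlow,
);
console.log(instructions);
```

The resulting string could then be passed as the instructions of whichever agent you build. Validating transitions before deploying a prompt is a design choice of this sketch, intended to catch a script that refers to a step that was renamed or removed.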