Baseten CEO Tuhin Srivastava on Custom Models, and Building the Inference Cloud

Baseten CEO and co-founder Tuhin Srivastava sits down with Sarah Guo and Elad Gil to discuss the rapid growth of AI inference demand, Baseten’s 30x growth, and why inference is becoming the strategic “last market.” Srivastava argues that the application layer will persist because companies with unique user signals can encode value into workflows and post-train specialized models, citing examples such as Abridge and support workflows. The conversation covers GPU capacity constraints, Baseten’s multi-cloud fabric across 18 clouds and 90 clusters, long-term contracting dynamics, the importance of the software layer for stickiness, evolving workloads, multichip possibilities, and operational lessons at scale.

Sign up for new podcasts every week. Email feedback to [email protected]

Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @Tuhinone

Chapters:
00:31 Baseten growth
01:55 Why the app layer wins
05:57 Serving frontier customers
07:55 Open source model mix
09:21 Chinese models and geopolitics
13:07 Custom inference dominates
14:22 Post training acquisition
17:10 When to invest in custom models
18:35 Supply crunch and data centers
22:25 Longer GPU Contracts
24:09 What Makes a Winner
26:07 Multi Chip Future
28:19 Runtime Roadmap
31:08 Scaling Edge Cases
33:48 Hiring and Leadership
36:44 Operations Pager Culture
38:19 Efficiency Drives Demand
40:41 Concierge Everything Future
42:34 Conclusion

Forward Deployed Engineer Salary: Why Palantir FDEs Make $350K+

Inference, Diffusion, World Models, and More | YC Paper Club

Inside the Mind of Anthropic CEO Dario Amodei | The Circuit | Extended Interview

SAP: Bringing the ‘Operating System’ of a Company into the AI Era with CTO Philipp Herzig

Claude Fable 5 and Mythos 5, Anthropic Just Revealed the Future of Software Engineering

Andrej Karpathy: From Vibe Coding to Agentic Engineering w/ Stephanie Zhan

Gemini Co-Lead on World Models, RL's Next Domains & Continual Learning

Good Company Ep. 6 | Dannie Herzberg, President @ Baseten

The field is underestimating inference compute | Noam Brown

The Future of AI Agents with Andrew Ng | Interrupt 26

Yann LeCun's $1B Bet Against LLMs [Part 1]

A leader’s guide to advanced team structures in an agentic world | AWS Events

Inside YC's AI Playbook

Skill Issue: Andrej Karpathy on Code Agents, AutoResearch, and the Loopy Era of AI

Headroom: A Context Optimization Layer for LLM Applications - Tejas Chopra, Netflix, Inc.

Re-engineering the Semiconductor Supply Chain with Intel CEO Lip Bu Tan

Stop Prompting Claude. Use Karpathy's Method Instead.

Why The Best Software Engineers Are Solving Code Review Bottlenecks Now

Watts, Wafers, and the Future of AI Infra | Gavin Baker

Anthropic's Boris Cherny: Why Coding Is Solved, and What Comes Next
