Prattyush Mangal - Production-Ready AI Agents: From LLMs to Small Language Models | PyData London 26

Building a demo agent on a model with a hundred billion parameters or more can be easy. Deploying reliable, cost-effective agents in production is hard. This talk provides a comprehensive roadmap for taking AI agents from prototype to production, with a focus on migrating from expensive frontier LLMs to efficient small language models (SLMs). We'll explore the entire lifecycle of production agent development: test-driven development practices adapted for non-deterministic AI systems, agent architectures and strategies for migrating from large to small models, CI/CD considerations for agents, and observability frameworks that capture what matters and help remediate failures. Whether you're running agents at scale or planning your first deployment, you'll leave with actionable strategies and concrete tools for building reliable, maintainable agent systems with small language models.

In this talk we will cover the complete agent development lifecycle, from prototype to a scalable, robust production agent built on cost-effective small language models. The talk presents the following topics, gathered from real engagements with product teams:

The Production Agent Problem (3 min): the prototype-to-production gap, why closed frontier LLMs don't scale, and the agent development lifecycle.

Small Models, Big Impact (2 min): the case for small open language models, the current model landscape, and an iterative migration pattern.

Test-Driven Agent Development (5 min): starting with clear use cases and adapting testing practices for non-deterministic systems, covering evaluation patterns and practical examples of testing agent behavior for different types of agents.

Techniques for Migrating to Small Language Models (7 min): task decomposition patterns, multi-model approaches, and agent architectures better suited to small language models.
CI/CD for Agents (7 min): treating models and prompts as config rather than code; building deployment pipelines that handle model and prompt versioning; integration and end-to-end testing for agents, with MCP and A2A considerations; and packaging agents for production rollout.

Observability and Monitoring (4 min): instrumenting agents with structured logging, tracking key metrics beyond traditional monitoring, and building dashboards and alerts that surface quality issues; monitoring non-functional metrics such as cost, latency and concurrency.

Continuous Improvement Loops (4 min): creating feedback pipelines from production data, triaging failures and automating analysis; strategies for iterative improvement; and measuring progress through A/B testing.

As part of this talk, we will reference Jupyter notebooks and reusable code snippets built on the PyData stack to help attendees begin their own agentic journeys to production with small language models.

See also: Useful Code Snippets and Blogs on working with SLMs for Agentic Applications

www.pydata.org

PyData is an educational program of NumFOCUS, a 501(c)(3) non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.

PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.

00:00 Welcome!
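The outline's idea of adapting testing to non-deterministic agents, and using those tests to decide whether a small model can replace a frontier one, can be illustrated with a minimal sketch. This is not the speaker's method: it assumes a hypothetical agent interface (a function from a user request to a chosen tool name), hypothetical seeded stub agents standing in for LLM- and SLM-backed agents, and an arbitrary 5% tolerance for the migration gate. The core move is to run each evaluation case many times and assert a minimum pass rate rather than one exact output.

```python
import random
from dataclasses import dataclass
from typing import Callable

# Hypothetical agent interface: takes a user request, returns the name of the tool it chose.
Agent = Callable[[str], str]


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    expected_tool: str


def pass_rate(agent: Agent, cases: list[EvalCase], trials: int = 20) -> float:
    """Run each case `trials` times; return the fraction of runs that chose the expected tool."""
    passes = sum(
        agent(case.prompt) == case.expected_tool
        for case in cases
        for _ in range(trials)
    )
    return passes / (len(cases) * trials)


def assert_pass_rate(agent: Agent, cases: list[EvalCase], threshold: float, trials: int = 20) -> None:
    """Test-style check: assert a minimum success rate instead of one exact output."""
    rate = pass_rate(agent, cases, trials)
    assert rate >= threshold, f"pass rate {rate:.2f} is below threshold {threshold:.2f}"


def should_migrate(baseline: Agent, candidate: Agent, cases: list[EvalCase],
                   trials: int = 50, tolerance: float = 0.05) -> bool:
    """Hypothetical migration gate: accept the candidate (e.g. an SLM) if its pass rate is
    within `tolerance` of the baseline (e.g. a frontier LLM) on the same evaluation set."""
    return pass_rate(candidate, cases, trials) >= pass_rate(baseline, cases, trials) - tolerance


def make_stub_agent(accuracy: float, seed: int) -> Agent:
    """Hypothetical stand-in for a model-backed agent: picks the right tool with
    probability `accuracy`, otherwise falls back. Seeded so runs are reproducible."""
    rng = random.Random(seed)

    def agent(prompt: str) -> str:
        correct = "refund_tool" if "refund" in prompt.lower() else "order_status_tool"
        return correct if rng.random() < accuracy else "fallback_tool"

    return agent


CASES = [
    EvalCase("Please refund my last purchase", "refund_tool"),
    EvalCase("Where is my order 456?", "order_status_tool"),
]

if __name__ == "__main__":
    frontier = make_stub_agent(accuracy=0.97, seed=1)
    small = make_stub_agent(accuracy=0.93, seed=2)
    print("baseline pass rate:", pass_rate(frontier, CASES, trials=50))
    print("candidate pass rate:", pass_rate(small, CASES, trials=50))
    print("migrate?", should_migrate(frontier, small, CASES))
```

In a real pipeline the stubs would be replaced by calls to the deployed agents, and `assert_pass_rate` could run inside a test suite in CI, so a prompt or model change that drops the success rate below the agreed threshold fails the build.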