Managing AI Costs while Scaling Agentic Workflows with Small Language Models

In this episode of the Transform NOW podcast, host Michael Marchuk talks with Rob May, CEO of Neurometric and co-host of the AI in NYC podcast, about practical AI deployment beyond the hype. Rob explains Neurometric’s focus on small language models and inference optimization: helping companies move high-volume tasks off frontier models and onto task-specific endpoints to reduce latency and cost. He explores whether enterprises should stay model-agnostic or align with a vendor for guardrails, culture, and data considerations, and how agentic workflows may soon choose models dynamically, with risks around constraints, availability, and stability.

Rob also covers model deprecation, data and model drift, and why legacy models may persist via third-party hosting. The key AI KPIs discussed are accuracy, latency, and cost; token-usage metrics come in for criticism. For regulated workflows such as KYC and fraud, he stresses human oversight and alerting. The conversation also covers governance tradeoffs, startup versus incumbent advantages, quantization and other tactics for cutting inference cost, and ongoing chip innovation that is driving edge AI and potential data center opportunities.

-Small Language Models
-Model Moats and Vendor Choice
-Agents Routing to Models
-Failures, Drift, and Legacy Models
-Executive KPIs for AI
-Bottlenecks in Regulated Workflows
-Minimum Viable Governance
-Big Firms vs Startups Strategy
-Future Hardware and Edge AI

Visit us on our socials:
🦾 Get started with SS&C Blue Prism: https://okt.to/JcMLdU
🧑‍💻 LinkedIn: https://okt.to/k8zIdp
✖️ Twitter: https://okt.to/fHyd9G
🙋‍♀️ Facebook: https://okt.to/Vyjfiz
📸 Instagram: https://okt.to/5nYvIf
💭 Blog: https://okt.to/QuGqVP
🤩 Case studies: https://okt.to/ft1AMX

To ensure that you never miss an episode of Transform NOW, be sure to subscribe!