Deploying and Scaling Large Language Models in the Enterprise

Architecting Multi-Agent AI Systems Integrating Vision, Data, and Responsible AI

Dhanashree Lele, AI Researcher, University of Illinois System

Large Language Models (LLMs) are rapidly reshaping enterprise AI, but real-world deployments demand far more than fine-tuning and API calls. They require sophisticated architectures capable of scaling inference, integrating multi-modal data streams, and enforcing responsible AI practices, all under the constraints of enterprise SLAs and cost considerations.

In this session, I’ll deliver a deep technical dive into architecting multi-agent AI systems that combine LLMs with computer vision and structured data pipelines. We’ll explore:

Multi-Agent System Design: Architectural patterns for decomposing enterprise workflows into specialized LLM-driven agents, including communication protocols, context sharing, and state management.

Vision-Language Integration: Engineering methods to fuse embeddings from computer vision models with LLM token streams for tasks such as visual question answering, document understanding, and real-time decision support.

Optimization for GPU Inference: Detailed strategies for memory optimization, quantization, mixed-precision computation, and batching to achieve high throughput and low latency in LLM deployment on modern GPU hardware (e.g., NVIDIA A100/H100).

Observability and Responsible AI: Techniques for building observability layers into LLM pipelines (capturing token-level traces, detecting drift, logging model confidence) and implementing fairness audits and risk mitigation protocols at runtime.

Drawing on practical examples from large-scale enterprise deployments across retail, healthcare, and finance, I’ll discuss the engineering trade-offs, tooling stacks, and lessons learned in translating research-grade LLMs into production-grade systems.
This talk is designed for AI engineers and researchers eager to understand the technical complexities, and the solutions, behind scaling multi-modal, responsible AI systems that deliver real business value.

Speaker Bio:
Dhanashree Lele is a Senior Machine Learning Engineer and AI Researcher with over a decade of experience designing and deploying advanced AI systems at scale. Her expertise spans architecting multi-agent solutions that integrate Large Language Models (LLMs), computer vision pipelines, and structured data to solve complex enterprise challenges across industries including retail, healthcare, and finance. At Albertsons, Deloitte, and Fractal, Dhanashree has led the development of production-grade AI applications, focusing on optimization, model observability, and responsible AI practices. Her work includes designing scalable inference architectures for LLMs on modern GPU infrastructures, building hybrid pipelines that fuse vision and language models, and engineering systems that balance performance with ethical and regulatory considerations. She collaborates with research institutions such as the University of Illinois, engages with the research community, and frequently speaks on bridging advanced AI research and production systems.

/ dhanashreelele
https://www.meetup.com/sf-bay-acm/eve...
https://www.sfbayacm.org/event/deploy...

0:00 Chapter intro
6:28 Speaker intro
7:58 Presentation
9:27 Agenda

Foundations of AI Agents
10:31 What is an Agent?
16:35 Agent System Architecture: Core Building Blocks
18:19 When to use AI agents

Multi-Agent System Design
26:17 Examples of Real-World Multi-Agent Systems
1. Customer Service & Support Automation
29:39 2. Financial Trading & Market Simulation
31:30 3. Autonomous Vehicles & Smart Traffic Systems
33:48 4. Healthcare & Drug Discovery
37:59 Tools: How Agents Act Beyond Themselves
40:20 Tools Are Functions Mapped to Real-World Actions!
41:53 Tool Schema Template
43:12 SLMs as Agent Brains!
43:58 Multi-Agent AI Systems Architecture
51:06 How the Supervisor Agent Identifies Task Routing
1:00:32 Process: Supervisor + LLM Judge Validation
1:06:56 Agentic Orchestration Layer (e.g., LangChain, crewAI, AutoGen/AG)
1:13:45 Applications Layer (e.g., Workhelix, Meeno, Woebot Health, Kira Learning)
1:17:31 Compute: Usage of Application Layer
1:18:51 Layerwise decomposition of Technical Operation and Compute Usage
1. Semiconductors (NVIDIA, AMD, Intel)
1:18:57 2. Cloud (AWS, Google Cloud, Azure)
1:32:46 3. Foundational Models (OpenAI, Anthropic, Meta)
1:34:02 4. Agentic Orchestration Layer (LangChain, crewAI, AG)
1:34:10 5. Applications (Workhelix, Meeno, Kira Learning, etc.)
1:34:11 AI Stack Compute Flow
1:34:14 Protocols for Construction of Pipelines
1:37:54 ADK Framework (Agent Development Kit)
1:48:47 Observability Techniques for Multi-Agent System
1:59:16 Putting it All Together
2:01:12 Q&A / Discussion
