How Coinbase Uses Ray, vLLM & LiteLLM to Power Secure LLM Services | Ray Summit 2025
At Ray Summit 2025, Wenyue Liu and Akshit Trehan from Coinbase share how the Coinbase Machine Learning Platform (MLP) team built trusted, production-grade LLM services using Ray, vLLM, and LiteLLM. These services support one of the world’s most security-sensitive environments and reinforce Coinbase’s mission to remain the most trusted crypto exchange.

They begin by outlining the unique challenges of building LLM infrastructure inside a financial institution, where trust, security, and reliability are non-negotiable. To meet these requirements, Coinbase engineered an LLM serving stack that integrates:
- Ray for distributed orchestration and scaling
- vLLM for high-throughput, low-latency inference
- LiteLLM for routing, abstraction, and multi-provider reliability

The speakers then take a deep dive into the technical architecture behind Coinbase’s internal LLM services, including:
- User authentication and authorization patterns tailored for secure LLM access
- Service-to-service (s2s) trust models that allow safe and auditable communication between internal systems
- LiteLLM distribution strategies to balance throughput, reliability, and fallback behavior
- How vLLM and Ray work together to power scalable, production-grade LLM serving APIs
- Systems built to support high-volume internal LLM traffic while keeping performance consistent under load

The session walks through the full end-to-end story of how Coinbase uses Ray and vLLM to deliver trustworthy, secure, and efficient LLM services that meet the strict reliability requirements of a top global crypto exchange.
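
The description mentions LiteLLM distribution strategies that balance throughput, reliability, and fallback behavior, but it does not show how they are configured. As a purely illustrative sketch, the pure-Python code below shows the general pattern of weighted load spreading with ordered fallback across model deployments. It is not LiteLLM's API and not Coinbase's implementation; the `Deployment` class, the `route` function, and the deployment names are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Deployment:
    """A hypothetical model backend, e.g. one pool of vLLM replicas."""
    name: str
    weight: float               # relative share of normal traffic
    call: Callable[[str], str]  # sends a prompt; returns text or raises


class AllDeploymentsFailed(RuntimeError):
    """Raised when the primary and every fallback deployment failed."""


def route(prompt: str, deployments: Sequence[Deployment],
          rng: random.Random) -> Tuple[str, str]:
    """Weighted pick of a primary deployment, then ordered fallback."""
    if not deployments:
        raise ValueError("no deployments configured")
    # Spread load: choose the primary in proportion to its weight.
    primary = rng.choices(list(deployments),
                          weights=[d.weight for d in deployments], k=1)[0]
    # Fallback order: every other deployment, heaviest weight first.
    fallbacks = sorted((d for d in deployments if d is not primary),
                       key=lambda d: d.weight, reverse=True)
    errors: List[str] = []
    for d in [primary, *fallbacks]:
        try:
            return d.name, d.call(prompt)
        except Exception as exc:  # any backend error moves on to the next one
            errors.append(f"{d.name}: {exc}")
    raise AllDeploymentsFailed("; ".join(errors))


# Usage with two hypothetical backends, one of which is down.
def healthy_backend(prompt: str) -> str:
    return f"echo: {prompt}"


def failing_backend(prompt: str) -> str:
    raise ConnectionError("replica unavailable")


deployments = [
    Deployment("vllm-pool", weight=0.8, call=failing_backend),
    Deployment("backup-provider", weight=0.2, call=healthy_backend),
]
print(route("hello", deployments, random.Random(0)))
# ('backup-provider', 'echo: hello')
```

The weighted first choice spreads normal traffic across backends, while the fallback loop trades extra latency on failures for higher availability. A production router would also need timeouts, retries, and health tracking, which this sketch omits.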
