Sapphire: A Deep Dive into Presto's Native Velox Integration
Shrinidhi Joshi, Meta

Sapphire (Presto-on-Spark) is Presto’s C++ engine, scheduled on the Spark runtime and executing on Velox, designed to deliver significant performance improvements for large-scale analytical workloads at Meta. In this talk, we present Sapphire’s architecture and how it integrates deeply with the Velox execution engine. We dive into three key areas where Sapphire extends and optimizes Velox’s capabilities: shuffle integration, where we detail our shuffle operators and the performance gained by coupling them tightly with Velox’s memory and serialization layers; broadcast join with hash table caching, where reusing pre-built hash tables across tasks eliminates redundant build work; and sorted shuffle with sort-merge join support, which extends Velox’s operator model to the merge-based join strategies that are critical at scale. We close with production results demonstrating Sapphire’s gains over Presto’s Java-based execution engine across key workloads at Meta.
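
To make the hash table caching idea concrete, here is a minimal conceptual sketch in Python. It is not Sapphire's or Velox's implementation: the class `BroadcastHashTableCache`, the function `probe`, the `broadcast_id` key, and the row layout are all hypothetical names chosen for illustration. It assumes that several tasks in one process join against the same broadcast input, so the table needs to be built only once.

```python
from collections import defaultdict
from threading import Lock


class BroadcastHashTableCache:
    """Hypothetical per-process cache of built hash tables, keyed by broadcast id."""

    def __init__(self):
        self._tables = {}
        self._lock = Lock()
        self.builds = 0  # how many times a table was actually built

    def get_or_build(self, broadcast_id, rows, key):
        # Hold the lock across the build so concurrent tasks build at most once.
        with self._lock:
            table = self._tables.get(broadcast_id)
            if table is None:
                table = defaultdict(list)
                for row in rows:
                    table[row[key]].append(row)
                self._tables[broadcast_id] = table
                self.builds += 1
            return table


def probe(table, probe_rows, key):
    """Inner-join the probe rows against an already built table."""
    out = []
    for row in probe_rows:
        for match in table.get(row[key], []):
            out.append({**match, **row})
    return out


# Two "tasks" share one broadcast input; only the first pays the build cost.
cache = BroadcastHashTableCache()
dim = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
task1 = probe(cache.get_or_build("b0", dim, "id"), [{"id": 1, "x": 10}], "id")
task2 = probe(cache.get_or_build("b0", dim, "id"), [{"id": 2, "x": 20}], "id")
```

In this sketch the second task skips the build phase entirely and goes straight to probing. A production cache would also need eviction and memory accounting, which are omitted here.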