Using Apache Arrow, Calcite and Parquet to build a Relational Cache | Dremio
Download slides for this talk: https://goo.gl/eMWk8i

Everybody wants to get to data faster. As we move from more general solutions to more specific optimization techniques, the performance impact grows. This talk will discuss how layering in-memory caching, columnar storage and relational caching can combine to provide a substantial improvement in the performance of data science and analytical workloads. It will include a detailed overview of how you can use Apache Arrow, Calcite and Parquet to achieve multiple orders of magnitude improvement in performance over what is currently possible.

We'll start by talking about in-memory caches and the difference between block-based and data-aware caching strategies. We'll discuss the deployment design of this type of solution and cover the strengths of each strategy. There will also be a discussion of the relationship between security and predicate application in these scenarios.

Then we'll go into detail about how columnar storage formats can further enhance performance by minimizing read time, optimizing for vectorized in-memory processing and applying powerful compression techniques.

Lastly, we'll introduce a much more advanced way to speed access to data called relational caching. Relational caching builds on columnar in-memory caching techniques but also includes a full comprehension of how data is being used and how different forms of data relate to each other. This will include leveraging multiple sorting and partitioning strategies as well as maintaining multiple related derivations of data for different types of access patterns. As part of this, we'll also cover approaches to data TTL, relational cache consistency and several different approaches to data mutation and real-time updates.

ABOUT DATA COUNCIL: Data Council (https://www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers.


