Inside Apache Druid’s storage and query engine

Apache Druid is an open-source columnar database known for high performance at scale; its largest deployments comprise thousands of servers. But no matter the scale, high performance starts with good fundamentals. This talk will dive into those fundamentals by exploring the inner workings of a single data server. We’ll cover how Apache Druid stores data, what kinds of compression it uses, how it indexes data, how the storage engine is linked with the query processing engine, and how the system handles resource management and multithreading. Together, all these pieces enable Apache Druid to process billions of records per second on a single data server.

Imply is a real-time data platform for cost-effective, low-latency analytics. Uniquely, it provides consistent sub-second response to ad hoc queries against PB-scale data, even with high user concurrency. Imply is used for clickstream analytics, application, network and service performance monitoring, IoT analytics, fraud detection and more. Imply powers user-facing analytics applications and serves as a backend for highly-concurrent APIs. Companies such as Twitter, Charter (Spectrum), Twitch and DBS (Southeast Asia’s largest bank) trust Imply to put analytics into the hands of their trained analysts and non-technical business people.

Connect
Website: https://imply.io/
LinkedIn: / impl...
Twitter: / implydata
GitHub: https://github.com/implydata
Slideshare: https://www.slideshare.net/implydata
