Apache Arrow: A New Gold Standard for Data Transport - Subsurface Summer 2020 Tutorial

Apache Arrow: A New Gold Standard for Dataset Transport is a talk presented by Wes McKinney, Director at Ursa Labs, at Subsurface Summer 2020. Apache Arrow defines a standardized, language-independent in-memory columnar format. Systems that share this format can exchange data without costly serialization and deserialization, which makes transport faster and more efficient than traditional methods. The talk covers the technical reasons the Arrow protocol is an attractive choice and gives specific examples of where it has been adopted. It also explains why Arrow suits modern data systems such as data lakehouses, data warehouses, data lakes, and data lake engines.

Traditional data transport approaches suffer from incompatibilities between systems, limited scalability, and a high cost of ownership. Arrow addresses these problems with two building blocks: its columnar format, and Arrow Flight, an RPC framework for transmitting Arrow datasets between systems securely. Together they make forms of data sharing practical that previously were not.

In this session, Wes McKinney walks through the specifics of Arrow's approach to dataset transport. He explains how storing datasets in a columnar format, which is easy to analyze and query, makes more efficient use of computing resources. He also demonstrates how the Flight protocol lets multiple nodes share datasets across distributed systems without sacrificing performance or security. Attendees will come away understanding how Arrow can improve performance while reducing cost and complexity when moving large datasets between systems.
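To make the columnar layout mentioned above more concrete, here is a minimal, stdlib-only sketch of how a nullable string column can be laid out in the style of the Arrow columnar format. It uses a validity bitmap (one bit per slot, least-significant bit first), an offsets array of length n + 1, and one contiguous data buffer. This is a toy illustration, not the pyarrow API. The function names `encode_string_column` and `decode_slot` are hypothetical, and real Arrow stores offsets as a packed 32-bit (or 64-bit) integer buffer with padding and alignment rules that this sketch omits.

```python
from typing import List, Optional, Tuple


def encode_string_column(
    values: List[Optional[str]],
) -> Tuple[bytes, List[int], bytes]:
    """Lay out a nullable string column as Arrow-style buffers (toy sketch).

    Returns (validity_bitmap, offsets, data):
      - validity_bitmap: one bit per slot, LSB first; 1 means "not null"
      - offsets: n + 1 positions into `data`; slot i is data[offsets[i]:offsets[i+1]]
      - data: UTF-8 bytes of all non-null values, concatenated
    """
    bitmap = bytearray((len(values) + 7) // 8)
    offsets = [0]
    data = bytearray()
    for i, value in enumerate(values):
        if value is not None:
            # Mark slot i as valid and append its bytes to the data buffer.
            bitmap[i // 8] |= 1 << (i % 8)
            data.extend(value.encode("utf-8"))
        # A null slot gets a zero-length span (the offset does not advance).
        offsets.append(len(data))
    return bytes(bitmap), offsets, bytes(data)


def decode_slot(
    bitmap: bytes, offsets: List[int], data: bytes, i: int
) -> Optional[str]:
    """Read slot i back out of the buffers without touching any other slot."""
    if not (bitmap[i // 8] >> (i % 8)) & 1:
        return None
    return data[offsets[i]:offsets[i + 1]].decode("utf-8")


bitmap, offsets, data = encode_string_column(["arrow", None, "flight"])
print(bitmap, offsets, data)
```

Because each column is just a few contiguous buffers, a receiver that understands the same layout can read values directly from the bytes it was sent. It does not need to parse a row-oriented encoding first, which is the property the talk credits for Arrow's transport efficiency.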