Extending Pandas using Apache Arrow and Numba - Uwe L Korn

PyData Berlin 2018

The latest release of Pandas introduced the ability to extend it with custom dtypes. Using Apache Arrow as the in-memory storage and Numba for fast, vectorized computations on these memory regions, it is possible to extend Pandas in pure Python while achieving the same performance as the built-in types. In the talk we implement a native string type as an example.

Slides: https://pydata.org/berlin2018/proposa...

---

www.pydata.org

PyData is an educational program of NumFOCUS, a 501(c)(3) non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.

PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.

0:00 - Introduction
0:41 - Speaker's background
1:53 - Introduction to Pandas and NumPy
3:05 - Shortcomings of Pandas
8:55 - Extending Pandas with ExtensionArrays
12:16 - Apache Arrow for Data Storage
15:50 - Numba for computing
21:03 - Putting all together
28:10 - Closing
28:45 - Questions

Shout-out to https://github.com/keckelt for the video timestamps!

Want to help add timestamps to our YouTube videos to help with discoverability? Find out more here: https://github.com/numfocus/YouTubeVi...
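To make the storage idea concrete, here is a minimal sketch of an Arrow-style string column: one contiguous UTF-8 data buffer plus an offsets buffer with one more entry than there are strings. This is not code from the talk. The helper names (`to_arrow_layout`, `byte_lengths`, `startswith`) are hypothetical, the sketch uses only the standard library instead of pyarrow, and it leaves out Arrow's validity bitmap for missing values. The loops over offsets are the kind of kernel one would hand to Numba for compilation; here they run as plain Python.

```python
from array import array


def to_arrow_layout(strings):
    """Pack Python strings into an Arrow-style (offsets, data) pair.

    Hypothetical helper for illustration; nulls are not handled.
    """
    offsets = array("i", [0])  # signed int offsets, len(strings) + 1 entries
    data = bytearray()
    for s in strings:
        data.extend(s.encode("utf-8"))
        offsets.append(len(data))  # end of this string == start of the next
    return offsets, bytes(data)


def byte_lengths(offsets):
    """Length in bytes of each string, computed from the offsets alone."""
    return [offsets[i + 1] - offsets[i] for i in range(len(offsets) - 1)]


def startswith(offsets, data, prefix):
    """Element-wise prefix test that works directly on the raw buffers."""
    p = prefix.encode("utf-8")
    out = []
    for i in range(len(offsets) - 1):
        start, end = offsets[i], offsets[i + 1]
        # The slice compares bytes in place; no per-row str object is built.
        out.append(end - start >= len(p) and data[start:start + len(p)] == p)
    return out


offsets, data = to_arrow_layout(["pandas", "arrow", "päd"])
print(list(offsets))
print(byte_lengths(offsets))
print(startswith(offsets, data, "pa"))
```

Because every string lives in one flat buffer, a kernel only needs integer arithmetic on the offsets and byte comparisons on the data, which is what makes this layout a good fit for compiled loops.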
