Extending Pandas using Apache Arrow and Numba - Uwe L Korn
PyData Berlin 2018

The latest release of Pandas introduced the ability to extend it with custom dtypes. Using Apache Arrow as the in-memory storage and Numba for fast, vectorized computations on these memory regions, it is possible to extend Pandas in pure Python while achieving the same performance as the built-in types. In the talk we implement a native string type as an example.

Slides: https://pydata.org/berlin2018/proposa...

---

www.pydata.org

PyData is an educational program of NumFOCUS, a 501(c)(3) non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.

PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.

0:00 - Introduction
0:41 - Speaker's background
1:53 - Introduction to Pandas and NumPy
3:05 - Shortcomings of Pandas
8:55 - Extending Pandas with ExtensionArrays
12:16 - Apache Arrow for Data Storage
15:50 - Numba for computing
21:03 - Putting all together
28:10 - Closing
28:45 - Questions

Shout-out to https://github.com/keckelt for the video timestamps!

Want to help add timestamps to our YouTube videos to help with discoverability? Find out more here: https://github.com/numfocus/YouTubeVi...
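
As a rough illustration of the storage idea in the abstract above, the sketch below packs a list of strings into the two buffers that Arrow's variable-size string layout is built on: an offsets buffer and one contiguous UTF-8 data buffer. It is not the code from the talk. It uses only the Python standard library, omits Arrow's validity bitmap for nulls, and the function names (`build_string_column`, `byte_lengths`, `get_item`) are hypothetical names chosen for this example.

```python
from array import array


def build_string_column(values):
    """Pack strings into an int32 offsets buffer and one UTF-8 data buffer.

    Assumption: no null values, so the validity bitmap is left out.
    """
    offsets = array("i", [0])
    data = bytearray()
    for value in values:
        data.extend(value.encode("utf-8"))
        # Each offset marks where the next string starts in the data buffer.
        offsets.append(len(data))
    return offsets, bytes(data)


def byte_lengths(offsets):
    """Compute the byte length of every string from the offsets alone.

    This flat loop over a numeric buffer never touches a Python str object,
    which is the kind of loop a JIT compiler such as Numba targets.
    """
    return [offsets[i + 1] - offsets[i] for i in range(len(offsets) - 1)]


def get_item(offsets, data, i):
    """Decode the i-th string by slicing the shared data buffer."""
    return data[offsets[i]:offsets[i + 1]].decode("utf-8")


offsets, data = build_string_column(["pandas", "arrow", "", "numba"])
print(list(offsets))               # [0, 6, 11, 11, 16]
print(byte_lengths(offsets))       # [6, 5, 0, 5]
print(get_item(offsets, data, 1))  # arrow
```

Because all strings live in one contiguous buffer, operations such as computing lengths reduce to arithmetic on integers rather than calls on individual Python objects. A Pandas ExtensionArray wrapping buffers like these would expose them through the usual Series interface.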


