Rob Story: Python Data Bikeshed
PyData Seattle 2015

The PyData ecosystem is growing rapidly, with existing tools maturing and exciting new tools appearing on a regular basis. This talk will examine the crowded PyData ecosystem and bring some clarity to which Python data tool is the right one to reach for on any given analysis. It will focus on use-cases for pure Python, toolz, NumPy, Pandas, Blaze, xray, bcolz, Dask, and Spark.

The PyData ecosystem can be a bit confusing for those new to Python, or even for experienced programmers moving to Python for its excellent data analysis capabilities. How do you know which tool to reach for on any given project? What tools work best for my data of size FooBar in data store FizzBuzz? This talk will explore the Python data toolchain from bottom to top, with a focus on which tools work best based on both data locality and analysis velocity.

Think of your data pipeline and storage as a city, and your data tools as a shed full of bikes. What bike works best for which trip? When should you use pure Python (the fixie) to perform your analysis? How do Pandas (the geared commuter) and Blaze (the tandem) work together? Where does Spark (the fat tire bike) fit into all of this?

This talk seeks to use questionable bike analogies to provide a less-questionable look at the crowded PyData ecosystem and bring some clarity to which Python data tool is the right one to reach for on any given analysis. It will touch on pure Python, toolz, NumPy, Pandas, Blaze, xray, bcolz, Dask, and Spark, with a focus on the use-cases for each one. Finally, we'll talk about which library you should use to paint the bikeshed.

Materials available here: https://github.com/wrobstory/pydatase...
