Processing Large Geospatial Datasets with Dask & Xarray - Patrick Hoefler
Geospatial datasets are growing in size, often exceeding 100 TB and reaching petabyte scale. Many of these datasets are publicly available, providing a great resource for analysis, but working with them requires increasingly large computational resources and a diverse set of tools. We will start by briefly introducing Dask and Xarray, which form the backbone of the geospatial stack in Python. Using the ERA5 dataset as a case study, we will demonstrate how Xarray can be used to explore large-scale climate data effectively from your local laptop. Building on this foundation, we will delve into recent advancements in Dask Array. Originally designed as a parallel NumPy API, Dask Array has been used over the last few years to handle much larger datasets. We’ll explore the latest developments in Dask and Xarray that continue to expand the scalability and capabilities of these tools so they can keep up with the scale requirements of modern datasets. This discussion will highlight improvements in ease of use, scalability, and performance. Additionally, we’ll present the first-ever set of geospatial benchmarks, collected from the community earlier in 2024. These benchmarks clearly illustrate the scale at which Xarray and Dask are required to operate. Finally, we’ll offer a peek behind the scenes of an ongoing project aimed at building the first-ever query optimizer for large-scale array computations.
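To make the "parallel NumPy API" idea concrete, here is a minimal sketch of how Xarray wraps a chunked Dask array and defers computation until it is requested. It is not taken from the talk: the data is synthetic random noise standing in for a climate variable (not real ERA5 data), and the variable name `t2m_synthetic`, the grid resolution, and the chunk sizes are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr
import dask.array as da

# Synthetic stand-in for a climate variable (NOT real ERA5 data):
# one year of daily values on a coarse 2.5-degree global grid.
time = pd.date_range("2020-01-01", periods=366, freq="D")
lat = np.linspace(-90, 90, 73)
lon = np.arange(0, 360, 2.5)

# A lazily defined, chunked array: no values are generated yet.
# Chunking along time only (30 days per chunk) is an arbitrary choice.
data = da.random.random(
    (time.size, lat.size, lon.size),
    chunks=(30, lat.size, lon.size),
)

# Xarray adds labelled dimensions and coordinates on top of the Dask array.
temperature = xr.DataArray(
    data,
    dims=("time", "latitude", "longitude"),
    coords={"time": time, "latitude": lat, "longitude": lon},
    name="t2m_synthetic",  # hypothetical variable name
)

# These operations only build a task graph; nothing runs yet.
monthly_mean = temperature.groupby("time.month").mean("time")
# Simple unweighted spatial average (a real analysis would weight by area).
global_mean = monthly_mean.mean(("latitude", "longitude"))

# .compute() executes the graph chunk by chunk and returns NumPy-backed data.
result = global_mean.compute()
print(result.shape)
```

The same pattern should carry over to a real dataset opened lazily from storage, where the chunked array is backed by files instead of a random generator; only the final `.compute()` call pulls data into memory.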
