Scaling Pandas Using Dask: How to Avoid All My Mistakes | Krishan Bhasin | Dask Summit 2021
Dask is a Python package that provides advanced parallelism for analytics, enabling performance at scale for the tools you love. People think it's magic: drop it in and it scales. This will mostly work, but it will not scale well! We would like to share what we've learned about using Dask to scale dataframes and computations, so that you can avoid making the same mistakes. This is a talk about scaling Pandas using Dask by Krishan Bhasin at Dask Summit 2021.

What is the Dask Summit? The Dask Distributed Summit is where users, contributors, and newcomers can share experiences, learn from one another, and grow together. It provides content, information, and learning opportunities for attendees at all levels of Dask familiarity and expertise.

What is Dask? Dask is a free and open-source library for parallel computing in Python. It is a community project maintained by developers and organizations.

Share your feedback with us on this talk and let us know: Did you find this talk on scaling Pandas using Dask helpful? What is your experience with scaling Pandas?

Learn more at summit.dask.org and dask.org.

KEY MOMENTS
00:00:00 Scaling Pandas Using Dask
00:00:16 About Krishan Bhasin
00:00:59 Why This Talk?
00:01:38 Overview of Session
00:01:59 Dask Recap
00:03:37 A Closer Look at Dask Dataframe
00:04:33 A Closer Look at Distributed Scheduler
00:05:41 Submitting Work to a Cluster
00:08:05 Dask Learnings Part 0 - Just Don't
00:09:16 Dask Learnings Part 1 - You Can't Improve What You Can't See
00:10:34 Use The Dashboard
00:15:01 Dask Learnings Part 2 - Understand and Leverage Dask's Principles
00:23:39 Dask Learnings Part 3 - If It's Broke or Missing, Fix It!
00:29:10 Q & A