Spark vs Dask | Why We Switched from Spark to Dask | Sébastien Arnaud at Steppingblocks | June 2022
Learn more at https://bit.ly/3oTtMIN

Spark vs Dask for big data analytics: which should you pick?

Steppingblocks is a big data analytics company that provides workforce and education analytics on over 130 million individuals in the U.S. to universities, employers, and government agencies. Join us for a conversation with Sébastien Arnaud, Chief Technology & Data Officer at Steppingblocks, and discover how he handled an ever-growing amount of data with a small but mighty team. Wolfe Nelson, Senior Data Engineer at Steppingblocks, will join him to demo a few essential techniques they learned along the way on their Dask journey.

In this webinar, you will learn:
• Who is Steppingblocks (https://www.steppingblocks.com)? Learn about their unique big data offering to clients.
• Scaling in post-pandemic times: how they managed to scale quickly in a difficult hiring market.
• Spark vs Dask: what technical reasons led them to switch away from Spark/Databricks to Dask/Coiled.

Some more reading on this topic:
Blog - Spark vs. Dask vs. Ray https://www.coiled.io/blog/spark-vs-d...

Other videos:
• Databricks vs. Dask and Coiled

Key Moments
00:00:00 Intro
00:01:06 Who is Steppingblocks?
00:03:33 Steppingblocks Data Process
00:05:01 Our Data Journey So Far
00:06:53 Why We Moved from Spark to Dask
00:09:42 The Migration to Dask & Coiled
00:12:36 Pain Points
00:14:34 Some Migration Results & Metrics
00:16:47 Our Journey Since Migrating to Dask
00:22:45 Tips, Tricks & Challenges
00:29:33 Sample Notebook
00:33:12 Q & A

---
Scale Your Python Workloads with Dask and Coiled. Coiled is a Dask company. With Coiled's rock-solid infrastructure, you can quickly and securely create Dask clusters in your cloud account.
Learn more about Coiled and get started for free: https://coiled.io/start
More content on our blog: https://coiled.io/blog