Fine Tuning and Enhancing Performance of Apache Spark Jobs

Apache Spark's defaults provide decent performance for large data sets, but tuning parameters to match the available resources and the job itself can yield significant further gains. We'll dive into best practices drawn from solving real-world problems, and the steps we took as we added resources.

Topics include garbage collector selection, serialization, tuning the number of workers and executors, partitioning data, detecting skew, sizing partitions, scheduling pools and the fair scheduler, and Java heap parameters. We'll also cover reading the execution DAG in the Spark UI to identify bottlenecks and their solutions, optimizing joins and partitioning, and Spark SQL GROUP BY rollups: best practices, and avoiding them where possible.
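Several of the tuning areas above (executor count and size, Java heap, garbage collector, serialization, fair scheduling) end up as Spark configuration properties. The sketch below shows one way to derive them for a cluster of identical worker nodes. The property names are standard Spark settings, but the sizing rules (5 cores per executor, 1 core and 1 GB reserved per node) are common rules of thumb assumed here for illustration, not values from the talk, and the helper names are hypothetical.

```python
# Rule-of-thumb inputs. These values are assumptions for illustration,
# not numbers taken from the talk.
CORES_PER_EXECUTOR = 5        # common heuristic for per-executor I/O throughput
RESERVED_CORES_PER_NODE = 1   # left for the OS and cluster daemons
RESERVED_MEM_GB_PER_NODE = 1  # likewise for memory
OVERHEAD_FRACTION = 0.10      # Spark's default executor memory overhead factor
MIN_OVERHEAD_MB = 384         # Spark's minimum executor memory overhead


def size_executors(nodes: int, cores_per_node: int, mem_gb_per_node: int) -> dict:
    """Return Spark settings for a cluster of identical worker nodes (hypothetical helper)."""
    usable_cores = cores_per_node - RESERVED_CORES_PER_NODE
    usable_mem_mb = (mem_gb_per_node - RESERVED_MEM_GB_PER_NODE) * 1024
    if nodes < 1 or usable_cores < 1:
        raise ValueError("cluster too small for this sizing heuristic")

    executors_per_node = max(1, usable_cores // CORES_PER_EXECUTOR)
    cores = min(CORES_PER_EXECUTOR, usable_cores)
    container_mb = usable_mem_mb // executors_per_node
    if container_mb <= MIN_OVERHEAD_MB:
        raise ValueError("not enough memory per executor")

    # Split each container into JVM heap plus off-heap overhead so that
    # heap + max(MIN_OVERHEAD_MB, OVERHEAD_FRACTION * heap) fits in the container.
    heap_mb = int(container_mb / (1 + OVERHEAD_FRACTION))
    if heap_mb * OVERHEAD_FRACTION < MIN_OVERHEAD_MB:
        heap_mb = container_mb - MIN_OVERHEAD_MB
    overhead_mb = container_mb - heap_mb

    return {
        "spark.executor.instances": str(nodes * executors_per_node),
        "spark.executor.cores": str(cores),
        # Heap size is set here; Spark rejects -Xmx in extraJavaOptions.
        "spark.executor.memory": f"{heap_mb}m",
        "spark.executor.memoryOverhead": f"{overhead_mb}m",
        # Kryo is usually faster and more compact than Java serialization.
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
        # Garbage collector selection for the executor JVMs.
        "spark.executor.extraJavaOptions": "-XX:+UseG1GC",
        # Let concurrent jobs share executors through scheduler pools.
        "spark.scheduler.mode": "FAIR",
    }


def to_submit_args(conf: dict) -> list:
    """Turn a settings dict into spark-submit --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Example: 10 worker nodes with 16 cores and 64 GB each.
    for arg in to_submit_args(size_executors(10, 16, 64)):
        print(arg)
```

For 10 nodes with 16 cores and 64 GB each, this yields 30 executors with 5 cores, a 19549 MB heap, and 1955 MB of overhead. Treat the output as a starting point to validate against the Spark UI and GC logs, not as a final answer.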

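Partition sizing and skew detection mostly come down to arithmetic on data sizes, which you can read from the Spark UI stage page (its summary metrics show min, median, and max shuffle read per task). The sketch below picks a shuffle partition count and flags outlier partitions. The 128 MiB target, the rounding to whole "waves" of tasks, and the 5x-median skew threshold are illustrative assumptions, not figures from the talk; the function names are hypothetical. The resulting count would typically be applied through `spark.sql.shuffle.partitions` (default 200).

```python
import math
import statistics

# Assumed target size per partition (about 128 MiB); an illustrative choice.
TARGET_PARTITION_BYTES = 128 * 1024 * 1024


def shuffle_partitions(input_bytes: int, total_cores: int,
                       target_bytes: int = TARGET_PARTITION_BYTES) -> int:
    """Choose a partition count near the target size, rounded up to whole waves."""
    by_size = max(1, math.ceil(input_bytes / target_bytes))
    # Round up to a multiple of the total core count so the last wave of
    # tasks is less likely to leave most cores idle.
    waves = math.ceil(by_size / total_cores)
    return waves * total_cores


def skewed_partitions(sizes: list, ratio: float = 5.0) -> list:
    """Return indices of partitions larger than `ratio` times the median size.

    ratio=5.0 is an arbitrary illustrative threshold.
    """
    median = statistics.median(sizes)
    if median == 0:
        return [i for i, s in enumerate(sizes) if s > 0]
    return [i for i, s in enumerate(sizes) if s > ratio * median]


if __name__ == "__main__":
    # 100 GiB of shuffle input on a cluster with 150 executor cores.
    print(shuffle_partitions(100 * 1024**3, 150))  # 900
    # Per-partition sizes in MB, e.g. read off the Spark UI stage page.
    print(skewed_partitions([100, 110, 95, 105, 2000]))  # [4]
```

A single partition far above the median usually points to a hot join or grouping key, which is where techniques such as repartitioning or key salting come into play.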