Advancing Spark - Understanding Low Shuffle Merge

Back in Databricks Runtime 9.0 we saw the introduction of a preview "Low Shuffle Merge" feature, but it seemed to go fairly unnoticed. In DBR 10.4 it's now enabled by default and a fully GA part of the platform... but what does it actually do? In this video, Simon walks through the theory behind low shuffle merge and what you should expect to see happen, both to your runtime executions and to the data layout, before and after the change. Make no mistake, it's a real speed boost for many common patterns, so use it if you can! For more info on Low Shuffle Merge, see the docs over at: https://docs.microsoft.com/en-us/azur... And as always, get in touch with Advancing Analytics if you need help on your Lakehouse journey.
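To make the layout point concrete, here is a toy model in plain Python. It is not Databricks' implementation, and every name in it (`shuffle`, `classic_merge`, `low_shuffle_merge`) is hypothetical. It assumes a single data file whose rows are clustered (sorted) by key and models a "shuffle" as simple hash partitioning. In the classic merge, every row of a touched file goes through the shuffle, so the original clustering is lost. In the low shuffle version, only the modified rows are shuffled and the unmodified rows are written out in their existing order.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # hypothetical row shape: (key, value)


def shuffle(rows: List[Row], num_partitions: int = 3) -> List[Row]:
    """Toy shuffle: hash-partition rows by key, then concatenate the partitions.

    Each row lands in partition key % num_partitions, so the output order
    generally differs from the input's clustered (sorted) order.
    """
    partitions: List[List[Row]] = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[row[0] % num_partitions].append(row)
    return [row for part in partitions for row in part]


def classic_merge(file_rows: List[Row], updates: Dict[int, str]) -> List[Row]:
    # Apply the updates, then push EVERY row of the touched file through the shuffle.
    merged = [(k, updates.get(k, v)) for k, v in file_rows]
    return shuffle(merged)


def low_shuffle_merge(file_rows: List[Row], updates: Dict[int, str]) -> List[Row]:
    # Only modified rows go through the shuffle; unmodified rows are
    # written as-is, so their original (clustered) order survives.
    modified = [(k, updates[k]) for k, _ in file_rows if k in updates]
    unmodified = [(k, v) for k, v in file_rows if k not in updates]
    return unmodified + shuffle(modified)


def unmodified_key_order(result: List[Row], updates: Dict[int, str]) -> List[int]:
    # Keys of the rows the MERGE did not touch, in the order they were written.
    return [k for k, _ in result if k not in updates]


file_rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e"), (6, "f")]
updates = {2: "B", 5: "E"}

classic = classic_merge(file_rows, updates)
low = low_shuffle_merge(file_rows, updates)
print(unmodified_key_order(classic, updates))  # [3, 6, 1, 4]: clustering lost
print(unmodified_key_order(low, updates))      # [1, 3, 4, 6]: clustering kept
```

Both versions produce the same set of rows; the difference is only in how the untouched rows are laid out afterwards, which is why the data layout (for example, Z-ordering) is worth checking before and after the change.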
