A Deep Dive into the Catalyst Optimizer (Herman van Hovell)

Catalyst is becoming one of the most important components in Apache Spark, as it underpins all the major new APIs in Spark 2.0, from DataFrames and Datasets to streaming. At its core, Catalyst is a general library for manipulating trees. Based on this library, we have built a modular compiler frontend for Spark, including a query analyzer, an optimizer, and an execution planner. In this talk, I will introduce the core concepts of Catalyst by working through a few examples. I will also show how new and upcoming features are implemented using Catalyst. The audience will walk away with a deeper understanding of how Spark analyzes, optimizes, and plans a user’s query.
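To make the "library for manipulating trees" idea concrete, here is a minimal toy sketch in Python. It is not Spark's API: the `Expr`, `Literal`, `Attribute`, and `Add` classes and the `transform_up` and `fold_constants` helpers are hypothetical names invented for illustration. It only shows the general pattern of applying a rewrite rule bottom-up over an immutable expression tree, here with a simple constant-folding rule.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class Expr:
    """Base class for nodes of a toy expression tree (hypothetical, not Spark's API)."""


@dataclass(frozen=True)
class Literal(Expr):
    value: int


@dataclass(frozen=True)
class Attribute(Expr):
    name: str


@dataclass(frozen=True)
class Add(Expr):
    left: Expr
    right: Expr


# A rule returns a replacement node, or None when it does not match.
Rule = Callable[[Expr], Optional[Expr]]


def transform_up(node: Expr, rule: Rule) -> Expr:
    """Rewrite children first, then offer the rebuilt node to the rule."""
    if isinstance(node, Add):
        node = Add(transform_up(node.left, rule), transform_up(node.right, rule))
    rewritten = rule(node)
    return node if rewritten is None else rewritten


def fold_constants(node: Expr) -> Optional[Expr]:
    """Replace Add(Literal, Literal) with a single Literal; otherwise no match."""
    if (
        isinstance(node, Add)
        and isinstance(node.left, Literal)
        and isinstance(node.right, Literal)
    ):
        return Literal(node.left.value + node.right.value)
    return None


# The expression x + (1 + 2), built as a tree.
tree = Add(Attribute("x"), Add(Literal(1), Literal(2)))
optimized = transform_up(tree, fold_constants)
print(optimized)
```

In this sketch the inner `1 + 2` is folded into a single literal while the reference to `x` is left untouched. Keeping nodes immutable and rules as plain functions is one simple way to make rewrites easy to compose and test.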

A Deep Dive into the Catalyst Optimizer Hands on Lab (Herman van Hovell)

A Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai

How to Read Spark DAGs | Rock the JVM

Deep Dive: Apache Spark Memory Management

Deep Dive Into Catalyst: Apache Spark 2.0's Optimizer

Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Michael Armbrust

A Deep Dive into Query Execution Engine of Spark SQL - Maryann Xue

Broadcast joins in Apache Spark | Rock the JVM

Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal - Josh Rosen (Databricks)

Deep Dive into Monitoring Spark Applications Using Web UI and SparkListeners (Jacek Laskowski)

Turing Award Winner: Disagreeing with Google, Postgres, Future Problems | Mike Stonebraker

A Deeper Understanding of Spark Internals - Aaron Davidson (Databricks)

The Apache Spark™ Cost-Based Optimizer

SparkSQL: A Compiler from Queries to RDDs: Spark Summit East talk by Sameer Agarwal

Understanding Query Plans and Spark UIs - Xiao Li Databricks

Introduction to AmpLab Spark Internals

From Query Plan to Performance: Supercharging your Apache Spark Queries using the Spark UI SQL Tab