Airflow in Practice: Stop Worrying, Start Loving DAGs - Sarah Schattschneider

This talk was presented at PyBay2019, the 4th annual Bay Area Regional Python conference. See pybay.com for more details about PyBay.

Description
Heard of Apache Airflow? Do you work with Airflow or want to work with Airflow? Ever wonder how to better test Airflow? Have you considered all data workflow use cases for Airflow? Come be reminded of the key concepts, and then we will dive into Airflow's value add, common use cases, and best practices. Some use cases: Extract Transform Load (ETL) jobs, database snapshots, and ML feature extraction.

Abstract

Background
* Explain cron and how it compares to Airflow
* High-level explanation of Airflow's key concepts:
  * Directed Acyclic Graph (DAG): nodes are tasks and edges are the dependency structure
  * Third-party integrations (Slack, Google Cloud Platform, AWS, etc.)
  * Airflow Hooks & Operators
* What is Airflow?
  * Programmatically author workflows
  * Stateful scheduling
  * Rich CLI and UI that make development easy
  * Logging, monitoring, and alerting
  * Modularity lends itself well to testability
  * Solves common problems with batch processing
  * Open-sourced by Airbnb in 2015

Evaluating Airflow
* What value does Airflow add?
  * Retries tasks elegantly, which handles transient network errors
  * Alerts on failure (email or Slack)
  * Can re-run specific tasks in a large DAG
  * Supports distributed execution
  * Great OSS community and momentum
  * Can be hosted on AWS, Azure, or GCP
  * Managed options and alternatives: AWS Glue, GCP Cloud Composer, or Azure Data Factory

Does Airflow Have an Ugly Side? How to Overcome Challenges?
* Upgrades can be more challenging when you have custom hooks and operators
* Env vars vs. Variables vs. XComs

Common Use Cases
* Extract Transform Load (ETL) jobs
  * Airflow makes moving and transforming data easy
  * Can create custom Hooks for third-party APIs
* Efficiently snapshot databases
* Create test environments for QA
* ML feature extraction

Best Practices
* Testing
  * Unit tests for lib functions
  * Acceptance tests that run list_dags
* Doc MD for the DAG
  * Include points of contact
  * What remediation/escalation steps should the on-call person take when this DAG fails?

Exciting New/New(ish) Features
* Lineage
* Role-Based Access Control (RBAC)
* Airflow 2.0 improvements

Original slides: https://t.ly/xYJk9

About the speaker
Software Engineer at Blue Apron on the Data Engineering team. Works daily with Python on the data pipeline. Excited by how Python is transforming Data Engineering.

Sponsor Acknowledgement
This and other PyBay2019 videos are made possible with the help of our media partner AlphaVoice (https://www.alphavoice.io/)!

#pybay #pybay2019 #python #python3 #gdb
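The DAG concept from the abstract (nodes are tasks, edges are the dependency structure) can be illustrated without Airflow at all. The sketch below is an assumption-laden illustration, not how Airflow's scheduler is implemented: it uses Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+) to group a hypothetical extract/transform/load graph into batches whose upstream tasks are already done, and it rejects any graph that contains a cycle. The task names are made up for the example.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical ETL graph: each key is a task and each value is the set of
# upstream tasks it depends on (the edges of the DAG).
ETL_DEPENDENCIES: dict[str, set[str]] = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform": {"extract_orders", "extract_customers"},
    "load": {"transform"},
}


def parallel_batches(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into batches that could run in parallel.

    A task is placed in a batch only once all of its upstream tasks sit in
    earlier batches. Raises ValueError if the graph has a cycle (not a DAG).
    """
    sorter = TopologicalSorter(dependencies)
    try:
        sorter.prepare()  # validates the graph and raises CycleError on a cycle
    except CycleError as exc:
        raise ValueError(f"not a DAG, cycle found: {exc.args[1]}") from exc
    batches: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished, unlocking downstream tasks
    return batches


if __name__ == "__main__":
    for number, batch in enumerate(parallel_batches(ETL_DEPENDENCIES), start=1):
        print(number, batch)
```

Running it prints three batches: the two extract tasks together, then `transform`, then `load`. The acyclic requirement is what makes such an ordering possible at all; a graph where `a` waits on `b` and `b` waits on `a` has no valid start.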
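On retries: Airflow operators accept settings such as `retries`, `retry_delay`, and `retry_exponential_backoff`, so a task that hits a transient network error is re-run instead of failing the DAG outright. The plain-Python sketch below shows the underlying idea (retry with exponential backoff) without depending on Airflow. The `with_retries` helper, its parameters, and the choice of `ConnectionError`/`TimeoutError` as "transient" are hypothetical and are not Airflow's implementation.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    task: Callable[[], T],
    retries: int = 3,
    base_delay: float = 1.0,
    transient: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `task`, retrying up to `retries` extra times on transient errors.

    The wait doubles after each failed attempt (exponential backoff).
    Exceptions not listed in `transient` propagate immediately.
    """
    if retries < 0:
        raise ValueError("retries must be >= 0")
    for attempt in range(retries + 1):
        try:
            return task()
        except transient:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky_fetch() -> str:
        # Hypothetical API call that fails twice with a transient error.
        calls["count"] += 1
        if calls["count"] < 3:
            raise ConnectionError("temporary network blip")
        return "payload"

    delays: list[float] = []
    print(with_retries(flaky_fetch, sleep=delays.append))  # payload
    print(delays)  # [1.0, 2.0]
```

Injecting `sleep` keeps the helper testable: the usage example records the waits instead of actually pausing, in the same spirit as the "modularity lends itself well to testability" point above.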
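The "unit tests for lib functions" practice works best when transformation logic lives in plain Python functions that a task merely calls, so the logic can be tested without a scheduler or metadata database. A minimal sketch under that assumption follows; `normalize_orders` and its row format are hypothetical, and the tests use the standard-library `unittest` module (run them with `python -m unittest <module>` or pytest).

```python
import unittest


def normalize_orders(rows: list[dict]) -> list[dict]:
    """Hypothetical transform step: clean raw order rows before loading.

    Drops rows without an order id, trims and lowercases emails, and
    converts the amount from a dollar string to integer cents.
    """
    cleaned = []
    for row in rows:
        if not row.get("order_id"):
            continue  # skip incomplete records
        cleaned.append(
            {
                "order_id": row["order_id"],
                "email": row["email"].strip().lower(),
                # round() absorbs float error, e.g. 19.99 * 100 -> 1999
                "amount_cents": round(float(row["amount"]) * 100),
            }
        )
    return cleaned


class NormalizeOrdersTest(unittest.TestCase):
    def test_drops_rows_without_order_id(self):
        rows = [{"order_id": "", "email": "a@b.com", "amount": "1.00"}]
        self.assertEqual(normalize_orders(rows), [])

    def test_cleans_email_and_converts_amount(self):
        rows = [{"order_id": "42", "email": "  Chef@Example.COM ", "amount": "19.99"}]
        self.assertEqual(
            normalize_orders(rows),
            [{"order_id": "42", "email": "chef@example.com", "amount_cents": 1999}],
        )
```

Because the function has no Airflow imports, these tests run in milliseconds; the DAG-level acceptance test (running list_dags) then only needs to confirm that the DAG files load.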