Simplifying Data Management with DAG Factory - Airflow Summit 2025
Presented by Katarzyna Kałek and Jakub Orłowski at Airflow Summit 2025.

At OLX, we connect millions of people daily through our online marketplace, and we rely on robust data pipelines to do it. In this talk, we explore how the DAG Factory concept strengthens data governance, lineage, and discovery by centralizing operator logic and restricting direct DAG creation. This approach enforces code quality, optimizes resource usage, maintains infrastructure hygiene, and enables smooth version upgrades.

We then use consistent naming conventions in Airflow to build targeted namespaces, aligning teams with global policies while preserving their autonomy. Integrating external tools such as AWS Lake Formation and OpenMetadata further unifies governance and makes data straightforward to manage and secure. This becomes critical once you are running hundreds or even thousands of active DAGs.

If the idea of storing 1,600 pipelines in one folder seems overwhelming, join us to learn how the DAG Factory concept simplifies pipeline management. We'll also share insights from OLX on how thoughtful design fosters oversight, efficiency, and discoverability across diverse use cases.
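
The abstract combines two ideas: pipelines are created only through a central factory that validates their configuration, and DAG ids follow a naming convention from which team namespaces can be derived. The talk's own code is not shown here. What follows is a minimal plain-Python sketch of those two ideas under stated assumptions. The `<team>__<domain>__<pipeline>` convention and the names `DagSpec`, `parse_dag_id`, `build_specs`, and `namespaces` are all hypothetical. A real factory would go on to instantiate Airflow `DAG` objects. This sketch stops at validated spec objects so that it runs without Airflow installed.

```python
import re
from dataclasses import dataclass

# Hypothetical naming convention (an assumption, not OLX's actual scheme):
#   <team>__<domain>__<pipeline>, each part lowercase snake_case.
_PART = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*")


@dataclass(frozen=True)
class DagSpec:
    """Minimal description of a pipeline the (hypothetical) factory may build."""
    dag_id: str
    team: str
    domain: str
    schedule: str
    tasks: tuple[str, ...]


def parse_dag_id(dag_id: str) -> tuple[str, str, str]:
    """Split a DAG id into (team, domain, pipeline); reject ids that break the convention."""
    parts = dag_id.split("__")
    if len(parts) != 3 or not all(_PART.fullmatch(p) for p in parts):
        raise ValueError(f"DAG id {dag_id!r} does not follow <team>__<domain>__<pipeline>")
    return parts[0], parts[1], parts[2]


def build_specs(configs: list[dict]) -> list[DagSpec]:
    """The 'factory': the only path from config entries to pipeline definitions."""
    specs: list[DagSpec] = []
    seen: set[str] = set()
    for cfg in configs:
        dag_id = cfg["dag_id"]
        if dag_id in seen:
            raise ValueError(f"duplicate DAG id {dag_id!r}")
        seen.add(dag_id)
        team, domain, _ = parse_dag_id(dag_id)
        specs.append(
            DagSpec(dag_id, team, domain, cfg.get("schedule", "@daily"), tuple(cfg["tasks"]))
        )
    return specs


def namespaces(specs: list[DagSpec]) -> dict[str, list[str]]:
    """Group DAG ids by team prefix, e.g. as a basis for per-team policies."""
    out: dict[str, list[str]] = {}
    for spec in specs:
        out.setdefault(spec.team, []).append(spec.dag_id)
    return out


if __name__ == "__main__":
    configs = [
        {"dag_id": "payments__orders__daily_load", "tasks": ["extract", "load"]},
        {"dag_id": "payments__refunds__hourly_sync", "schedule": "@hourly", "tasks": ["sync"]},
        {"dag_id": "search__listings__reindex", "tasks": ["reindex"]},
    ]
    print(namespaces(build_specs(configs)))
```

Because every pipeline passes through `build_specs`, a malformed id such as `OrdersDaily` fails at definition time rather than silently landing in the shared folder. A team prefix then maps naturally onto a namespace for permissions or catalog ownership.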
