Peter Owlett - Lessons from 6 months of using Luigi in production

PyData London 2016

At Deliveroo we've built our data plumbing from the ground up, using Luigi to manage our data workflows. In this talk I'll walk through our experiences with Luigi as we scaled from a few simple jobs to a complex, production-grade system. This talk is mostly about building robust data pipelines, but it is also a little bit about why it's better to be woken up by your cat than by the server alarm.

In the beginning, there was Cron. We had one job, it ran at 1 AM, and it was good. Then we added another job, and to make them run one after the other we used Luigi, which says "This can only run when this is finished". Then we added another ~500 jobs, long-running scikit-learn computations, external API dependencies, a business reporting system with 2000+ reports and 400+ users, and a scheduling system with 5000+ users. This is when things got interesting.

This is the story of building the data systems at Deliveroo. It is not a talk about Big Data, cutting-edge algorithms or new open source technology. Rather, it is a talk about coping with complexity in a rapidly changing landscape.

I'll start from the beginning, giving a brief overview of what Luigi is and why we decided to roll with it. The body of the talk will be about the challenges we faced as our company grew in size and complexity, the solutions that worked (and those that didn't), and what we know now that we didn't know then. I'll cover a bit of the Luigi syntax itself, but mostly I'll focus on the things we did around Luigi that made it work for us: how (not) to design pipelines, how to test them, how to manage issues gracefully and how to detect problems in advance.

By attending this session you'll learn:
- Why DAG-based ETL systems are fundamentally useful
- What to think about when designing your DAG
- What to implement early to save you pain later on

Slides available here: https://speakerdeck.com/peteowlett/le...
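
The abstract's point that a job "can only run when this is finished" is the core of a DAG-based scheduler. The snippet below is a minimal sketch of that idea using only Python's standard-library `graphlib` (Python 3.9+), not Luigi's own API and not the code used at Deliveroo. The job names and the `run_in_order` helper are hypothetical, chosen purely for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical job names: each job maps to the set of jobs it depends on.
dependencies = {
    "extract_orders": set(),
    "build_report": {"extract_orders"},
    "email_report": {"build_report"},
}


def run_in_order(deps):
    """Visit every job only after all of its upstream jobs have finished."""
    finished = []
    for job in TopologicalSorter(deps).static_order():
        # A real job would do its work here; this sketch only records the order
        # and checks that every upstream job has already finished.
        assert all(upstream in finished for upstream in deps[job])
        finished.append(job)
    return finished


order = run_in_order(dependencies)
print(order)
```

Declaring dependencies rather than start times is what separates this from a list of Cron entries: the order is derived from the graph, so adding a job means stating what it needs, not guessing when its inputs will be ready.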
