DuckDB & Python | End-To-End Data Engineering Project (1/3)
In this video @mehdio goes over a fun end-to-end data engineering project: get usage insights from a Python library using Python, SQL and DuckDB! This is the first part of the series. Check the links below to learn about transformation and dashboarding using DuckDB!

Part 2 of the end-to-end data engineering project: • DuckDB & dbt | End-To-End Data Engineering...
Part 3: • DuckDB & dataviz | End-To-End Data Enginee...

Start using DuckDB in the Cloud for FREE with MotherDuck: https://hubs.la/Q02QnFR40

Resources
GitHub repo of the tutorial: https://github.com/mehd-io/pypi-duck-...
BigQuery performance issue with certain libraries: https://github.com/googleapis/python-...
DuckDB for beginners video: • DuckDB Tutorial For Beginners In 12 min

Follow Us
LinkedIn: / motherduck
Twitter: / motherduck
Blog: https://motherduck.com/blog/

0:00 Intro
1:06 Architecture
3:13 Ingestion Pipeline Python & DuckDB
41:08 Wrapping up & what's next

#duckdb #dataengineering #sql #python

Learn how to build a complete, end-to-end data engineering project using Python, SQL, and DuckDB. This video guides you through creating a robust Python data pipeline to ingest and analyze PyPI download statistics, providing valuable insights into any Python library's adoption. We'll cover the full architecture, from sourcing raw data in Google BigQuery to preparing it for transformation and visualization, making this a perfect tutorial for anyone looking to apply data engineering best practices in a real-world scenario.

We kick off the data ingestion phase by demonstrating how to efficiently query massive public datasets in BigQuery without incurring high costs, focusing on partition filtering for optimization.
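As a rough illustration of the partition filtering mentioned above, the sketch below builds a query that bounds the partition column on both sides, so BigQuery only scans the requested days. It is not the code from the video: the table name `bigquery-public-data.pypi.file_downloads`, the `timestamp` partition column, the `project` column, and the helper `build_pypi_query` are all assumptions made for this example.

```python
import re
from datetime import date

# Assumption: the public PyPI downloads table, partitioned on `timestamp`.
PYPI_TABLE = "bigquery-public-data.pypi.file_downloads"


def build_pypi_query(project: str, start: date, end: date) -> str:
    """Build a query limited to the half-open range [start, end).

    Hypothetical helper: bounding the partition column on both sides
    lets the engine skip every partition outside the range.
    """
    # Reject anything that does not look like a package name, since the
    # value is interpolated into the SQL text.
    if not re.fullmatch(r"[A-Za-z0-9._-]+", project):
        raise ValueError(f"unexpected project name: {project!r}")
    if start >= end:
        raise ValueError("start must be earlier than end")
    return (
        "SELECT *\n"
        f"FROM `{PYPI_TABLE}`\n"
        f"WHERE project = '{project}'\n"
        f"  AND timestamp >= TIMESTAMP('{start.isoformat()}')\n"
        f"  AND timestamp < TIMESTAMP('{end.isoformat()}')"
    )


query = build_pypi_query("duckdb", date(2023, 4, 1), date(2023, 4, 3))
print(query)
```

Keeping the date range small while developing is what keeps the scanned bytes, and therefore the cost, low; the range can be widened once the pipeline works.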
You'll learn how to set up a professional development environment using Docker and VS Code dev containers, and we'll install all the necessary libraries, including the Google Cloud SDK, Pandas for data manipulation, and of course, the DuckDB Python package. This setup ensures your data pipeline is reproducible and isolated.

Discover Python data pipeline best practices as we structure our code for maintainability and robustness. We use Pydantic to define clear data models for our job parameters and, critically, for schema validation against the source data from BigQuery. This prevents data quality issues from breaking your pipeline downstream. We also leverage the Fire library to automatically generate a powerful and flexible command-line interface (CLI) from our Pydantic models, making the pipeline easy to parameterize and run.

See how DuckDB acts as the powerful core of our ingestion logic. After fetching data into a Pandas DataFrame, we seamlessly load it into an in-memory DuckDB instance. This simplifies complex tasks like creating reliable test fixtures for schema validation and exporting the validated data to multiple destinations. Learn the simple SQL commands to write data locally, push to a data lake on AWS S3 with efficient Hive partitioning, or load it directly into MotherDuck for a serverless cloud data warehouse experience.

By the end of this tutorial, you'll have built a fully functional raw data ingestion pipeline, ready for the next step. This video sets the foundation for our series, where we'll next use dbt and DuckDB to build the transformation layer. You'll gain practical skills in data engineering, schema management, and building efficient pipelines with modern developer tools.

Watch with full transcript & resources: https://motherduck.com/videos/duckdb-...


