Real-World Dataset Cleaning with Python Pandas! (Olympic Athletes Dataset)

I'm prepping a dataset for an upcoming tutorial and I figured walking through the process of cleaning it would work well for a livestream! We use various Python Pandas functions to accomplish our data cleaning goals.

We'll be working off of this repo: https://github.com/KeithGalli/Olympic...

Some topics that we cover:
- How you can use web scraping to collect data like this (Python beautifulsoup)
- Splitting strings into separate columns
- Using regular expressions (regexes) to extract specific details from columns
- Converting columns to datetime & numeric types
- Grabbing only a subset of our columns

Sorry that this was a bit last minute scheduling-wise, will try to give more advance notice in the future!

Video timeline!
0:00 - Livestream Overview
4:00 - About the Olympics dataset (source website and how it was scraped)
9:50 - Cleaning the dataset (getting started with code & data)
19:26 - What aspects of our data should be cleaned?
29:08 - Get rid of bullet points in Used name column
34:08 - How to split Measurements into two separate height/weight numeric columns
1:05:00 - Parse out dates from Born & Died columns
1:25:43 - Parse out city, region, and country from Born column (working with regular expressions)
1:41:15 - Get rid of the extra columns
1:46:08 - Next steps (how would we clean the results.csv)
1:49:41 - Questions & Answers

-------------------------
Follow me on social media!
Instagram | / keithgalli
Twitter | / keithgalli
TikTok | / keithgalli

-------------------------
Practice your Python Pandas data science skills with problems on StrataScratch! https://stratascratch.com/?via=keith

Join the Python Army to get access to perks!
YouTube - / @keithgalli
Patreon - / keithgalli

*I use affiliate links on the products that I recommend. I may earn a purchase commission or a referral bonus from the usage of these links.
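
For anyone who wants a quick feel for the cleaning steps in the timeline before watching, here is a minimal sketch of how they might look in pandas. It is not the code from the livestream. The column names (Used name, Measurements, Born, Died) come from the timeline above, but the sample rows, the assumed value formats (a bullet between name parts, "183 cm / 76 kg", "12 December 1886 in Bordeaux, Gironde (FRA)"), the "Extra" column, and all output column names are hypothetical.

```python
import pandas as pd

# Hypothetical rows: the value formats are assumed, not copied from the repo.
df = pd.DataFrame({
    "Used name": ["Jean•Dupont", "Ana•Silva"],
    "Measurements": ["183 cm / 76 kg", "170 cm"],
    "Born": ["12 December 1886 in Bordeaux, Gironde (FRA)",
             "3 May 1990 in Porto, Porto (POR)"],
    "Died": ["2 October 1960 in Paris, Paris (FRA)", None],
    "Extra": ["x", "y"],  # stand-in for a column we do not need
})

# Get rid of bullet points in the Used name column.
df["Used name"] = df["Used name"].str.replace("•", " ", regex=False)

# Split Measurements into numeric height/weight columns via regex capture groups.
# Rows missing a value (e.g. no weight listed) become NaN.
df["height_cm"] = pd.to_numeric(
    df["Measurements"].str.extract(r"(\d+)\s*cm", expand=False))
df["weight_kg"] = pd.to_numeric(
    df["Measurements"].str.extract(r"(\d+)\s*kg", expand=False))

# Parse dates out of Born and Died; anything unparseable becomes NaT.
# The "%B" month-name format assumes English month names.
date_pattern = r"(\d{1,2} \w+ \d{4})"
for source, target in [("Born", "born_date"), ("Died", "died_date")]:
    df[target] = pd.to_datetime(
        df[source].str.extract(date_pattern, expand=False),
        format="%d %B %Y", errors="coerce")

# Parse city, region, and country from Born using named groups,
# which become the column names of the extracted frame.
location = df["Born"].str.extract(
    r"in (?P<born_city>[^,]+), (?P<born_region>[^(]+) \((?P<born_country>\w+)\)")
df = pd.concat([df, location], axis=1)

# Grab only the subset of columns we want to keep.
clean = df[["Used name", "height_cm", "weight_kg", "born_date", "died_date",
            "born_city", "born_region", "born_country"]]
print(clean)
```

Real rows will likely be messier than these two (partial dates, missing regions, and so on), so the regexes here are a starting point to adapt rather than something that handles every case.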
