Build a Real ML Model That Predicts Taxi Tips with XGBoost and an NVIDIA GPU 🧠⚔

Ever wondered what makes people tip more in taxis? 🚕💵
In this hands-on machine learning project, we’ll build a complete workflow on real-world NYC data: cleaned, engineered, and trained entirely on GPU using XGBoost CUDA and cuDF Pandas! 🐼
(🚨No GPU?🚨 I’ll show you how to use one for free on Google Colab! 😉)
You’ll see how professionals approach problems, handle massive data, and fix memory errors while designing real data-science pipelines step by step! 😎
By the end, you’ll have a meaningful project that’s fun to build, technically impressive, and looks perfect on your portfolio!! 🤩
Join me on this adventure and learn how to think like a pro-level data scientist.

💡 What You’ll Learn
Handling Real-World Datasets: Cleanup, Missing Values, Anomalies, Aggregation. 📊
Solving memory limitations and runtime crashes with cuDF Pandas + RMM. 💾
Accelerating machine learning with XGBoost on NVIDIA GPUs. 🤖
Evaluating your model’s performance and continuing to make it smarter! 💪🤓
And most importantly: developing the mindset of a data scientist, solving problems instead of guessing. 🔎

🧠 What Makes This Project Different
This isn’t another “beginner demo”: it’s a real workflow based on real data and real problems.
You’ll experience the same challenges professionals face: huge sloppy datasets, missing labels, CPU and GPU memory limits, all explained step by step, in simple terms.
I’ll show you why we make each decision, not just how to code it, so you learn to think, debug, and reason like a pro.

🔗 Important Links
------------------------------------------------
🔹 Download Tutorial Code and Smaller Dataset from GitHub: https://github.com/MariyaSha/nyc_taxi...
🔹 Download Full Dataset from NYC Open Data: https://data.cityofnewyork.us/Transpo...
🔹 RAPIDS Installation Guide: https://docs.rapids.ai/install/
🔹 Official NVIDIA Google Colab Notebook - 🧐 VERY ADVANCED 🧐: https://colab.research.google.com/dri...

📽️ Important Tutorials
------------------------------------------------
⭐ WSL + Conda Setup: My Go-To Python Setup! 🐍 WSL + Conda Minif...
⭐ Machine Learning with Scikit-Learn: Simple Machine Learning Code Tutorial for ...
⭐ cuDF Pandas For Beginners: Much Faster Pandas with cuDF GPU Processin...
⭐ What is CUDA? CUDA Simply Explained - GPU vs CPU Paralle...

⏰ Time Stamps
------------------------------------------------
01:08 - Download Dataset
01:43 - Solving Big Data Problems with GPU Processing
02:46 - Google Colab Setup with Free T4 GPU
03:02 - Local Setup with NVIDIA GPU
03:43 - RAPIDS Installation Guide
05:07 - Solving Jupyter Kernel Crash with cuDF Pandas
05:29 - Handling Missing Values
05:53 - Detect Missing Values
06:29 - Replace with Zero
07:31 - Replace with Mean
08:57 - Investigate Columns with Ambiguous Names
11:21 - Drop Columns (If No Other Option)
12:01 - Split Data For Training & Testing
12:07 - Shuffle Data
13:39 - Features & Targets Split
14:02 - Train & Test Split
16:20 - Load XGBoost Model on GPU
17:55 - Train XGBoost Model
18:08 - Test XGBoost Model and Get Predictions
18:45 - Solve ValueError : DataFrame.dtypes must be int float bool or category
20:15 - Evaluate Trained Model
22:39 - Data Optimization & Anomalies
22:41 - Detect Data Anomalies with Aggregation
23:47 - Solve XGBoostError : No GPU Memory Left with RMM
25:04 - Handle Negative Charges and Unrealistic Distances
28:19 - Detect and Handle Unrealistic Transactions
30:28 - Second Train Run on Optimized Data
31:45 - Best Practices
31:45 - Plot Training Results & Feature Importance
32:17 - Hyperparameter Tuning
32:49 - Date Extraction : From String to Int or Category
33:05 - K-Fold Validation
33:45 - Thanks for Watching!

🚀 Environment Setup
------------------------------------------------
You can run this project in two ways, coding along with me:

1️⃣ Google Colab:
Change your runtime to T4 GPU.
Use the smaller version of the NYC Taxi dataset (5 million rows). Download above 👆

2️⃣ Local setup:
Make sure you have a CUDA-compatible GPU.
Use WSL and Miniforge/Conda (⚠️MUST! ⚠️).
Use the current command from the RAPIDS Installation Guide for your setup (⚠️MUST! ⚠️).
Use the full version of the NYC Taxi dataset (38 million rows). Download above 👆

💻 Tutorial Code
------------------------------------------------
📌 Remove all the rows that have negative numbers:
data = data[~data.select_dtypes("number").lt(0).any(axis=1)]

📌 Solve "XGBoostError: No GPU memory is left" and kernel crashes:
import rmm
rmm.reinitialize(pool_allocator=True, initial_pool_size="8GB")

#MachineLearning #DataScience #Python #BigData #GPU #NVIDIA #RAPIDS #DataAnalysis #DataCleaning #PythonTutorial #AI #pythonprogramming
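
If the one-liner that removes rows with negative numbers looks cryptic, here is a minimal sketch that breaks it into steps on a tiny made-up table. The column names and values are assumptions for illustration only, not the real NYC Taxi schema, and it runs on plain CPU pandas so no GPU is needed to try it.

```python
import pandas as pd

# Toy stand-in for the taxi data (hypothetical columns and values).
data = pd.DataFrame({
    "fare_amount": [12.5, -3.0, 8.0, 20.0],
    "trip_distance": [2.1, 1.0, -0.5, 4.2],
    "payment_type": ["card", "cash", "card", "card"],
})

# Step 1: look only at numeric columns and mark every cell below zero.
negative_cells = data.select_dtypes("number").lt(0)

# Step 2: flag each row that contains at least one negative cell.
rows_with_negatives = negative_cells.any(axis=1)

# Step 3: keep the rows that were NOT flagged.
clean = data[~rows_with_negatives]

print(clean)
```

The three steps chained together are exactly the one-liner from the Tutorial Code section. Text columns such as the hypothetical payment_type are ignored by select_dtypes("number"), so they never cause a row to be dropped.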
