Build Real ML Model That Predicts Taxi Tips with XGBoost and NVIDIA GPU
Ever wondered what makes people tip more in taxis? In this hands-on machine learning project, we'll build a complete workflow on real-world NYC data: cleaned, engineered, and trained entirely on GPU using XGBoost CUDA and cuDF Pandas! (No GPU? I'll show you how to use one for free on Google Colab!) You'll see how professionals approach problems, handle massive data, and fix memory errors while designing real data-science pipelines step by step! By the end, you'll have a meaningful project that's fun to build, technically impressive, and looks perfect on your portfolio!! Join me on this adventure and learn how to think like a pro-level data scientist.

What You'll Learn
Handling real-world datasets: cleanup, missing values, anomalies, aggregation.
Solving memory limitations and runtime crashes with cuDF Pandas + RMM.
Accelerating machine learning with XGBoost on NVIDIA GPUs.
Evaluating your model's performance, and continuing to make it smarter!
And most importantly: developing the mindset of a data scientist, solving problems instead of guessing.

What Makes This Project Different
This isn't another "beginner demo"; it's a real workflow based on real data and real problems. You'll experience the same challenges professionals face: huge sloppy datasets, missing labels, CPU and GPU memory limits, all explained step by step, in simple terms. I'll show you why we make each decision, not just how to code it, so you learn to think, debug, and reason like a pro.

Important Links
------------------------------------------------
Download Tutorial Code and Smaller Dataset from GitHub: https://github.com/MariyaSha/nyc_taxi...
Download Full Dataset from NYC Open Data: https://data.cityofnewyork.us/Transpo...
RAPIDS Installation Guide: https://docs.rapids.ai/install/
Official NVIDIA Google Colab Notebook (VERY ADVANCED): https://colab.research.google.com/dri...

Important Tutorials
------------------------------------------------
WSL + Conda Setup: My Go-To Python Setup! WSL + Conda Minif...
Machine Learning with Scikit-Learn: Simple Machine Learning Code Tutorial for ...
cuDF Pandas For Beginners: Much Faster Pandas with cuDF GPU Processin...
What is CUDA?: CUDA Simply Explained - GPU vs CPU Paralle...

Time Stamps
------------------------------------------------
01:08 - Download Dataset
01:43 - Solving Big Data Problems with GPU Processing
02:46 - Google Colab Setup with Free T4 GPU
03:02 - Local Setup with NVIDIA GPU
03:43 - RAPIDS Installation Guide
05:07 - Solving Jupyter Kernel Crash with cuDF Pandas
05:29 - Handling Missing Values
05:53 - Detect Missing Values
06:29 - Replace with Zero
07:31 - Replace with Mean
08:57 - Investigate Columns with Ambiguous Names
11:21 - Drop Columns (If No Other Option)
12:01 - Split Data For Training & Testing
12:07 - Shuffle Data
13:39 - Features & Targets Split
14:02 - Train & Test Split
16:20 - Load XGBoost Model on GPU
17:55 - Train XGBoost Model
18:08 - Test XGBoost Model and Get Predictions
18:45 - Solve ValueError: DataFrame.dtypes must be int float bool or category
20:15 - Evaluate Trained Model
22:39 - Data Optimization & Anomalies
22:41 - Detect Data Anomalies with Aggregation
23:47 - Solve XGBoostError: No GPU Memory Left with RMM
25:04 - Handle Negative Charges and Unrealistic Distances
28:19 - Detect and Handle Unrealistic Transactions
30:28 - Second Train Run on Optimized Data
31:45 - Best Practices
31:45 - Plot Training Results & Feature Importance
32:17 - Hyperparameter Tuning
32:49 - Date Extraction: From String to Int or Category
33:05 - K-Fold Validation
33:45 - Thanks for Watching!

Environment Setup
------------------------------------------------
You can run this project in two ways, coding along with me:
1. Google Colab:
Change your runtime to T4 GPU.
Use the smaller version of the NYC Taxi dataset (5 million rows). Download above.
2. Local setup:
Make sure you have a CUDA-compatible GPU.
Use WSL and Miniforge/Conda (MUST!).
Use the current command from the RAPIDS Installation Guide for your setup (MUST!).
Use the full version of the NYC Taxi dataset (38 million rows). Download above.

Tutorial Code
------------------------------------------------
Remove all the rows that have negative numbers:
data = data[~data.select_dtypes("number").lt(0).any(axis=1)]

Solve "XGBoostError: No GPU memory is left" and kernel crashes:
import rmm
rmm.reinitialize(pool_allocator=True, initial_pool_size="8GB")

#MachineLearning #DataScience #Python #BigData #GPU #NVIDIA #RAPIDS #DataAnalysis #DataCleaning #PythonTutorial #AI #pythonprogramming
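
To see what the negative-row filter from the Tutorial Code section does, it can be tried on a tiny table first. This is only a minimal sketch: it uses plain pandas on CPU instead of cuDF Pandas, and the column names and values are made up for illustration, not taken from the NYC Taxi dataset.

```python
import pandas as pd

# Hypothetical toy data: column names and values are invented for this sketch.
data = pd.DataFrame({
    "fare": [12.5, -3.0, 8.0, 20.0],
    "tip": [2.0, 1.0, -0.5, 4.0],
    "payment": ["card", "card", "cash", "card"],  # non-numeric, ignored by the filter
})

# Look only at numeric columns, flag every cell below zero,
# then mark a row if any of its numeric cells is negative.
has_negative = data.select_dtypes("number").lt(0).any(axis=1)

# ~ inverts the mask, so only rows without negative numbers remain.
clean = data[~has_negative]

print(clean)
```

Building the mask as a separate variable first makes it easy to count how many rows would be dropped (has_negative.sum()) before actually removing them.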
