Anna Veronika Dorogush: Mastering gradient boosting with CatBoost | PyData London 2019

Gradient boosting is a powerful machine-learning technique that achieves state-of-the-art results in a variety of practical tasks. This tutorial explains the details of using gradient boosting in practice: we will solve a classification problem using the popular GBDT library CatBoost.

www.pydata.org

PyData is an educational program of NumFOCUS, a 501(c)(3) non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.

PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.
0:00 - Introduction
1:49 - Intro to CatBoost
2:08 - Overview of the Presentation
2:39 - Intro to Gradient Boosting
6:08 - Numerical and Categorical Data with CatBoost
7:26 - Advantages of CatBoost
9:00 - Library Comparison (Quality)
9:45 - Speed
10:11 - Benchmarking (CPU & GPU)
11:55 - CPU vs GPU
12:50 - Prediction Time
13:24 - Tutorial
15:15 - Problem Statement
15:38 - CatBoost Library (Imports and related issues)
16:22 - Reading and Intro to the Data
18:17 - Exploring the data
19:36 - Training the Model with default parameters
22:16 - Creating the Pool Object
23:12 - Splitting the data (Train & Validation)
24:16 - Selecting the objective function
25:11 - STDOUT of training
28:32 - Plotting metrics while training
30:33 - Model Comparison (plotting after training)
32:39 - Finding the best model
35:05 - Cross-Validation
41:30 - Grid Search
44:40 - Overfitting Detector
49:18 - Overfitting Detector with eval metric
51:31 - Model Predictions
57:10 - Select Decision Boundary
1:01:04 - Model Evaluation (new dataset)
1:03:06 - Feature Importance
1:03:37 - Prediction Values Change
1:04:50 - Loss Function Change
1:07:49 - Shap Values
1:16:05 - Snapshotting
1:17:45 - Saving the Model
1:18:36 - Hyperparameters Tuning
1:23:07 - Speeding up Training and Reducing Model Size
1:23:35 - Additional Details about CatBoost Community
1:25:50 - Future Scope of CatBoost
1:26:22 - Questions and Suggestions

S/o to https://github.com/theProcrastinatr for the video timestamps!
Want to help add timestamps to our YouTube videos to help with discoverability? Find out more here: https://github.com/numfocus/YouTubeVi...
