CROSS-VALIDATION SKLEARN PYTHON (Techniques Explained in French)

In this Python tutorial (originally in French), I describe the main cross-validation techniques, which are very useful in machine learning, and show you how to implement them in Sklearn (Python).

The main cross-validation techniques are:
1) KFold
2) Leave One Out
3) ShuffleSplit
4) StratifiedKFold
5) GroupKFold

To use them in Python with Sklearn, you must import them from the sklearn.model_selection module. For example:

    from sklearn.model_selection import KFold, cross_val_score
    cv = KFold(n_splits=5)
    cross_val_score(model, X, y, cv=cv)

1) KFold Cross-Validation
This involves dividing the dataset into K equal parts (K folds), optionally after shuffling it. Note that Sklearn's KFold does not shuffle by default; pass shuffle=True to enable it. For example, if the dataset contains 100 samples and K=5, then we will have 5 sets of 20 samples. The machine trains on 4 sets, evaluates itself on the remaining set, and repeats until each set has served once as the validation set. Ultimately, it performs K training sessions (5 in this situation).
This technique is widely used, but it has a slight disadvantage: if the dataset includes unbalanced classes, then some validation folds may not contain the minority classes. For example, if a dataset of 100 samples contains only 10 samples from class 0 and 90 samples from class 1, then it is possible that some of the 5 folds contain no samples from class 0.

2) Leave One Out Cross-Validation
This technique is a special case of K-Fold: the case where K = "number of samples in the dataset." For example, if a dataset contains 100 samples, then K = 100. The machine trains on 99 samples and evaluates itself on the remaining one. It thus performs 100 training sessions (one per held-out sample), which can take a considerable amount of time. For that reason, this technique is NOT RECOMMENDED.

3) ShuffleSplit Cross-Validation
This technique consists of shuffling the dataset and then splitting it into two parts: a training part and a test part. Once the training and evaluation are complete, we gather our data, reshuffle it, and re-split the dataset in the same proportions as before. We repeat this for as many cross-validation iterations as desired. Unlike K-Fold, the same sample can therefore appear in the validation set in several iterations.
This technique is a GOOD ALTERNATIVE to K-FOLD, but it has the same disadvantage: if the classes are unbalanced, then the validation set may be missing some classes!

4) Stratified K-Fold
This technique is the default choice for classification (but it consumes slightly more resources than K-FOLD). The machine sorts the data into "strata" (i.e., into the different classes) before forming K groups (K folds), so that each fold contains samples from every stratum (every class), in roughly the same proportions as in the full dataset. As with KFold, shuffling only happens if you pass shuffle=True.

5) Group K-Fold
This cross-validation technique is VERY IMPORTANT TO KNOW! In data science, we often assume that our data are independent and drawn from the same distribution. For example, the apartments in a real estate dataset are treated as independent of each other and identically distributed. But this isn't always the case! For example, data in a medical dataset can be interdependent: if people in the same family are diagnosed with cancer, then the genetic factor creates a dependency between those samples. It's therefore necessary to divide the dataset into groups of influence, and to make sure that the same group never appears in both the training set and the validation set. This is why GROUP K-FOLD exists:

    GroupKFold(n_splits=5).split(X, y, groups)

► My website, in addition to this video: https://machinelearnia.com/
► Join our Discord community
► Get my free book, LEARN MACHINE LEARNING IN ONE WEEK: https://machinelearnia.com/apprendre-...
► Download my code for free on GitHub: https://github.com/MachineLearnia
► Subscribe: @machinelearnia
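To make the K-Fold mechanics from section 1) concrete (100 samples, K=5, five validation folds of 20), here is a minimal sketch in plain Python. It only illustrates the idea and is not Sklearn's implementation; the helper name k_fold_indices and its parameters are hypothetical.

```python
import random

def k_fold_indices(n_samples, k, shuffle=True, seed=0):
    """Yield (train_idx, test_idx) pairs for a simple K-Fold split.

    Hypothetical helper for illustration only; not part of Sklearn.
    """
    indices = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(100, 5))
print(len(folds))                        # 5
print([len(test) for _, test in folds])  # [20, 20, 20, 20, 20]

# Leave One Out is the same procedure with K equal to the number of samples.
loo = list(k_fold_indices(100, 100))
print(len(loo))                          # 100
```

Each sample lands in exactly one validation fold, which is why K-Fold needs K training sessions and Leave One Out needs as many sessions as there are samples.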

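The class-imbalance problem of K-Fold, and the way Stratified K-Fold and Group K-Fold address their respective issues, can be checked on toy data. This is a sketch under stated assumptions: the dataset (10 samples of class 0 followed by 90 of class 1) mirrors the example in the description, while the "family" groups of 5 consecutive samples are invented for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold, GroupKFold

# Toy data (assumed): 10 samples of class 0 followed by 90 of class 1.
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 10 + [1] * 90)

def class0_counts(cv):
    """Number of class-0 samples in each validation fold."""
    return [int((y[test] == 0).sum()) for _, test in cv.split(X, y)]

# Without shuffling, every class-0 sample falls into the first fold.
kfold_counts = class0_counts(KFold(n_splits=5))
# Stratification keeps the 10/90 ratio in every fold.
strat_counts = class0_counts(StratifiedKFold(n_splits=5))

# Hypothetical groups: 20 "families" of 5 consecutive samples each.
groups = np.repeat(np.arange(20), 5)
# For each split, collect the groups present on both sides (should be none).
group_overlap = [
    set(groups[train]) & set(groups[test])
    for train, test in GroupKFold(n_splits=5).split(X, y, groups)
]

print(kfold_counts)  # [10, 0, 0, 0, 0]
print(strat_counts)  # [2, 2, 2, 2, 2]
print(all(len(overlap) == 0 for overlap in group_overlap))  # True
```

With the plain KFold, four of the five validation folds contain no class-0 sample at all, which is exactly the disadvantage described in section 1).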