Fuzzy String Matching in R | Overview and R Tutorial (Using fuzzywuzzy, polyfuzz, and difflib)
In today's video, we'll learn about fuzzy string matching (also known as approximate string matching) and how to perform it in R. A common use case is joining two datasets that share a variable, where the information in one dataset is expressed slightly differently than in the other (e.g., “Amazon” vs. “Amazon.com, Inc”). How can we determine whether the two values refer to the same thing? We can use fuzzy string matching, a popular Natural Language Processing (NLP) technique!

We'll start with a conceptual overview of fuzzy string matching, and then look at some examples in R using several different algorithms. We’ll use fuzzywuzzy, polyfuzz, and difflib, currently the most popular packages for performing this task. The string matching algorithms implemented in these packages include, among others, Levenshtein Distance (sometimes called "Edit Distance") and Gestalt Pattern Matching (sometimes called "Ratcliff/Obershelp Pattern Matching").

The code, slides, and dataset used in this video can be found here: https://github.com/melissavanbussel/Y...
The dataset originated from Kaggle: https://www.kaggle.com/code/leandrodo...
The blog post about PolyFuzz referenced in the video is located here: https://towardsdatascience.com/string...

If you like this video, please subscribe to my channel so that I can continue to make content like this! 😊

0:00 - Overview of fuzzy string matching
3:49 - Fuzzy string matching in R
9:53 - Using the difflib package
16:32 - Using the fuzzywuzzy package
19:58 - Using the polyfuzz package
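To make the Gestalt Pattern Matching idea concrete, here is a minimal sketch of the “Amazon” vs. “Amazon.com, Inc” comparison. It is not the code from the video: difflib is part of Python's standard library, so the sketch is written in Python rather than R, and the helper names `similarity` and `best_match`, the sample company list, the lowercasing step, and the 0.5 cutoff are all assumptions made for illustration.

```python
from difflib import SequenceMatcher, get_close_matches


def similarity(a: str, b: str) -> float:
    """Gestalt (Ratcliff/Obershelp-style) similarity in [0, 1], ignoring case."""
    # ratio() returns 2 * M / T, where M is the number of matched characters
    # and T is the combined length of both strings.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(name: str, candidates: list[str], cutoff: float = 0.5):
    """Return the candidate most similar to `name`, or None if none reach `cutoff`.

    The 0.5 cutoff is an illustrative assumption; tune it on your own data.
    """
    # Map lowercased spellings back to the original candidate strings.
    lowered = {c.lower(): c for c in candidates}
    hits = get_close_matches(name.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None


# Hypothetical lookup table, not the Kaggle dataset used in the video.
companies = ["Amazon.com, Inc", "Alphabet Inc", "Apple Inc"]

print(round(similarity("Amazon", "Amazon.com, Inc"), 3))  # 0.571
print(best_match("Amazon", companies))                    # Amazon.com, Inc
print(best_match("Netflix", companies))                   # None
```

Here the six characters of “amazon” are matched out of 21 characters in total, giving 2 × 6 / 21 ≈ 0.571. That is below difflib's default cutoff of 0.6 for `get_close_matches`, which is why the sketch lowers the cutoff; in a real join, the threshold is a trade-off between missed matches and false matches.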


