Comparing duckdb and duckplyr to tibbles, data.tables, and data.frames (CC279)
duckdb has quickly grown in popularity as a database platform that is super fast with large datasets. Watch as Pat shows how to generate a duckdb database and access values from it. He'll also compare the performance of using duckdb directly, using duckplyr, and using tibbles, data.tables, and data.frames. Pat will discuss how performance changes with the number of distinct key values and the size of the database. You'll likely be surprised by the results! This episode is part of an ongoing effort to develop an R package that implements the naive Bayesian classifier.

If you want to get a physical copy of R Packages: https://amzn.to/43pMR8L
If you want a free, online version of R Packages: https://r-pkgs.org/

You can find my blog post for this episode at https://www.riffomonas.org/code_club/....

Check out the GitHub repository:
Beginning of the episode: https://github.com/riffomonas/phyloty...
End of the episode: https://github.com/riffomonas/phyloty...

#rstats #microbenchmark #vectors #rdp #16S #classification #classifier #microbialecology #microbiome

Support Riffomonas by becoming a Patreon member! / riffomonas

Want more practice on the concepts covered in Code Club? You can sign up for my weekly newsletter at https://shop.riffomonas.org/youtube to get practice problems, tips, and insights.

If you're interested in purchasing a video workshop, be sure to check out https://riffomonas.org/workshops/

You can also find complete tutorials for learning R with the tidyverse using...
Microbial ecology data: https://www.riffomonas.org/minimalR/
General data: https://www.riffomonas.org/generalR/

0:00 Introduction
6:07 Improve construction of data.table objects
11:12 Performance of which vs. logical
16:04 Improved access to values in data.table objects
20:31 Using duckdb() to store and access data
27:11 Using duckplyr() to store and access data
30:05 Evaluating sensitivity to number of rows and sparsity
32:11 Improving performance of sparse matrix construction
