Parallel histogram computation on GPUs in CUDA
In this video we discuss parallel histogram computation, a commonly used parallel programming pattern. We walk through a GPU implementation in CUDA, identify its bottlenecks, and iteratively optimise it to arrive at the best-performing implementation.
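The description does not spell out the kernels used in the video, so the following is only a minimal sketch of one common way such an optimisation can look, not necessarily the video's own code. It assumes 8-bit input values and 256 bins, and the names `histogram_kernel`, `NUM_BINS`, and the launch configuration are illustrative choices. Each block accumulates into a private shared-memory histogram and merges it into the global one at the end, which typically reduces contention on global atomics. Error checking on the CUDA calls is omitted for brevity.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

constexpr int NUM_BINS = 256;  // assumption: one bin per possible byte value

// Each block builds a private histogram in shared memory, then merges it
// into the global histogram with one atomicAdd per non-empty bin.
__global__ void histogram_kernel(const unsigned char* data, size_t n,
                                 unsigned int* bins) {
    __shared__ unsigned int local_bins[NUM_BINS];

    // Zero the block-private histogram cooperatively.
    for (int i = threadIdx.x; i < NUM_BINS; i += blockDim.x) local_bins[i] = 0;
    __syncthreads();

    // Grid-stride loop: atomics here contend only within the block.
    size_t stride = (size_t)blockDim.x * gridDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        atomicAdd(&local_bins[data[i]], 1u);
    __syncthreads();

    // Merge the private histogram into the global one.
    for (int i = threadIdx.x; i < NUM_BINS; i += blockDim.x)
        if (local_bins[i] != 0) atomicAdd(&bins[i], local_bins[i]);
}

int main() {
    const size_t n = 1 << 20;  // illustrative input size
    std::vector<unsigned char> h_data(n);
    std::vector<unsigned int> expected(NUM_BINS, 0), h_bins(NUM_BINS, 0);

    // Fill the input with pseudo-random bytes and build a CPU reference.
    for (size_t i = 0; i < n; ++i) {
        h_data[i] = (unsigned char)(rand() % NUM_BINS);
        ++expected[h_data[i]];
    }

    unsigned char* d_data = nullptr;
    unsigned int* d_bins = nullptr;
    cudaMalloc(&d_data, n);
    cudaMalloc(&d_bins, NUM_BINS * sizeof(unsigned int));
    cudaMemcpy(d_data, h_data.data(), n, cudaMemcpyHostToDevice);
    cudaMemset(d_bins, 0, NUM_BINS * sizeof(unsigned int));

    // Launch configuration is an arbitrary choice for this sketch.
    histogram_kernel<<<64, 256>>>(d_data, n, d_bins);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(h_bins.data(), d_bins, NUM_BINS * sizeof(unsigned int),
               cudaMemcpyDeviceToHost);

    bool ok = (h_bins == expected);
    printf(ok ? "GPU histogram matches CPU reference\n"
              : "GPU histogram differs from CPU reference\n");

    cudaFree(d_data);
    cudaFree(d_bins);
    return ok ? 0 : 1;
}
```

A simpler baseline would drop `local_bins` and call `atomicAdd` directly on `bins` for every element. Timing that version against the one above is one way to observe the contention bottleneck the description alludes to.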
