Parallel merge algorithm on GPUs using CUDA
Given two sorted arrays A and B, we want to merge them into a single sorted array C. We formulate a parallel merge algorithm in CUDA for GPUs, in three versions:
1) Algorithm 1: a basic version that makes non-coalesced accesses to global memory
2) Algorithm 2: uses shared memory to reduce the non-coalesced global memory accesses
3) Algorithm 3: reduces the shared memory requirement by using a circular buffer
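The description does not say how the output is divided among threads. One common approach, assumed here and not confirmed by the source, is a "co-rank" binary search: each thread finds which slices of A and B feed its contiguous slice of C, then merges those slices sequentially. The sketch below is host-side C++ that simulates the threads with a loop so it can be run without a GPU. The names `co_rank`, `merge_sequential`, and `merge_partitioned` are hypothetical. In a CUDA kernel the loop body would be the per-thread work, with `tid` derived from `blockIdx.x * blockDim.x + threadIdx.x`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Co-rank (assumed partitioning scheme): for output position k, return i such
// that the first k elements of the merged result are A[0..i) and B[0..k-i).
// Ties are resolved by taking elements of A first.
int co_rank(int k, const int* A, int m, const int* B, int n) {
    assert(k >= 0 && k <= m + n);
    int lo = std::max(0, k - n);  // at least k - n elements must come from A
    int hi = std::min(k, m);      // at most min(k, m) elements can come from A
    while (lo < hi) {
        int i = lo + (hi - lo) / 2;
        int j = k - i;
        // If A[i] <= B[j-1], then A[i] belongs before B[j-1]: i is too small.
        if (j > 0 && i < m && A[i] <= B[j - 1]) {
            lo = i + 1;
        } else {
            hi = i;
        }
    }
    return lo;
}

// Plain two-pointer merge of A[0..m) and B[0..n) into C[0..m+n).
void merge_sequential(const int* A, int m, const int* B, int n, int* C) {
    int i = 0, j = 0, k = 0;
    while (i < m && j < n) C[k++] = (A[i] <= B[j]) ? A[i++] : B[j++];
    while (i < m) C[k++] = A[i++];
    while (j < n) C[k++] = B[j++];
}

// Simulates num_threads GPU threads, each producing one contiguous slice of C.
std::vector<int> merge_partitioned(const std::vector<int>& A,
                                   const std::vector<int>& B,
                                   int num_threads) {
    assert(num_threads > 0);
    const int m = static_cast<int>(A.size());
    const int n = static_cast<int>(B.size());
    const int total = m + n;
    std::vector<int> C(total);
    const int per_thread = (total + num_threads - 1) / num_threads;  // ceiling
    for (int tid = 0; tid < num_threads; ++tid) {
        // Output range [k_cur, k_next) owned by this simulated thread.
        const int k_cur = std::min(tid * per_thread, total);
        const int k_next = std::min((tid + 1) * per_thread, total);
        // Matching input ranges, found independently per thread.
        const int i_cur = co_rank(k_cur, A.data(), m, B.data(), n);
        const int i_next = co_rank(k_next, A.data(), m, B.data(), n);
        const int j_cur = k_cur - i_cur;
        const int j_next = k_next - i_next;
        merge_sequential(A.data() + i_cur, i_next - i_cur,
                         B.data() + j_cur, j_next - j_cur,
                         C.data() + k_cur);
    }
    return C;
}
```

Under this scheme every thread reads from its own scattered starting offsets in A and B, which is where the non-coalesced global memory accesses of Algorithm 1 would come from. Staging tiles of A and B in shared memory (Algorithm 2) and reusing the unconsumed part of each tile through a circular buffer (Algorithm 3) are the refinements named above. They are not shown in this sketch.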
