Mini Project: How to program a GPU? | CUDA C/C++

Matrix multiplication on a GPU using CUDA C/C++.

Code Repository: https://github.com/tgautam03/xGeMM

Video Notes:
https://tgautam03.github.io/2024/11/2...
https://tgautam03.github.io/2024/11/2...
https://tgautam03.github.io/2024/11/2...
https://tgautam03.github.io/2024/11/2...
https://tgautam03.github.io/2024/11/2...
https://tgautam03.github.io/2024/11/2...
https://tgautam03.github.io/2024/11/2...

Animations: https://github.com/tgautam03/0Mean1Si...

Other Projects: https://0mean1sigma.com/mini-projects/

Useful References:
https://siboehm.com/articles/22/CUDA-MMM
https://leimao.github.io/article/CUDA...

Chapters:
00:00 - Introduction
00:36 - Step 1 (Basic CUDA C/C++)
03:02 - Step 2 (Memory Coalescing)
05:57 - Step 3 (GPU Shared Memory)
06:57 - Step 4 (Thread Registers)
09:18 - Step 5 (More Thread Registers)
10:43 - Step 6 (Vectorized Memory Accesses)
12:02 - Final Thoughts