How to write a fast Softmax kernel
Support this channel at: https://buymeacoffee.com/simonoz
Code for animations: https://github.com/SzymonOzog/GPU_Pro...
Code for kernels and benchmarks: https://github.com/SzymonOzog/FastSof...
References:
https://arxiv.org/pdf/1805.02867
https://github.com/karpathy/llm.c/blo...
https://siboehm.com/articles/22/CUDA-MMM
https://github.com/facebookincubator/...
Programming Massively Parallel Processors book
