Decoder-Only Transformers, ChatGPT's specific Transformer, Clearly Explained!!!

Transformers are taking over AI right now, and quite possibly their most famous use is in ChatGPT. ChatGPT uses a specific type of Transformer called a Decoder-Only Transformer, and this StatQuest shows you how they work, one step at a time. And at the end (at 32:14), we talk about the differences between a Normal Transformer and a Decoder-Only Transformer. BAM!

NOTE: If you're interested in learning more about Backpropagation, check out these 'Quests:
The Chain Rule: The Chain Rule, Clearly Explained!!!
Gradient Descent: Gradient Descent, Step-by-Step
Backpropagation Main Ideas: Neural Networks Pt. 2: Backpropagation Mai...
Backpropagation Details Part 1: Backpropagation Details Pt. 1: Optimizing ...
Backpropagation Details Part 2: Backpropagation Details Pt. 2: Going bonke...

If you're interested in learning more about the SoftMax function, check out:
Neural Networks Part 5: ArgMax and SoftMax

If you're interested in learning more about Word Embedding, check out:
Word Embedding and Word2Vec, Clearly Expla...

If you'd like to learn more about calculating similarities in the context of neural networks and the Dot Product, check out:
Cosine Similarity: Cosine Similarity, Clearly Explained!!!
Attention: Attention for Neural Networks, Clearly Exp...

If you'd like to learn more about Normal Transformers, see:
Transformer Neural Networks, ChatGPT's fou...

For a complete index of all the StatQuest videos, check out:
https://statquest.org/video-index/

If you'd like to support StatQuest, please consider...
Patreon: / statquest
...or...
YouTube Membership: / @statquest
...buying one of my books, a study guide, a t-shirt or hoodie, or a song from the StatQuest store...
https://statquest.org/statquest-store/
...or just donating to StatQuest!
https://www.paypal.me/statquest

Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter: / joshuastarmer

0:00 Awesome song and introduction
1:34 Word Embedding
7:26 Position Encoding
10:10 Masked Self-Attention, an Autoregressive method
22:35 Residual Connections
23:00 Generating the next word in the prompt
26:23 Review of encoding and generating the prompt
27:20 Generating the output, Part 1
28:46 Masked Self-Attention while generating the output
30:40 Generating the output, Part 2
32:14 Normal Transformers vs Decoder-Only Transformers

#StatQuest
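
The chapter list names Masked Self-Attention as the autoregressive step, so here is a rough sketch of that one idea. It is not code from the video. It assumes scaled dot-product similarity, and the function names and the tiny 2-number query, key and value vectors are made up purely for illustration. The point it tries to show is that each token can only look at itself and the tokens that came before it.

```python
import math


def softmax(scores):
    # Subtract the largest score before exponentiating for numerical stability.
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def masked_self_attention(queries, keys, values):
    """Hypothetical helper: token i attends only to tokens 0..i."""
    n_tokens = len(queries)
    dim = len(keys[0])
    outputs, all_weights = [], []
    for i, query in enumerate(queries):
        # The mask: similarities are only calculated for positions 0..i.
        scores = [
            sum(q * k for q, k in zip(query, keys[j])) / math.sqrt(dim)
            for j in range(i + 1)
        ]
        # Later tokens get a weight of exactly 0.
        weights = softmax(scores) + [0.0] * (n_tokens - i - 1)
        # The output is the weighted sum of the value vectors.
        output = [
            sum(w * value[c] for w, value in zip(weights, values))
            for c in range(len(values[0]))
        ]
        outputs.append(output)
        all_weights.append(weights)
    return outputs, all_weights


# Made-up vectors for a 3-token prompt (assumption, not from the video).
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = masked_self_attention(vectors, vectors, vectors)
for row in weights:
    print([round(w, 3) for w in row])
```

In the printed weights, the first row puts all of its weight on the first token, and every row has zeros for the tokens that come after it. A real Decoder-Only Transformer would first create the queries, keys and values by multiplying the position-encoded word embeddings by trained weights, which this sketch skips.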
