Keys, Queries, and Values: The celestial mechanics of attention
The attention mechanism is what makes Large Language Models like ChatGPT or DeepSeek talk well. But how does it work? One way to see it is as a mechanism that uses similarity to decide which parts of the text deserve more or less attention. For this, we use word embeddings. I like to picture word embeddings as words flying around in the universe, like planets and stars. In this picture, the attention mechanism (the Keys, Queries, and Values matrices) defines the fabric of this universe and its laws of gravity, which resemble the laws of gravity that rule our own universe, yet in some ways are very different. Come join me in this celestial adventure in the universe of language!

See other videos in this LLM series:
The attention mechanism in LLMs: The Attention Mechanism in Large Language ...
The math behind attention mechanisms: The math behind Attention: Keys, Queries, ...
Transformer models: What are Transformer Models and how do the...

Get the Grokking Machine Learning book!
https://manning.com/books/grokking-ma...
Discount code (40%): serranoyt (use the discount code at checkout)

01:55 Similarity
02:12 Embeddings
04:56 Attention
07:14 Dot product
09:29 Cosine similarity
11:10 The Keys and Queries matrices
14:19 Compressing and stretching dimensions
18:50 Combining dimensions
23:14 Asymmetric pull
40:57 Multi-head attention
45:14 The Value matrix
49:24 Summary
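
For readers who want to see the ideas named in the description (dot product, cosine similarity, and the Keys, Queries, and Values matrices) in code, here is a minimal sketch in plain Python. It is not taken from the video: the 2-D embeddings, the identity matrices `W_q`, `W_k`, `W_v`, and all function names are hypothetical placeholders chosen for illustration, and the scoring follows the standard scaled dot-product formulation rather than any specific model.

```python
import math

def dot(u, v):
    """Dot product: large when two vectors point the same way and are long."""
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    """Dot product normalized by vector lengths, so only direction matters."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def softmax(scores):
    """Turn raw similarity scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, x) for row in matrix]

def attention(embeddings, W_q, W_k, W_v):
    """Single-head scaled dot-product attention over a list of embeddings."""
    queries = [matvec(W_q, e) for e in embeddings]
    keys = [matvec(W_k, e) for e in embeddings]
    values = [matvec(W_v, e) for e in embeddings]
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this word's query to every word's key.
        scores = [dot(q, k) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        out = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical toy embeddings for three words (assumed, not from the video).
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Identity matrices as placeholders; a trained model would learn these.
identity = [[1.0, 0.0], [0.0, 1.0]]
outputs, weights = attention(embeddings, identity, identity, identity)

for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` says how strongly one word is pulled toward every other word. Replacing the identity placeholders with different `W_q` and `W_k` matrices changes which directions count as similar, and choosing them unequal makes the raw scores asymmetric between two words.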


Luis Serrano: Keys, Queries, and Values: The Celestial Mechanics of Attention


