How Rotary Position Embedding Supercharges Modern LLMs [RoPE]
Positional information is critical to how transformers understand sequences and to their ability to generalize beyond the training context length. In this video, we discuss:

1) Why the attention mechanism in transformers is not sufficient on its own
2) Earlier attempts at injecting positional information (e.g., sinusoidal positional encoding)
3) Rotary position embedding
4) Techniques for long-context generalization and extension

Background on Transformers: But What Are Transformers?

References:
[Transformer] Attention Is All You Need https://arxiv.org/abs/1706.03762
[RoPE] RoFormer: Enhanced Transformer with Rotary Position Embedding https://arxiv.org/abs/2104.09864
[How is RoPE useful?] Round and Round We Go! What makes Rotary Positional Encodings useful? https://arxiv.org/abs/2410.06205
[Controlled study] A Controlled Study on Long Context Extension and Generalization in LLMs https://arxiv.org/abs/2409.12181

Raw PowerPoint slides: https://www.dropbox.com/scl/fi/y43aw2...
