Rotary Positional Embeddings: Combining Absolute and Relative

Try Voice Writer - speak your thoughts and let AI handle the grammar: https://voicewriter.io

In this video, I explain RoPE (Rotary Positional Embeddings). First posted to arXiv in 2021, this technique has quickly made its way into prominent language models such as Google's PaLM and Meta's LLaMA. I walk through how rotary embeddings work and how they combine the strengths of both absolute and relative positional encodings.

0:00 - Introduction
1:22 - Absolute positional embeddings
3:19 - Relative positional embeddings
5:51 - Rotary positional embeddings
7:56 - Matrix formulation
9:31 - Implementation
10:38 - Experiments and conclusion

References:
RoFormer: Enhanced Transformer with Rotary Position Embedding (main paper that proposes RoPE embeddings): https://arxiv.org/abs/2104.09864
EleutherAI blog post: https://blog.eleuther.ai/rotary-embed...
Blog posts by first author Jianlin Su (in Chinese): https://kexue.fm/archives/8130 and https://kexue.fm/archives/8265
Survey paper on positional embeddings: https://aclanthology.org/2022.cl-3.7/
