Relative Self-Attention Explained
In this video, we dive into relative self-attention. First, we compare relative and absolute position embeddings, and then we cover two algorithms for incorporating relative position embeddings into self-attention. #transformers #deeplearning
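
To make the absolute-versus-relative distinction concrete, here is a minimal sketch in plain Python. It only shows which position index each scheme would look up; it is not either of the two algorithms covered in the video. The function names and the clipping distance `max_distance` are assumptions introduced for illustration.

```python
def absolute_positions(seq_len):
    """Absolute scheme: each token is tagged with its own index 0..seq_len-1."""
    return list(range(seq_len))


def relative_positions(seq_len, max_distance):
    """Relative scheme: entry [i][j] is the offset j - i between query i and key j.

    Offsets are clipped to [-max_distance, max_distance] (an assumed choice),
    so only 2 * max_distance + 1 distinct embeddings would be needed.
    """
    return [
        [max(-max_distance, min(max_distance, j - i)) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def to_embedding_index(offset, max_distance):
    """Shift a clipped offset into 0..2*max_distance for an embedding-table lookup."""
    return offset + max_distance


abs_pos = absolute_positions(4)
rel_pos = relative_positions(4, max_distance=2)

print(abs_pos)
for row in rel_pos:
    print(row)
```

In the absolute scheme the index depends only on where a token sits in the sequence, whereas in the relative scheme it depends on the query-key pair, so shifting the whole sequence leaves the offsets unchanged.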
