NLP Demystified 15: Transformers From Scratch + Pre-training and Transfer Learning With BERT/GPT
CORRECTION (00:34:47): that should be "each a dimension of 12x4".

Course playlist: Natural Language Processing Demystified

Transformers have revolutionized deep learning. In this module, we'll learn how they work in detail and build one from scratch. We'll then explore how to leverage state-of-the-art models for our projects through pre-training and transfer learning. We'll learn how to fine-tune models from Hugging Face and explore the capabilities of GPT from OpenAI. Along the way, we'll tackle a new task for this course: question answering.

Colab notebook: https://colab.research.google.com/git...

Timestamps
00:00:00 Transformers from scratch
00:01:05 Subword tokenization
00:04:27 Subword tokenization with byte-pair encoding (BPE)
00:06:53 The shortcomings of recurrent-based attention
00:07:55 How Self-Attention works
00:14:49 How Multi-Head Self-Attention works
00:17:52 The advantages of multi-head self-attention
00:18:20 Adding positional information
00:20:30 Adding a non-linear layer
00:22:02 Stacking encoder blocks
00:22:30 Dealing with side effects using layer normalization and skip connections
00:26:46 Input to the decoder block
00:27:11 Masked Multi-Head Self-Attention
00:29:38 The rest of the decoder block
00:30:39 [DEMO] Coding a Transformer from scratch
00:56:29 Transformer drawbacks
00:57:14 Pre-Training and Transfer Learning
00:59:36 The Transformer families
01:01:05 How BERT works
01:09:38 GPT: Language modelling at scale
01:15:13 [DEMO] Pre-training and transfer learning with Hugging Face and OpenAI
01:51:48 The Transformer is a "general-purpose differentiable computer"

This video is part of Natural Language Processing Demystified, a free, accessible course on NLP. Visit https://www.nlpdemystified.org/ to learn more.
