🔗 Byte Pair Encoding (BPE) – Live Coding with Sebastian Raschka (Chapter 2.5)

Check out Sebastian Raschka's book 📖 Build a Large Language Model (From Scratch) | https://hubs.la/Q03l0mSf0 📖

Dive into one of the most powerful subword tokenization techniques in NLP! In this live-coding tutorial, LLM expert @SebastianRaschka walks through Chapter 2.5, Byte Pair Encoding, from his book Build a Large Language Model (From Scratch). Learn how BPE builds an efficient vocabulary by starting from individual characters (or bytes) and iteratively merging the most frequent pair of adjacent symbols into a new token. The result strikes a balance between vocabulary size and representational power.

0:00 - Introduction to Byte Pair Encoding (BPE)
0:30 - Overcoming Tokenizer Shortcomings
1:58 - Practical Demonstration of BPE in Action
3:50 - Additional Resources on BPE
5:50 - Integrating the tiktoken Library
8:56 - Using the GPT-2 Tokenizer
10:30 - Handling Special End-of-Text Tokens
12:37 - Conclusion

📘 About the Book
Build a Large Language Model (From Scratch) is a practical and eminently satisfying hands-on journey into the foundations of generative AI. Without relying on any existing LLM libraries, you’ll code a base model, evolve it into a text classifier, and ultimately create a chatbot that can follow your conversational instructions. And you’ll really understand it because you built it yourself!

🔗 Get the Book: https://hubs.la/Q03l0mSf0

🔔 Subscribe for more deep-dive ML tutorials, live chapter walkthroughs, and expert insights from Manning Publications.

#SebastianRaschka #BytePairEncoding #BPE #Tokenization #NLP #MachineLearning #DeepLearning #Transformers #PyTorch #ManningPublications #LiveCoding
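To make the BPE merge loop concrete, here is a minimal, self-contained Python sketch of training BPE merges on a toy corpus and then using them to split a new word. It is not the book's code or tiktoken's implementation. The function names (`get_pair_counts`, `apply_merge`, `train_bpe`, `encode`) are hypothetical. The sketch also simplifies GPT-2-style BPE by working on the characters of whitespace-split words rather than on raw bytes. When several pairs are equally frequent, it keeps the one counted first.

```python
from collections import Counter


def get_pair_counts(words):
    """Count adjacent symbol pairs, weighted by word frequency.

    `words` maps a tuple of symbols, e.g. ('l', 'o', 'w'), to its count.
    """
    pairs = Counter()
    for symbols, freq in words.items():
        for pair in zip(symbols, symbols[1:]):
            pairs[pair] += freq
    return pairs


def apply_merge(symbols, pair):
    """Replace each non-overlapping occurrence of `pair` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a whitespace-split corpus."""
    # Start from single characters: each word becomes a tuple of symbols.
    words = Counter(tuple(word) for word in corpus.split())
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(words)
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair; on ties, max() keeps the first one counted.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged_words = Counter()
        for symbols, freq in words.items():
            merged_words[apply_merge(symbols, best)] += freq
        words = merged_words
    return merges


def encode(word, merges):
    """Split a (possibly unseen) word by replaying merges in training order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    corpus = "low low low lower lower newest newest"
    merges = train_bpe(corpus, num_merges=3)
    print(merges)                    # [('l', 'o'), ('lo', 'w'), ('low', 'e')]
    print(encode("lowest", merges))  # ['lowe', 's', 't']
```

Each learned merge adds one symbol to the vocabulary. In this sketch, `num_merges` therefore controls the trade-off between vocabulary size and how many tokens a word is split into. For example, "lowest" never appears in the toy corpus, yet it is still encoded from learned subwords.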