S3-E7 · Where Does AI's Knowledge Actually Come From? (Training Data)

You have probably heard that an AI read the whole internet and that this is where its knowledge comes from. It feels true, and it is wrong in two different ways. This lecture is about the real product behind every model, its data, and the quiet drama of 2026: we are running low on good human text and arguing over what is fair to train on. You will understand how a filthy firehose of scraped web pages gets filtered into a clean stream (curation beats raw size), what the data wall actually is, why synthetic data is the field's answer and when it causes model collapse, how scaling laws divide a compute budget between model size and training data, and the unresolved copyright fight underneath it all. By the end you will see a model as a compressed echo of exactly what it was fed.

Full course playlist: How AI Works · Season 3: Under the Hood

New lecture every week. Subscribe to @HowAIWorksHQ to understand how AI really works, one clear idea at a time.