Google Indexed You. AI Has No Idea You Exist. | Stephen Burns, Common Crawl

Your site is probably being blocked from AI training data right now. 35% of the internet is, mostly by accident, because of a CDN default setting nobody checked.

Stephen Burns is the Web Intelligence Lead at Common Crawl Foundation, the nonprofit whose data feeds the training sets behind GPT, Claude, Llama, and most major LLMs. In this episode of Odys Podcast: The High Stakes Growth Show, he breaks down why AI visibility and Google rankings are now two separate problems, and why the operators who treat them as the same are losing ground they don't know they've lost.

He covers:
🟣 what Common Crawl actually crawls (2.3 billion pages per month, one or two URLs per domain)
🟣 why harmonic centrality matters more than PageRank for getting into AI training data
🟣 what the EU AI Act's August 2nd disclosure deadline means for verifying your own visibility
🟣 why a top children's hospital in Los Angeles is invisible in ChatGPT answers about leukemia treatment: its CDN blocks AI crawlers by default, and nobody in its IT department knows it

Chapters:
00:00:00 AI Crawler Blocking: Why 35% of Websites Are Invisible to ChatGPT
00:01:28 What Is Common Crawl? How AI Models Get Training Data
00:02:30 Harmonic Centrality vs PageRank for AI SEO
00:03:39 Why Non-English Websites Need English Pages for AI Search
00:05:43 Why ChatGPT Gives Wrong Answers From Outdated Content
00:06:48 How AI-Generated Content Impacts AI Search Rankings
00:07:41 New Domains vs Authority Sites in AI Search
00:08:22 How to Check Harmonic Centrality for Free
00:10:50 How SEOs Track ChatGPT and AI Search Rankings
00:11:08 robots.txt Settings That Block AI Crawlers
00:12:37 How CDN Settings Accidentally Block ChatGPT Crawlers
00:15:02 GEO Checklist for AI Search Visibility
00:17:21 Why JavaScript Hurts AI Search Visibility
00:18:13 SEO vs AIO: Train & Retrieve vs Index & Rank
00:20:26 How to Check if AI Models Trained on Your Website
00:22:24 The Future of AI Search and Web Agents

🟣 The Guest:
Stephen Burns has led web intelligence at Common Crawl Foundation since 2022. Common Crawl's dataset has been cited in over 10,000 peer-reviewed research papers and underpins the training data for most large language models in production today.

🟣 Odys Podcast: The High Stakes Growth Show
We host conversations with founders, operators, investors, engineers, and technologists building and scaling businesses across AI, crypto, IoT, blockchain infrastructure, digital identity, cybersecurity, machine economies, and emerging technologies. Expect operator-level insights, real-world infrastructure discussions, growth strategies, and candid conversations from people building where the technological and economic stakes are highest.

🔔 Subscribe for more conversations on SEO, organic traffic, affiliate marketing, brand signals, Google ranking factors, technical SEO, and how to win in difficult verticals!

#aisearch #aiseo #llmseo #crawler #chatgptmarketing #generativeengineoptimization #seoforbusiness #technicalseo #robotstxt #aioptimization #aidiscovery #odyspodcast #seopodcast
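
Want to try the robots.txt check from the 00:11:08 chapter yourself? Here is a minimal sketch using only Python's standard library. It is not taken from the episode. The crawler names in `AI_CRAWLERS` are an assumed, incomplete list of user-agent tokens (check each operator's documentation for the current ones), and `blocked_ai_crawlers` and `SAMPLE` are hypothetical names used only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed, incomplete list of AI-related crawler tokens; verify before relying on it.
AI_CRAWLERS = ["CCBot", "GPTBot", "ClaudeBot"]


def blocked_ai_crawlers(robots_txt: str, url: str = "https://example.com/") -> list[str]:
    """Return the crawlers from AI_CRAWLERS that this robots.txt text disallows for `url`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_CRAWLERS if not parser.can_fetch(agent, url)]


# Hypothetical robots.txt that blocks one crawler and allows everyone else.
SAMPLE = """\
User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    print(blocked_ai_crawlers(SAMPLE))
```

Paste in the contents of your own robots.txt in place of `SAMPLE`. Note that this only reads robots.txt rules. A CDN that blocks crawlers at the HTTP level, as described in the episode, would not show up here, so an empty result does not prove your site is reachable.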