The Paradox of Self-Evolving AI

Everywhere you look right now - recursive self-improvement, meta-harnesses, auto-research, AlphaEvolve - the headlines all point at one idea: an AI that can build a better version of AI. So I spent some time reading the papers, the announcements, and the arguments on every side, trying to figure out where we actually are with self-evolving AI. This is my honest attempt at the whole picture: what's real, what works, where every version of the loop quietly breaks, and the strange question it leaves us with. No hype, no doom. Just what the research actually says.

━━━━━━━━━━━━━━━━━━━━━━
CHAPTERS
━━━━━━━━━━━━━━━━━━━━━━
0:00 The one question
1:16 Google's AlphaEvolve
4:03 The race to build it (OpenAI's 2028 goal)
5:30 How would a machine improve itself?
7:36 The wall: model collapse
8:49 Looking in the wrong place: the harness
12:05 Can a machine design its own harness?
13:48 The Darwin Gödel Machine
14:25 The dream learns to cheat
17:55 Terence Tao's verdict
20:02 The last question

━━━━━━━━━━━━━━━━━━━━━━
SOURCES & FURTHER READING
━━━━━━━━━━━━━━━━━━━━━━
(In the order they appear.)

AlphaEvolve (Google DeepMind)
• Blog: https://deepmind.google/blog/alphaevo...
• Paper (sped up a matrix-multiplication kernel used in Gemini's training by ~23%, cutting total training time by ~1%; found a way to multiply 4×4 complex-valued matrices with 48 scalar multiplications, improving on the 49 given by Strassen's 1969 algorithm): https://arxiv.org/abs/2506.13242

Anthropic: the model helping build the next model
• "When AI builds itself" (Claude now authors 80%+ of merged code, with leadership estimating 90%+; engineers merge ~8× more code per day than in 2024): https://www.anthropic.com/institute/r...

OpenAI: the automated AI researcher
• Sam Altman on a "legitimate AI researcher by 2028": https://techcrunch.com/2025/10/28/sam...
• Jakub Pachocki, "a whole research lab in a data center": https://www.technologyreview.com/2026...

Path one: a model training itself
• Andrej Karpathy, "autoresearch": https://github.com/karpathy/autoresearch
• STaR: Bootstrapping Reasoning With Reasoning: https://arxiv.org/abs/2203.14465
• DeepSeek-R1 (DeepSeek-R1-Zero on AIME 2024: 15.6% → 71.0% pass@1 from pure RL): https://arxiv.org/abs/2501.12948 · https://www.nature.com/articles/s4158...
• Self-Rewarding Language Models: https://arxiv.org/abs/2401.10020

The wall: model collapse
• Shumailov et al., Nature 2024: https://www.nature.com/articles/s4158...
• "Is Model Collapse Inevitable?" (accumulating synthetic data alongside real data, instead of replacing it, avoids collapse): https://arxiv.org/abs/2404.01413

The harness: model · agent · harness
• ADAS / Automated Design of Agentic Systems (Hu, Lu, Clune): https://arxiv.org/abs/2408.08435
• Self-improving harness (score 40 → 62): https://arxiv.org/abs/2606.09498
• The Darwin Gödel Machine (Sakana AI; SWE-bench 20% → 50%, plus the reward-hacking / deleted-safeguard finding): https://sakana.ai/dgm/

When the loop learns to cheat
• Goodhart's law / Marilyn Strathern's 1997 phrasing: https://en.wikipedia.org/wiki/Goodhar...
• The Great Hanoi Rat Massacre (1902; a bounty on rat tails led to rat farming): https://en.wikipedia.org/wiki/Great_H...
• Scalable oversight, in Concrete Problems in AI Safety: https://arxiv.org/abs/1606.06565
• Measuring Progress on Scalable Oversight: https://arxiv.org/abs/2211.03540
• Terence Tao et al., "Mathematical exploration and discovery at scale" (AlphaEvolve on 67 problems; gaming the verifier): https://arxiv.org/abs/2511.02864 · Tao's write-up: https://terrytao.wordpress.com/2025/1...

The reward-hacking zoo
• CoastRunners boat (OpenAI, "Faulty Reward Functions in the Wild"): https://openai.com/index/faulty-rewar...
• Tetris, where "the only winning move is not to play" (Tom Murphy VII, SIGBOVIK 2013): https://www.cs.cmu.edu/~tom7/mario/
• AI models editing the board file to force a chess engine to resign (Palisade Research): https://github.com/PalisadeResearch/c...

Open-endedness: the way out?
• Kenneth Stanley & Joel Lehman, "Why Greatness Cannot Be Planned: The Myth of the Objective": https://link.springer.com/book/10.100...
• POET (Paired Open-Ended Trailblazer): https://arxiv.org/abs/1901.01753 · Enhanced POET: https://arxiv.org/abs/2003.08536
• OMNI (Open-endedness via Models of human Notions of Interestingness): https://arxiv.org/abs/2306.01711

#AI #ArtificialIntelligence #SelfImprovingAI #AGI #MachineLearning #AlphaEvolve #AIsafety