RAG is dead, right?? — Kuba Rogut, Turbopuffer
Cursor added semantic search and measured a 24% increase in answer accuracy on their composer model, a 2.6% gain in code retention in large codebases, and a 2.2% drop in dissatisfied user requests. Those numbers look small until you factor in that semantic search does not fire on every query.

Meanwhile Google search volume for RAG hit a new inflection point in mid 2025 and went through the roof. The Twitter "RAG is dead" discourse and the actual usage curve are moving in opposite directions.

Kuba Rogut's argument is that the problem was never retrieval, it was the narrow definition of it. RAG is not just a vector search call. It is vector search, full text search, glob, regex, and filters used iteratively by an agent that keeps searching until it has what it needs.

He contrasts Claude Code (grep per session, no index, repeat cost every run) with Cursor (one time upfront indexing, lightweight tool calls at runtime). Claude Code's approach is not wrong, it is a deliberate tradeoff. The frame that clarifies it: embeddings are cached compute, and whether to cache depends on query volume. Jeff Dean's version: you do not need a trillion tokens at once, you need the right million.

Speaker info:
/ kubarogut
https://x.com/rogutkuba

Timestamps:
0:00 Introduction to the "RAG is dead" discourse
1:12 Google search volume trends for RAG
1:39 Defining RAG vs. Agentic Search
3:15 Cursor's indexing and semantic search approach
6:10 Contrasting Claude Code (grep) vs. Cursor (indexed)
6:40 The concept of embeddings as cached compute
8:38 The shift from simple RAG to Agentic Retrieval
9:44 Jeff Dean on context windows and stage retrieval
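
The "embeddings are cached compute" frame can be read as a break-even calculation: an upfront index only pays off once enough queries reuse it. The sketch below is a rough illustration of that idea, not anything shown in the talk. The class name, the field names, and every number in it are hypothetical, and costs are in arbitrary units.

```python
from dataclasses import dataclass


@dataclass
class RetrievalCosts:
    """Hypothetical per-codebase costs in arbitrary units (assumed, not measured)."""
    index_build: float    # one-time cost of embedding and indexing the codebase
    indexed_query: float  # cost of one lookup against the prebuilt index
    scan_query: float     # cost of one grep-style scan with no index


def total_cost_indexed(costs: RetrievalCosts, queries: int) -> float:
    """Upfront indexing plus a cheap lookup per query (the Cursor-style tradeoff)."""
    return costs.index_build + costs.indexed_query * queries


def total_cost_scan(costs: RetrievalCosts, queries: int) -> float:
    """No index, so every query pays the full scan cost (the grep-style tradeoff)."""
    return costs.scan_query * queries


def break_even_queries(costs: RetrievalCosts) -> float:
    """Query count at which both approaches cost the same.

    Returns infinity when an indexed lookup is not cheaper than a scan,
    because the index then never pays for itself.
    """
    saving_per_query = costs.scan_query - costs.indexed_query
    if saving_per_query <= 0:
        return float("inf")
    return costs.index_build / saving_per_query


if __name__ == "__main__":
    # Made-up numbers purely for illustration.
    costs = RetrievalCosts(index_build=100.0, indexed_query=0.5, scan_query=2.5)
    print("break-even queries:", break_even_queries(costs))
    for n in (10, 50, 200):
        print(n, total_cost_scan(costs, n), total_cost_indexed(costs, n))
```

Below the break-even point the scan approach is cheaper overall, which is one way to read the claim that the grep-per-session design is a deliberate tradeoff and not a mistake.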


