L34- Advanced RAG Techniques | Agentic RAG, Self-RAG, CRAG, Re-Ranking & Multi-Query Explained

Lecture 34 of the AI for Software Engineers series. Bipin Kumar covers the Advanced RAG techniques that take a RAG system from basic to production-grade. The highlight of this class is three self-improving RAG approaches (Agentic RAG, Self-RAG, and CRAG), each solving a different weakness in standard pipelines.

🧠 What's Covered:

Re-Ranking with Cross Encoder: Standard cosine similarity retrieval is fast but approximate. Re-ranking adds a more precise second step. A Cross Encoder model takes the query and one chunk together as a pair and outputs a relevance score between 0 and 1. Retrieve 20 chunks with cosine similarity first, then re-rank all 20 with the Cross Encoder, and take the top 5 from the re-ranked results. Because the Cross Encoder reads the query and chunk together, the final top 5 are usually more relevant than the top 5 from cosine similarity alone. Re-ranking should be in every production RAG system: it gives the biggest accuracy improvement for the least added complexity.

Multi-Query Retrieval: A single phrasing of a query misses content that different wording would find. Multi-query sends the original question to an LLM, which rewrites it 3 to 5 ways while keeping the same intent. Retrieval is run independently for each version. All results are merged and deduplicated. The final chunk set is more comprehensive than the result of any single-query retrieval.

Agentic RAG (RAG as a Tool): In standard RAG, retrieval happens for every query whether it is needed or not. Agentic RAG wraps the entire RAG retrieval pipeline (chunking, embedding, searching the knowledge base) as a tool that an LLM agent can choose to call or skip. When a user query arrives, the agent reasons: does this question require searching the knowledge base, or can I answer it from general knowledge? If retrieval is needed, the agent calls the RAG tool and uses the result. If not, it answers directly. This is the natural integration of LangGraph workflows with RAG. Agentic RAG is best for complex multi-step queries where some steps need the knowledge base and some do not.
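The two-stage retrieval described under Re-Ranking can be sketched roughly as follows. This is a minimal illustration and not the lecture's code: `retrieve_then_rerank`, `toy_pair_scorer`, and the tiny hand-written vectors are hypothetical names and data made up for the example. The toy scorer only counts shared words; in a real system the `pair_scorer` argument would presumably wrap an actual Cross Encoder model (for example the `predict` method of the `CrossEncoder` class in sentence-transformers).

```python
import math
from typing import Callable, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_then_rerank(
    query: str,
    query_vec: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    pair_scorer: Callable[[str, str], float],
    first_k: int = 20,
    final_k: int = 5,
) -> List[str]:
    """Stage 1: fast cosine retrieval. Stage 2: re-rank the survivors pairwise."""
    # Stage 1: keep the first_k chunks closest to the query embedding.
    candidates = sorted(
        chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True
    )[:first_k]
    # Stage 2: score each (query, chunk) pair and keep the final_k best.
    scored = [(pair_scorer(query, text), text) for text, _ in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:final_k]]


def toy_pair_scorer(query: str, chunk: str) -> float:
    """Hypothetical stand-in for a Cross Encoder: share of query words in the chunk."""
    query_words = set(query.lower().split())
    chunk_words = set(chunk.lower().split())
    return len(query_words & chunk_words) / len(query_words) if query_words else 0.0


if __name__ == "__main__":
    # Made-up chunks with made-up 2-dimensional "embeddings".
    demo_chunks = [
        ("cats sleep a lot", [1.0, 0.0]),
        ("refund policy for orders", [0.9, 0.1]),
        ("how to request a refund", [0.8, 0.2]),
    ]
    print(retrieve_then_rerank(
        "how to request a refund", [1.0, 0.0], demo_chunks,
        toy_pair_scorer, first_k=3, final_k=1,
    ))
```

Passing the scorer in as a function keeps the sketch runnable without downloading a model, and it shows why `first_k` matters: a chunk that misses the first-stage cut is never seen by the re-ranker.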
Self-RAG (The LLM Reflects on Itself): Self-RAG adds three reflection checkpoints to the standard pipeline.
Checkpoint 1: Should I even retrieve? The LLM judges whether the query needs external information or whether it can answer from its own training.
Checkpoint 2: Is the retrieved chunk actually relevant? After retrieval, the LLM evaluates each chunk and rejects irrelevant ones before they pollute the context.
Checkpoint 3: Is my generated answer grounded in the retrieved content? After generating, the LLM checks whether the answer is supported by what was retrieved or whether it has drifted into hallucination.
Self-RAG is useful when your document collection is mixed in quality, with some pages highly relevant and others tangential.

CRAG (Corrective RAG): Standard RAG passes retrieved chunks to the LLM regardless of quality. If the chunks are irrelevant, the LLM generates a hallucinated but confident-sounding answer. CRAG adds an evaluator step. After retrieval, an evaluator LLM scores each chunk for relevance to the query. Chunks above the quality threshold are kept. Chunks below the threshold are discarded and a fallback is triggered, typically a web search or a secondary knowledge source. The final answer is generated from the surviving high-quality chunks plus any fallback content. CRAG is best when your document source may have gaps or when the query might fall outside the knowledge base coverage.

The Key Difference Between the 3:
Agentic RAG decides WHEN to retrieve: it makes retrieval conditional on the query type.
Self-RAG decides IF retrieval is needed and validates output quality throughout the pipeline.
CRAG validates WHAT was retrieved and replaces poor chunks before generation.

Long-Term Memory: Standard chatbot memory resets each session. Long-term memory stores important user facts in a vector database. At the start of each new session, relevant stored facts are retrieved and injected into the system prompt.
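The store, retrieve, and inject flow for long-term memory could look roughly like the sketch below. Everything here is an assumption made for illustration: the `LongTermMemory` class, the `build_system_prompt` helper, the bag-of-words `embed` function standing in for a real embedding model, and the in-memory list standing in for a vector database. It also assumes that "relevant" means relevant to the first message of the new session.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class LongTermMemory:
    """Hypothetical in-memory substitute for a vector database of user facts."""

    def __init__(self) -> None:
        self._facts: List[Tuple[str, Counter]] = []

    def remember(self, fact: str) -> None:
        # Store the fact together with its vector so it can be searched later.
        self._facts.append((fact, embed(fact)))

    def recall(self, query: str, k: int = 2) -> List[str]:
        # Return up to k stored facts that share at least some similarity.
        query_vec = embed(query)
        ranked = sorted(
            self._facts, key=lambda f: similarity(query_vec, f[1]), reverse=True
        )
        return [fact for fact, vec in ranked[:k] if similarity(query_vec, vec) > 0]


def build_system_prompt(
    base: str, memory: LongTermMemory, first_message: str, k: int = 2
) -> str:
    """Inject recalled facts into the system prompt at the start of a session."""
    facts = memory.recall(first_message, k)
    if not facts:
        return base
    fact_lines = "\n".join(f"- {fact}" for fact in facts)
    return f"{base}\n\nKnown facts about the user:\n{fact_lines}"


if __name__ == "__main__":
    memory = LongTermMemory()
    memory.remember("user is vegetarian")          # saved in an earlier session
    memory.remember("favourite editor is vim")     # saved in an earlier session
    print(build_system_prompt(
        "You are a helpful assistant.", memory, "suggest a vegetarian dinner"
    ))
```

Only facts with some similarity to the opening message are injected, so unrelated facts stay out of the prompt and do not add to its cost.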
The user does not need to repeat their preferences, health conditions, or context across sessions.

⏭️ Next Lecture (Lecture 35): 👉 Agentic RAG hands-on with LangGraph: building the full workflow in code

💬 Questions about which approach to use first? Start with re-ranking, then add CRAG for reliability. Drop your questions in the comments!

📌 Subscribe so you never miss a class.

#AdvancedRAG #AgenticRAG #SelfRAG #CRAG #ReRanking #MultiQuery #LangChain #AIforEngineers #BipinKumar #RAGInterview #GenAI #VectorDB #ProductionRAG #LangGraph #CrossEncoder
