L33- RAG in Code | ChromaDB Setup, Retrieval, Generation, BM25 Keyword Retriever & Hybrid Ensemble

Lecture 33 of the AI for Software Engineers series — Bipin Kumar writes the complete RAG pipeline in code end-to-end: document loading, chunking, ChromaDB vector store, similarity retrieval, LLM generation, BM25 keyword retrieval, and a hybrid ensemble combining both. 🧠 What's Covered: The 4 RAG Setup Steps (One-Time Only): These steps are done once before any user can ask questions. Step 1 — load the document (9-page PDF used in the demo). Step 2 — chunk using Recursive Text Splitter with 800 tokens per chunk and 20% overlap. Step 3 — embed each chunk using OpenAI text-embedding-3-small (1536 dimensions). Step 4 — store all embeddings in ChromaDB with a collection name and persist directory. After these 4 steps, 27 chunks from the 9-page PDF are stored in the vector database and ready to serve queries. ChromaDB — How It Works: from_documents() takes three inputs: the list of chunks, the embedding model, and the collection name plus persist directory. It automatically handles embedding calculation and storage in one call. The folder appears locally on disk. For development and POC projects with under 1000 pages, ChromaDB is the standard choice. For production with large document sets, use AWS OpenSearch Serverless, Azure CosmosDB, Qdrant, Milvus, or MongoDB. Retrieval — Getting Relevant Chunks: Define a retriever with search type (similarity) and K (number of chunks to return). Pass the query and receive exactly K document objects. Each has page_content and metadata. Importantly, retrieval copies chunks — the original documents in the vector DB are never deleted or modified. Live demo: asking about a specific government scheme returned 3 chunks, the most relevant one exactly matching the relevant section of the PDF. Generation — Producing the Final Answer: Combine the page_content from all K retrieved chunks into a single context string. Insert this into the system message. Pass system message, retrieved context, and user query together to the LLM. The LLM generates a grounded answer based on the provided content. This combined input is called augmentation — giving the LLM the right context it needs to answer accurately without hallucinating. BM25 Keyword Retriever: A separate retriever that uses keyword matching instead of semantic similarity. Based on TF-IDF principles from classical NLP. Rare words are given higher importance — a word like "Vikas" that appears only a few times in the document gets more weight than common words like "the" or "is." Common words are largely ignored. Limitation: it is case-sensitive in some implementations and cannot find meaning-based matches — "agriculture" and "farming" are treated as completely unrelated. Ensemble Retriever — Combining Both: LangChain's EnsembleRetriever takes the BM25 retriever and the semantic retriever together with a list of weights. Both retrieve K documents independently. Their results are merged using the RRF (Reciprocal Rank Fusion) formula to produce a ranked final list. Starting weight recommendation: 50/50, then adjust based on your application. Financial or legal documents where exact terminology matters should give more weight to BM25. ⏭️ Next Lecture (Lecture 34): 👉 Advanced RAG — Re-ranking, Multi-Query, HyDE, CRAG, and GraphRAG 💬 Questions about ChromaDB or the ensemble setup? Drop them in the comments — Bipin replies! 📌 Subscribe so you never miss a class. #RAG #ChromaDB #BM25 #HybridRetrieval #EnsembleRetriever #LangChain #VectorDB #AIforEngineers #BipinKumar #SemanticSearch #KeywordSearch #GenAI #RAGCoding #RetrievalAugmented #AIInterview

Karpathy's LLM Wiki - Full Beginner Setup Guide

Karpathy's LLM Wiki - Full Beginner Setup Guide

What World Class Software Engineers Do That You Don't

What World Class Software Engineers Do That You Don't

L32- Full RAG Pipeline in Code | Hybrid Search, RRF, MMR, ChromaDB, Generation & RAG Evaluation

L32- Full RAG Pipeline in Code | Hybrid Search, RRF, MMR, ChromaDB, Generation & RAG Evaluation

L25- Conditional Edges in LangGraph | Router Function, add_conditional_edges & Restaurant Classifier

L25- Conditional Edges in LangGraph | Router Function, add_conditional_edges & Restaurant Classifier

What Is an AI Agent? Complete Beginner Guide to AI Agents, Memory & Tools | Day 11

What Is an AI Agent? Complete Beginner Guide to AI Agents, Memory & Tools | Day 11

L1-AI Evolution for Software Engineers | Lecture 1 — From Rule-Based to Agentic AI

L1-AI Evolution for Software Engineers | Lecture 1 — From Rule-Based to Agentic AI

Is RAG Still Needed? Choosing the Best Approach for LLMs

Is RAG Still Needed? Choosing the Best Approach for LLMs

L31- Embeddings & Chunking Strategies in RAG | Recursive, Parent-Child, Semantic, & Agentic Chunking

L31- Embeddings & Chunking Strategies in RAG | Recursive, Parent-Child, Semantic, & Agentic Chunking

System Design Explained: APIs, Databases, Caching, CDNs, Load Balancing & Production Infra

System Design Explained: APIs, Databases, Caching, CDNs, Load Balancing & Production Infra

Designing Data-Intensive Applications: Chapters 1 and 2

Designing Data-Intensive Applications: Chapters 1 and 2

Harnesses in AI: A Deep Dive — Tejas Kumar, IBM

Harnesses in AI: A Deep Dive — Tejas Kumar, IBM

Abstract Black and White wave pattern| Height Map Footage| 3 hours Topographic 4k Background

Abstract Black and White wave pattern| Height Map Footage| 3 hours Topographic 4k Background

How AI agents & Claude skills work (Clearly Explained)

How AI agents & Claude skills work (Clearly Explained)

L23- LangGraph Introduction | State, Nodes, Edges, Compile & Build Your First Agentic Workflow

L23- LangGraph Introduction | State, Nodes, Edges, Compile & Build Your First Agentic Workflow

RAG Crash Course for Beginners

RAG Crash Course for Beginners

Andrej Karpathy: From Vibe Coding to Agentic Engineering w/ Stephanie Zhan

Andrej Karpathy: From Vibe Coding to Agentic Engineering w/ Stephanie Zhan

Keynote: After the AI Hype – What’s Real, and What’s Next - Richard Campbell - 2026

Keynote: After the AI Hype – What’s Real, and What’s Next - Richard Campbell - 2026

40Hz Binaural Gamma Waves - Ultra Deep Concentration

40Hz Binaural Gamma Waves - Ultra Deep Concentration

L27- Context Engineering in LangGraph | Trim Messages, Sliding Window & Chat Cost Optimisation

L27- Context Engineering in LangGraph | Trim Messages, Sliding Window & Chat Cost Optimisation

L29- RAG Explained | Retrieval Augmented Generation, Embeddings, Cosine Similarity & Semantic Search

L29- RAG Explained | Retrieval Augmented Generation, Embeddings, Cosine Similarity & Semantic Search