L27- Context Engineering in LangGraph | Trim Messages, Sliding Window & Chat Cost Optimisation

Lecture 27 of the AI for Software Engineers series. Bipin Kumar tackles the hidden cost that grows in every chatbot: the longer a session runs, the more tokens are sent with every new message. Context engineering is how you control this, and interviewers now ask about it directly.

🧠 What's Covered:

The Cost Problem with Long Chat History: To keep the conversation continuous, the LLM receives the full history on every call. When a user reaches question 10 in a session, the model receives all 9 previous questions and answers along with the new one. With 5 tokens per question and 20 tokens per answer, that single call sends about 230 input tokens (9 × 25 for the earlier exchanges plus 5 for the new question), or around 250 tokens once the 20-token answer is counted. Because every call resends everything before it, the total tokens billed over a session grow roughly quadratically with its length.

What Context Engineering Is: Prompt engineering is about writing one message well. Context engineering is about managing all the messages passed to the LLM: keeping only what is needed, in the right format, without losing important context. Both aim for the same outcome: accurate answers at lower cost.

Context Window Rule: GPT models such as GPT-4o have a 128,000-token context window. The rule of thumb given in the lecture is to use at most about 10% of it, around 12,800 tokens. Going beyond that does not raise an error, but the model starts losing focus on earlier parts of the input and answer quality drops.

Trim Messages (Two Modes):

Trim by count: set a maximum number of messages to keep. The N most recent messages are retained and older ones are dropped. In a live demo with 30 messages, keeping 6 plus the system message worked as expected: the oldest messages (starting with H1, A1, H2, A2) were removed, and only the system message and the 6 most recent messages were passed to the LLM.

Trim by token: set a maximum token budget. The chat model's tokenizer counts the tokens in each message, and messages are removed from oldest to newest until the history fits the budget. With a 120-token limit on 307 total tokens, only 4 messages were retained.

The SBI Card Problem (Why Trim Alone Is Risky): In question 1, a user asked about an SBI Elite credit card. They continued chatting about other topics until question 10, when they said "I want to apply for that card." If the trim window no longer includes question 1, the chatbot has no idea which card is meant. Trimming too aggressively loses critical early context.

The Combo Approach (Best Practice): Keep the last 5–6 messages in full for continuity. For everything older, generate a 2–3 line summary and pass that as well. The LLM gets the recent conversation word for word plus a compressed memory of what came before, which balances cost and accuracy.

Middleware: Any processing that happens between the start of the workflow and the LLM call is called middleware. Trim functions, summarisation nodes and validation logic are all middleware. In LangGraph, a middleware is simply an extra node inserted before the LLM node in the workflow.

⏭️ Next Lecture (Lecture 28):
👉 Summarisation strategy in practice + Human-in-the-Loop (interrupt and resume)

💬 Questions about context engineering or trim strategies? Drop them in the comments; Bipin replies!

📌 Subscribe so you never miss a class.

#ContextEngineering #TrimMessages #LangGraph #LangChain #AgenticAI #ChatMemory #TokenCost #AIforEngineers #BipinKumar #Middleware #SlidingWindow #GenerativeAI #Python #LLMOptimisation #AIInterview
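The token arithmetic from the cost-problem section can be checked with a few lines of Python. This is a minimal sketch that assumes the lecture's illustrative sizes (5 tokens per question, 20 per answer) and that the full history is resent on every call; the helper names are hypothetical. It also computes the 10% budget from the context-window rule.

```python
# Illustrative token sizes from the lecture (assumptions, not measured values).
Q_TOKENS = 5   # tokens per user question
A_TOKENS = 20  # tokens per assistant answer

CONTEXT_WINDOW = 128_000
SOFT_BUDGET = CONTEXT_WINDOW // 10  # the lecture's "use at most ~10%" rule of thumb


def input_tokens_for_turn(n: int) -> int:
    """Input tokens sent for question n: all n-1 earlier Q/A pairs plus the new question."""
    return (n - 1) * (Q_TOKENS + A_TOKENS) + Q_TOKENS


def cumulative_input_tokens(turns: int) -> int:
    """Total input tokens billed over a session of `turns` questions (full history resent each time)."""
    return sum(input_tokens_for_turn(n) for n in range(1, turns + 1))


if __name__ == "__main__":
    print(input_tokens_for_turn(10))             # 230 input tokens for question 10
    print(input_tokens_for_turn(10) + A_TOKENS)  # 250 once the new answer is counted
    print(cumulative_input_tokens(10))           # 1175 across a 10-question session
    print(cumulative_input_tokens(100))          # 124250 across a 100-question session
    print(SOFT_BUDGET)                           # 12800
```

Ten times as many turns costs roughly a hundred times as many input tokens in total, which is why trimming matters more the longer a session runs.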
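Trim by count can be sketched in plain Python. Messages are modelled here as hypothetical `(role, content)` tuples rather than LangChain message objects, and `trim_by_count` is an illustrative helper, not a library function. In LangChain itself, `trim_messages` from `langchain_core.messages` with `strategy="last"`, `token_counter=len` and `include_system=True` trims by message count while keeping the system message.

```python
from typing import List, Tuple

# Simplified stand-in for chat message objects: (role, content).
Message = Tuple[str, str]


def trim_by_count(messages: List[Message], max_messages: int) -> List[Message]:
    """Keep a leading system message (if present) plus the `max_messages` most recent messages."""
    system = [m for m in messages[:1] if m[0] == "system"]
    rest = messages[len(system):]
    # Guard: rest[-0:] would return the whole list, so handle 0 explicitly.
    recent = rest[-max_messages:] if max_messages > 0 else []
    return system + recent


# Usage: a system prompt followed by 30 placeholder messages (H1, A1, ..., H15, A15).
history: List[Message] = [("system", "You are a helpful banking assistant.")]
for i in range(1, 16):
    history.append(("human", f"H{i}"))
    history.append(("ai", f"A{i}"))

trimmed = trim_by_count(history, 6)
print([content for _, content in trimmed])
# ['You are a helpful banking assistant.', 'H13', 'A13', 'H14', 'A14', 'H15', 'A15']
```

The message contents are placeholders; only the shape (system prompt plus 30 human/AI messages) mirrors the live demo.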
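Trim by token follows the same shape but spends a budget instead of counting messages. This sketch assumes a hypothetical `count_tokens` that counts whitespace-separated words; a real setup would use the model's tokenizer (in LangChain, the chat model can be passed as `token_counter` to `trim_messages`). The oldest non-system message is dropped repeatedly until the total fits the budget.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (role, content); simplified stand-in for chat messages


def count_tokens(message: Message) -> int:
    """Hypothetical token counter: one token per whitespace-separated word."""
    return len(message[1].split())


def trim_by_tokens(
    messages: List[Message],
    max_tokens: int,
    counter: Callable[[Message], int] = count_tokens,
) -> List[Message]:
    """Always keep a leading system message; drop oldest other messages until within budget."""
    system = [m for m in messages[:1] if m[0] == "system"]
    rest = list(messages[len(system):])
    total = sum(counter(m) for m in system + rest)
    while rest and total > max_tokens:
        total -= counter(rest.pop(0))  # remove the oldest non-system message
    return system + rest


# Usage: a short chat in the spirit of the SBI Card example.
chat: List[Message] = [
    ("system", "You answer questions about credit cards."),
    ("human", "Tell me about the SBI Elite card"),
    ("ai", "Here is a summary of its main features"),
    ("human", "What is the annual fee"),
    ("ai", "Let me check the current fee for you"),
    ("human", "I want to apply for that card"),
]

trimmed = trim_by_tokens(chat, max_tokens=25)
print([role for role, _ in trimmed])  # ['system', 'ai', 'human']
```

With a 25-token budget the earliest turn, which names the SBI Elite card, is dropped, so "that card" in the last question can no longer be resolved: the SBI Card problem from the lecture in miniature.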
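The combo approach, run as middleware before the LLM call, can be sketched as one function. Assumptions: `summarise` is a hypothetical stand-in for a real LLM summarisation call, `fake_llm` replaces the chat model, and the summary is injected as an extra system message, which is one possible design rather than the lecture's exact code. In LangGraph, `context_middleware` would be wrapped in a node and connected by an edge into the LLM node; that wiring is omitted so the sketch runs without LangGraph installed.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (role, content); simplified stand-in for chat messages


def summarise(messages: List[Message]) -> str:
    """Hypothetical stand-in for an LLM summarisation call.
    It just lists the user's earlier questions so the example stays runnable."""
    questions = [content for role, content in messages if role == "human"]
    return "Earlier in this chat the user asked: " + "; ".join(questions)


def context_middleware(
    messages: List[Message],
    keep_last: int = 6,
    summariser: Callable[[List[Message]], str] = summarise,
) -> List[Message]:
    """System prompt + summary of older turns + the last `keep_last` messages verbatim."""
    system = [m for m in messages[:1] if m[0] == "system"]
    rest = messages[len(system):]
    if len(rest) <= keep_last:
        return list(messages)  # nothing old enough to summarise
    split = len(rest) - keep_last
    older, recent = rest[:split], rest[split:]
    return system + [("system", summariser(older))] + recent


def fake_llm(messages: List[Message]) -> str:
    """Placeholder for the real chat-model call; reports how many messages it received."""
    return f"(model saw {len(messages)} messages)"


def run_turn(messages: List[Message]) -> str:
    # Middleware runs first, then the LLM call, like an extra node placed before the LLM node.
    return fake_llm(context_middleware(messages))


# Usage: the card is mentioned only in the first question.
chat: List[Message] = [
    ("system", "You are a banking assistant."),
    ("human", "Tell me about the SBI Elite card"),
    ("ai", "(answer about the card)"),
    ("human", "What is a credit score"),
    ("ai", "(answer about credit scores)"),
    ("human", "How do EMIs work"),
    ("ai", "(answer about EMIs)"),
    ("human", "Is UPI free"),
    ("ai", "(answer about UPI)"),
    ("human", "I want to apply for that card"),
]

compact = context_middleware(chat, keep_last=5)
print(compact[1][1])
print(run_turn(chat))
```

Even though the first question falls outside the recent window, its content survives in the summary message, so the model can still tell which card "that card" refers to.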
