KV Caching: Speeding up LLM Inference [Lecture]

This is a single lecture from a course. If you like the material and want more context (e.g., the lectures that came before), check out the whole course, including homeworks and reading: https://users.umiacs.umd.edu/~jbg/tea... I often refer to LLMs / Foundation Models / Frontier Models as "Muppet Models". Here's why: • What general term should you use for model... I got a free EdCafe subscription for adding it to these slides: https://www.edcafe.ai/ Music: / review-and-rest