AWS Lambda Concurrency: The Complete Mental Model

Two Lambda functions, same code, same limit. One sails through a traffic spike, the other throws 429 errors. The only difference is how concurrency works, and by the end of this video it'll make complete sense.

This is a from-first-principles breakdown of AWS Lambda concurrency: the one equation that predicts it, the two ceilings that throttle you (the 1,000 account limit AND the per-function scaling speed limit), the three ways a function gets invoked (sync, async, poll-based), and the two knobs you actually control (reserved and provisioned concurrency). We finish by dropping the same throttle event on the same function across all four invocation paths (sync, async, streams, and queues), so you can see exactly why behavior changes.

━━━━━━━━━━━━━━━━━━━━━━━
📚 WHAT YOU'LL LEARN
━━━━━━━━━━━━━━━━━━━━━━━
✅ What concurrency actually is: busy execution environments, not requests per second
✅ The one equation: concurrency = requests/sec × duration (and why a faster function needs fewer environments)
✅ The 1,000 concurrent-execution account limit, and why it's shared across every function in the region
✅ The second ceiling: the +1,000-every-10s per-function scaling speed limit (and why you get throttled below the cap)
✅ The three invocation models: synchronous (caller waits), asynchronous (buffer absorbs), poll-based (Lambda pulls)
✅ Streams vs queues: concurrency = shards, the parallelization factor (up to 10/shard), and the SQS ramp
✅ Reserved concurrency: fencing off a slice of the pool (free, but it works both ways)
✅ Cold starts and provisioned concurrency: pre-warmed environments and what they cost
✅ The payoff: one throttle event, four invocation models, four completely different outcomes

━━━━━━━━━━━━━━━━━━━━━━━
CONNECT WITH ME
━━━━━━━━━━━━━━━━━━━━━━━
🔗 LinkedIn:   / joud-awad
🐦 X/Twitter: https://x.com/TheJoud97
📝 Medium:   / joudwawad
📧 Substack: https://joudawad.substack.com/
🌐 Website: https://joudawad.com/

━━━━━━━━━━━━━━━━━━━━━━━
⏱ Chapters
━━━━━━━━━━━━━━━━━━━━━━━
0:00 Two functions, same code, opposite outcome
0:22 What concurrency actually is
1:09 The one formula: concurrency = RPS × duration
2:08 The 1,000 ceiling (one shared account pool)
2:46 Why RPS alone can't predict throttling
3:49 The scaling speed limit (+1,000 every 10s)
4:55 Three ways a function gets invoked: push vs pull
5:37 Synchronous: the caller waits
6:28 Asynchronous: the buffer absorbs
7:46 Poll-based: Lambda pulls the work
8:50 Streams: concurrency = shard count
9:56 The parallelization factor, explained
10:45 Standard iterator vs enhanced fan-out
11:27 Queues (SQS): concurrency ramps
12:31 Knob 1: reserved concurrency
13:35 What a cold start is
14:21 Knob 2: provisioned concurrency
15:20 Same throttle, four different reactions
16:20 The whole model in one breath

━━━━━━━━━━━━━━━━━━━━━━━
❓ QUICK ANSWERS
━━━━━━━━━━━━━━━━━━━━━━━
Q: What is AWS Lambda concurrency?
A: The number of execution environments running your function at the same instant. It counts busy environments, not requests per second, and each environment handles one request at a time.

Q: How do you calculate Lambda concurrency?
A: Requests per second × average duration in seconds. 100 req/s at 0.5s each = 50 concurrent environments.

Q: What is the default Lambda concurrency limit?
A: 1,000 concurrent executions per region, shared across every function in the account. It's a raisable default, and Lambda holds 100 units aside for functions with no reserved concurrency.

Q: Why does Lambda throttle below 1,000?
A: The scaling speed limit. Each function adds at most 1,000 new environments every 10 seconds, so a fast spike gets 429s even when total concurrency is well under the cap.

Q: Reserved vs provisioned concurrency?
A: Reserved is free and fences off a slice of the pool (a floor and a ceiling). Provisioned costs money and keeps environments pre-warmed to remove cold starts.

Q: How does concurrency work with streams vs queues?
A: Streams (Kinesis, DynamoDB Streams) tie concurrency to shard count: one concurrent invocation per shard by default, or up to 10 per shard with the parallelization factor. SQS ramps up from 5 concurrent invocations and tops out around 1,250 by default.

#awslambda #aws #serverless #systemdesign #backendengineering
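
━━━━━━━━━━━━━━━━━━━━━━━
🧮 TRY THE MATH YOURSELF
━━━━━━━━━━━━━━━━━━━━━━━
If you want to play with the numbers from the Quick Answers, here is a minimal Python sketch. It is not code from the video and it does not call any AWS API. The function names (required_concurrency, stream_concurrency, allowed_concurrency) are made up for illustration, and the scaling ramp is a deliberately simplified model that assumes the first 1,000 environments are available immediately and another 1,000 unlock every 10 seconds after that.

```python
import math

ACCOUNT_LIMIT = 1_000   # default regional limit quoted above (raisable)
SCALE_STEP = 1_000      # new environments a function may add per window
SCALE_WINDOW_S = 10     # length of one scaling window, in seconds


def required_concurrency(requests_per_sec: float, avg_duration_s: float) -> int:
    """Busy environments needed: requests/sec x average duration, rounded up."""
    # round() trims floating-point noise such as 0.1 * 30 = 3.0000000000000004
    return math.ceil(round(requests_per_sec * avg_duration_s, 9))


def stream_concurrency(shards: int, parallelization_factor: int = 1) -> int:
    """Stream pollers: one invocation per shard, times a factor of 1 to 10."""
    if not 1 <= parallelization_factor <= 10:
        raise ValueError("parallelization factor must be between 1 and 10")
    return shards * parallelization_factor


def allowed_concurrency(seconds_since_spike: float,
                        already_running: int = 0,
                        limit: int = ACCOUNT_LIMIT) -> int:
    """Simplified ceiling (an assumption, not the exact AWS algorithm):
    +1,000 environments per started 10s window, capped at the limit."""
    windows = int(seconds_since_spike // SCALE_WINDOW_S) + 1
    return min(limit, already_running + SCALE_STEP * windows)


if __name__ == "__main__":
    # The example from the Quick Answers: 100 req/s at 0.5s each.
    print(required_concurrency(100, 0.5))          # 50

    # Hypothetical account with the limit raised to 5,000 and a cold
    # function that suddenly needs 3,000 environments.
    need = required_concurrency(6_000, 0.5)        # 3000
    for t in (0, 10, 20):
        ceiling = allowed_concurrency(t, limit=5_000)
        print(t, ceiling, max(0, need - ceiling))  # time, ceiling, throttled

    # 4 shards with the maximum parallelization factor.
    print(stream_concurrency(4, 10))               # 40
```

In this toy model the spike is throttled for the first 20 seconds even though 3,000 is well under the hypothetical 5,000 limit, which is the "throttled below the cap" effect described above.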