Here’s twtech’s deep dive on AWS Lambda Concurrency & Throttling — how it works, what limits apply, and how to avoid “Throttled” errors.
1. The concept: Concurrency in Lambda
Concurrency is the number of function instances running at the same time in the twtech AWS account, per region. Each concurrent execution is an isolated environment with its own memory, CPU, and /tmp storage.
- Default account concurrency limit: 1,000 concurrent executions per region (a soft limit — an increase can be requested).
- All functions in a region share this concurrency pool unless you set reserved concurrency for specific functions.
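A useful rule of thumb from the Lambda documentation: steady-state concurrency is roughly requests per second multiplied by average function duration in seconds (Little's law). A minimal sketch:

```python
def required_concurrency(requests_per_second: float, avg_duration_s: float) -> float:
    """Estimate steady-state concurrency: in-flight executions = arrival rate x duration."""
    return requests_per_second * avg_duration_s

# 100 req/s with a 2-second average duration needs ~200 concurrent executions,
# comfortably inside the default 1,000 account limit.
print(required_concurrency(100, 2.0))  # -> 200.0
```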
2. Types of Concurrency

| Type | Purpose | Key Effect |
| --- | --- | --- |
| Unreserved concurrency | Shared pool for all functions without specific reservations | If one function uses too much, others may be throttled |
| Reserved concurrency | Dedicated concurrency for a function (also caps its max concurrency) | Prevents noisy-neighbor problems; guarantees capacity |
| Provisioned concurrency | Pre-warmed instances always ready to handle requests | Removes cold starts for predictable workloads; billed even when idle |
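As a configuration sketch, both reserved and provisioned concurrency can be set with boto3. The function name and alias below are placeholders, and the calls assume valid AWS credentials:

```python
import boto3

lambda_client = boto3.client("lambda")

# Reserve 300 concurrent executions for a critical function.
# Note: this also caps the function at 300 concurrent executions.
lambda_client.put_function_concurrency(
    FunctionName="critical-fn",          # placeholder name
    ReservedConcurrentExecutions=300,
)

# Keep 50 pre-warmed instances ready for a published version or alias
# (provisioned concurrency cannot target $LATEST).
lambda_client.put_provisioned_concurrency_config(
    FunctionName="critical-fn",          # placeholder name
    Qualifier="prod",                    # placeholder alias/version
    ProvisionedConcurrentExecutions=50,
)
```

Deleting the reservation (`delete_function_concurrency`) returns the function to the shared unreserved pool.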
3. How Scaling Works
When requests come in:
- Lambda starts with burst concurrency:
  - Up to 1,000 concurrent executions in the first burst (most AWS regions).
  - Some regions, such as us-east-1, have a higher initial burst (~3,000).
- If requests exceed the current concurrency, Lambda ramps up at +500 instances every minute (until hitting the twtech account limit).
- When the account concurrency limit is reached, extra requests are throttled.
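The scaling pattern above can be sketched as a simple model. The burst and ramp figures come from this section; the account limit here is an assumed example value, and real scaling also depends on traffic shape:

```python
def max_concurrency_at(minute: int, burst: int = 1000, ramp_per_min: int = 500,
                       account_limit: int = 10000) -> int:
    """Upper bound on concurrency after `minute` minutes of sustained demand:
    an initial burst, then +ramp_per_min each minute, capped by the account limit."""
    return min(burst + ramp_per_min * minute, account_limit)

# With a 1,000 burst and +500/min, sustained demand for 5,000 concurrent
# executions is fully reachable after 8 minutes; requests above the current
# ceiling are throttled in the meantime.
print([max_concurrency_at(m) for m in range(4)])  # -> [1000, 1500, 2000, 2500]
```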
4. Throttling
When Lambda can’t process a request because of concurrency limits, it returns a 429 TooManyRequestsException. Behavior differs depending on invocation type:
- Synchronous invocation (e.g., API Gateway, SDK calls):
  - The caller immediately gets a 429 error.
  - twtech can retry from the client side.
- Asynchronous invocation (e.g., S3 events, EventBridge):
  - Lambda automatically retries twice (with delays).
  - After retries fail, the event goes to a Dead Letter Queue (DLQ) or an on-failure destination, if configured.
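For synchronous callers, the standard response to a 429 is client-side retry with exponential backoff and jitter. A minimal sketch with a stand-in exception class and a fake `invoke` callable (names are illustrative, not the SDK's):

```python
import random
import time

class TooManyRequestsException(Exception):
    """Stand-in for Lambda's 429 throttling error."""

def invoke_with_retry(invoke, max_attempts: int = 5, base_delay_s: float = 0.1):
    """Retry a throttled synchronous invocation with exponential backoff + jitter."""
    for attempt in range(max_attempts):
        try:
            return invoke()
        except TooManyRequestsException:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the throttle to the caller
            # Exponential backoff with jitter: ~0.1s, ~0.2s, ~0.4s, ...
            time.sleep(base_delay_s * (2 ** attempt) * random.uniform(0.5, 1.0))

# Demo with a fake invoke that is throttled twice, then succeeds.
calls = {"n": 0}
def flaky_invoke():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TooManyRequestsException()
    return "ok"

print(invoke_with_retry(flaky_invoke, base_delay_s=0.001))  # -> ok
```

In practice the AWS SDKs already apply similar retry logic to throttling errors; tuning their built-in retry configuration is usually preferable to hand-rolling it.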
5. Key Quotas Affecting Concurrency

| Limit | Default | Adjustable? |
| --- | --- | --- |
| Account concurrency | 1,000 | ✅ Yes |
| Reserved concurrency per function | 0 → unlimited (bounded by the account limit) | ✅ Yes |
| Provisioned concurrency per function | Billed; limit per account | ✅ Yes |
| Burst concurrency | 1,000 (higher in some regions) | ❌ No |
| Ramp-up rate | +500/min | ❌ No |
6. Avoiding Throttling
- Use Reserved Concurrency to protect critical functions.
- Request concurrency limit increases from AWS Support.
- Use Provisioned Concurrency for low-latency, predictable workloads.
- Queue or batch requests with services like SQS or Kinesis.
- Spread load across regions if architecture allows.
- Monitor metrics in CloudWatch:
  - ConcurrentExecutions
  - Throttles
  - ProvisionedConcurrencyUtilization
7. Quick Example
Let’s say:
- Account concurrency = 1,000
- Function A has Reserved Concurrency = 300
- Function B has no reservation.
If Function A gets 350 requests at once:
- 300 are processed.
- 50 are throttled (or retried if async).
Meanwhile, Function B and others share the remaining 700 concurrency.
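The arithmetic above can be checked with a tiny model of the two pools (a sketch of the accounting, not of how Lambda allocates internally):

```python
def admit(requests: int, reserved_limit: int) -> tuple[int, int]:
    """Split a burst of simultaneous requests into (processed, throttled)
    for a function capped by its reserved concurrency."""
    processed = min(requests, reserved_limit)
    return processed, requests - processed

account_limit = 1000
reserved_for_a = 300

# Function A: 350 simultaneous requests against a 300 reservation.
processed, throttled = admit(350, reserved_for_a)

# Function B and everything else share what the reservation left behind.
unreserved_pool = account_limit - reserved_for_a

print(processed, throttled, unreserved_pool)  # -> 300 50 700
```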