Lambda Concurrency & Throttling - Overview.
Scope:
- Intro,
- The concept: Concurrency in Lambda,
- Types of Concurrency,
- How Scaling Works,
- Throttling,
- Key Quotas Affecting Concurrency table,
- How to Avoid Throttling,
- Quick Example.
Intro:
- Here’s the twtech overview of AWS Lambda Concurrency & Throttling.
- This includes:
- How it works,
- What limits apply,
- How to avoid “Throttled” errors.
1. The concept: Concurrency in Lambda
- Concurrency is the number of function instances running at the same time in twtech AWS account, per region.
- Each concurrent execution is an isolated environment with its own memory, CPU, and /tmp storage.
- The default account concurrency limit is 1,000 concurrent executions per region (a soft limit; twtech can request an increase).
- All functions in a region share this concurrency pool unless twtech sets reserved concurrency for specific functions (see the sketch after this list for checking these values).
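Here is a minimal boto3 sketch for inspecting the regional concurrency pool. It assumes AWS credentials and a default region are already configured; the region determines which pool is reported.

```python
# Minimal sketch: inspect the regional concurrency pool with boto3.
# Assumes AWS credentials and a default region are already configured.
import boto3

lambda_client = boto3.client("lambda")

limits = lambda_client.get_account_settings()["AccountLimit"]

print("Total account concurrency:  ", limits["ConcurrentExecutions"])
print("Unreserved concurrency pool:", limits["UnreservedConcurrentExecutions"])
```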
2. Types of Concurrency
| Type | Purpose | Key Effect |
|------|---------|------------|
| Unreserved Concurrency | Shared pool for all functions without specific reservations. | If one function uses too much, others may be throttled. |
| Reserved Concurrency | Dedicated concurrency for a function (also caps its max concurrency). | Prevents noisy-neighbor problems; guarantees capacity. |
| Provisioned Concurrency | Pre-warmed instances always ready to handle requests. | Removes cold starts for predictable workloads; billed even when idle. |
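As a rough illustration of how the three types in the table above are configured, here is a hedged boto3 sketch. The function name `orders-api` and the alias `live` are hypothetical, and the numbers are arbitrary.

```python
# Minimal sketch: configure the concurrency types from the table above.
# "orders-api" and the alias "live" are hypothetical names; adjust for your account.
import boto3

lambda_client = boto3.client("lambda")

# Reserved concurrency: dedicates (and caps) capacity for one function.
lambda_client.put_function_concurrency(
    FunctionName="orders-api",
    ReservedConcurrentExecutions=300,
)

# Provisioned concurrency: pre-warms instances on a published version or alias.
lambda_client.put_provisioned_concurrency_config(
    FunctionName="orders-api",
    Qualifier="live",                      # alias or version, not $LATEST
    ProvisionedConcurrentExecutions=50,
)

# Unreserved concurrency needs no configuration: it is simply whatever remains
# of the account pool after reservations are subtracted.
```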
3. How Scaling Works
When requests come in:
- Lambda starts with burst concurrency:
  - Up to 1,000 concurrent executions per region (most AWS regions) in the first burst.
  - Some regions, such as us-east-1, have a higher initial burst (~3,000).
- If requests exceed the current concurrency, Lambda ramps up at:
  - +500 instances every minute (until hitting the twtech account limit).
- When the account concurrency limit is reached → extra requests are throttled.
- Throttling means the service is intentionally slowing or blocking twtech requests because twtech has exceeded its allowed rate (number of requests per time period).
- Throttling occurs to prevent overload and maintain performance for all users.
- The sketch after this list turns the burst-plus-ramp model into a quick calculation.
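To make the burst-plus-ramp model concrete, here is a small back-of-the-envelope sketch using the numbers quoted above (1,000 burst, +500 per minute). Actual values vary by region and account, so treat this as illustration only.

```python
# Back-of-the-envelope sketch of the scaling model described above:
# an initial burst, then +500 concurrent executions per minute,
# capped at the account concurrency limit.
def available_concurrency(minutes_elapsed: int,
                          burst: int = 1_000,
                          ramp_per_minute: int = 500,
                          account_limit: int = 1_000) -> int:
    """Approximate concurrency Lambda can serve after `minutes_elapsed`."""
    return min(burst + ramp_per_minute * minutes_elapsed, account_limit)

# With the default 1,000 account limit, the burst already covers the cap.
# With a raised limit (say 5,000), the ramp takes a few minutes to reach it:
for minute in range(0, 10, 2):
    print(minute, available_concurrency(minute, account_limit=5_000))
```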
4. Throttling
- When Lambda can’t process a request because of concurrency limits, it throws a 429 TooManyRequestsException.
Behavior differs depending on the invocation type:
- Synchronous invocation (e.g., API Gateway, SDK calls):
  - The caller immediately gets a 429 error.
  - twtech can retry from the client side (a retry sketch follows this list).
- Asynchronous invocation (e.g., S3 events, EventBridge):
  - Lambda automatically retries twice (with delays).
  - After the retries fail → the failed event goes to a Dead Letter Queue (DLQ) or an on-failure destination, if configured.
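For the synchronous case, a simple client-side retry with exponential backoff looks roughly like the sketch below. The function name `orders-api` is a hypothetical placeholder.

```python
# Minimal sketch: client-side retry for synchronous invokes that hit
# a 429 TooManyRequestsException. "orders-api" is a hypothetical function name.
import json
import time

import boto3

lambda_client = boto3.client("lambda")


def invoke_with_retry(payload: dict, attempts: int = 5):
    for attempt in range(attempts):
        try:
            response = lambda_client.invoke(
                FunctionName="orders-api",
                InvocationType="RequestResponse",   # synchronous invoke
                Payload=json.dumps(payload),
            )
            return json.loads(response["Payload"].read())
        except lambda_client.exceptions.TooManyRequestsException:
            # Throttled: back off exponentially before retrying.
            time.sleep(2 ** attempt)
    raise RuntimeError("Still throttled after retries")
```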
5. Key Quotas Affecting Concurrency table
| Limit | Default | Adjustable? |
|-------|---------|-------------|
| Account concurrency | 1,000 | ✅ Yes |
| Reserved concurrency per function | 0 → unlimited (bounded by the account limit) | ✅ Yes |
| Provisioned concurrency per function | Billed limit per account | ✅ Yes |
| Burst concurrency | 1,000 (higher in some regions) | ❌ No |
| Ramp-up rate | +500/min | ❌ No |
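One way for twtech to confirm the current values and adjustability flags behind the table above is the Service Quotas API. A minimal sketch (first page of results only; paginate for the full list):

```python
# Minimal sketch: list the default Lambda quotas with the Service Quotas API
# to see their values and whether each one is adjustable.
import boto3

quotas = boto3.client("service-quotas")

# First page only; paginate on NextToken for the complete list.
response = quotas.list_aws_default_service_quotas(ServiceCode="lambda")
for quota in response["Quotas"]:
    print(f'{quota["QuotaName"]}: {quota["Value"]} '
          f'(adjustable: {quota["Adjustable"]})')
```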
6. How to Avoid Throttling
- Use Reserved Concurrency to protect critical functions.
- Request concurrency limit increases from AWS Support.
- Use Provisioned Concurrency for low-latency, predictable workloads.
- Queue or batch requests with services like SQS or Kinesis.
- Spread load across regions if architecture allows.
- Monitor metrics in CloudWatch (an alarm sketch on the Throttles metric follows this list):
- ConcurrentExecutions
- Throttles
- ProvisionedConcurrencyUtilization
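To catch throttling early, twtech can alarm on the Throttles metric for a critical function, roughly as in this sketch. The function name and SNS topic ARN are hypothetical placeholders.

```python
# Minimal sketch: CloudWatch alarm on the per-function Throttles metric.
# "orders-api" and the SNS topic ARN are hypothetical placeholders.
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="orders-api-throttles",
    Namespace="AWS/Lambda",
    MetricName="Throttles",
    Dimensions=[{"Name": "FunctionName", "Value": "orders-api"}],
    Statistic="Sum",
    Period=60,                      # evaluate per-minute sums
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:lambda-alerts"],
)
```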
7. Quick Example
Let’s say:
- Account concurrency = 1,000
- Function A has Reserved Concurrency = 300
- Function B has no reservation.
If Function A gets 350 requests at once:
- 300 requests are processed.
- 50 requests are throttled (or retried, if invoked asynchronously).
- Function B and any other functions without reservations share the remaining 700 unreserved concurrency.
The arithmetic is spelled out in the sketch below.
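The same allocation as a tiny calculation (a sketch of the arithmetic only, not actual Lambda behavior code):

```python
# Worked version of the example above: 1,000 account concurrency,
# Function A reserves 300 and receives 350 simultaneous requests.
account_limit = 1_000
reserved_for_a = 300
requests_to_a = 350

processed_by_a = min(requests_to_a, reserved_for_a)   # 300 run immediately
throttled_for_a = requests_to_a - processed_by_a      # 50 throttled / retried
shared_pool = account_limit - reserved_for_a          # 700 left for B and others

print(processed_by_a, throttled_for_a, shared_pool)   # 300 50 700
```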