AWS Batch vs Lambda vs Step Functions - Deep Dive.
Focus:
- Tailored for:
  - DevOps
  - DevSecOps
  - Cloud Engineers
- Aligned with:
  - Execution model
  - Limits
  - Cost
  - Scaling
  - Real production architectures
Scope:
- One-Line Mental Models (Memorize This),
- Execution Responsibility (Who Does What),
- Runtime & Hard Limits (Deal Breakers),
- Scaling Behavior (What Actually Scales),
- Cost Model Comparison,
- State, Workflow & Reliability,
- When Each Service Wins (Clear Use Cases),
- Canonical Architecture (Best Practice),
- Real-World Sample – Data Pipeline,
- Anti-Patterns (Seen in Production),
- Decision Matrix (Bookmark This),
- Final thoughts.
1. One-Line Mental Models (Memorize This)
NB:
- These services do different jobs.
2. Execution Responsibility (Who Does What)
| Capability | Lambda | Step Functions | Batch |
|---|---|---|---|
| Run code | ✅ | ❌ | ✅ |
| Orchestrate steps | ❌ | ✅ | ⚠️ Limited |
| Manage compute | ❌ | ❌ | ✅ |
| Queue jobs | ❌ | ❌ | ✅ |
| Handle retries | ⚠️ Basic | ✅ Advanced | ✅ Job-level |
| Long-running | ❌ | ⚠️ (orchestration only) | ✅ |
NB:
- Step Functions never runs your code itself.
- Step Functions coordinates the other services that do.
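To make that concrete, here is a minimal sketch of a state machine definition built as a Python dict. The function ARN and the state name are placeholders assumed for illustration; `arn:aws:states:::lambda:invoke` is the service integration Step Functions uses to call Lambda. The definition holds only a reference to the function, never the code.

```python
import json

# Hypothetical ARN, used only for illustration.
VALIDATE_FN_ARN = "arn:aws:lambda:us-east-1:123456789012:function:validate-input"


def build_definition(function_arn: str) -> dict:
    """Return an Amazon States Language definition with a single Task state."""
    return {
        "Comment": "Step Functions delegates the work to Lambda",
        "StartAt": "ValidateInput",
        "States": {
            "ValidateInput": {
                "Type": "Task",
                # Service integration: the state machine asks Lambda to run the code.
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {
                    "FunctionName": function_arn,
                    "Payload.$": "$",  # pass the whole state input to the function
                },
                "End": True,
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_definition(VALIDATE_FN_ARN), indent=2))
```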
3. Runtime & Hard Limits (Deal Breakers)
| Limit | Lambda | Step Functions | Batch |
|---|---|---|---|
| Max runtime | 15 min | 1 year | Days/weeks |
| Max memory | 10 GB | N/A | Instance limit |
| vCPU | ~6 | N/A | Thousands |
| State size | N/A | 256 KB | Unlimited |
| GPU | ❌ | ❌ | ✅ |
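The Lambda column of the table can be turned into a quick pre-flight check. The sketch below is one way to do it, not an official tool: the `Workload` class and both function names are made up for illustration, and the thresholds are the 15-minute, 10 GB, and no-GPU limits from the table.

```python
from dataclasses import dataclass

# Lambda limits as listed in the table above.
LAMBDA_MAX_SECONDS = 15 * 60
LAMBDA_MAX_MEMORY_GB = 10


@dataclass
class Workload:
    """Hypothetical description of a job, for illustration only."""
    runtime_seconds: int
    memory_gb: float
    needs_gpu: bool = False


def lambda_blockers(w: Workload) -> list:
    """List the hard limits that rule Lambda out for this workload."""
    blockers = []
    if w.runtime_seconds > LAMBDA_MAX_SECONDS:
        blockers.append("runtime exceeds 15 minutes")
    if w.memory_gb > LAMBDA_MAX_MEMORY_GB:
        blockers.append("memory exceeds 10 GB")
    if w.needs_gpu:
        blockers.append("GPU required")
    return blockers


def pick_compute(w: Workload) -> str:
    """Prefer Lambda unless a hard limit pushes the job onto Batch."""
    return "Batch" if lambda_blockers(w) else "Lambda"
```

For example, `pick_compute(Workload(runtime_seconds=3600, memory_gb=4))` returns `"Batch"` because the job runs longer than 15 minutes.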
4. Scaling Behavior (What Actually Scales)
Lambda
- Scales per request
- Near-instant
- Throttling possible
- Cold starts
Step Functions
- Scales state transitions
- Concurrency limits
- Not compute bound
Batch
- Scales infrastructure
- Slower startup
- Predictable throughput
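Whether Lambda throttling will bite can be estimated ahead of time. As a rough sketch, assuming steady traffic, the concurrency a function needs is about requests per second multiplied by average duration in seconds. The limit of 1,000 used below is an assumption standing in for your account's concurrency quota; check the real value for your account and Region.

```python
import math


def required_concurrency(requests_per_second: float, avg_duration_seconds: float) -> int:
    """Approximate concurrent executions needed under steady load."""
    return math.ceil(requests_per_second * avg_duration_seconds)


def will_throttle(requests_per_second: float,
                  avg_duration_seconds: float,
                  concurrency_limit: int = 1000) -> bool:
    """True when the estimate exceeds the (assumed) concurrency limit."""
    needed = required_concurrency(requests_per_second, avg_duration_seconds)
    return needed > concurrency_limit
```

With 200 requests per second at 0.5 seconds each, the estimate is 100 concurrent executions, well under the assumed limit. At 5,000 requests per second the same function would need 2,500 and would be throttled.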
5. Cost Model Comparison
Rule of thumb
- Orchestration ≠ execution
- Execution ≠ orchestration
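The rule of thumb shows up in how each service bills. Lambda charges per request plus per GB-second of execution, Step Functions Standard workflows charge per state transition, and AWS Batch adds no charge of its own on top of the EC2 or Fargate capacity it uses. The sketch below captures only that arithmetic. Prices are deliberately left as parameters, and the numbers in the usage note are illustrative assumptions, not current AWS prices.

```python
def lambda_cost(invocations: int,
                avg_seconds: float,
                memory_gb: float,
                price_per_gb_second: float,
                price_per_million_requests: float) -> float:
    """Lambda: pay for requests plus memory multiplied by execution time."""
    compute = invocations * avg_seconds * memory_gb * price_per_gb_second
    requests = invocations / 1_000_000 * price_per_million_requests
    return compute + requests


def step_functions_cost(state_transitions: int,
                        price_per_thousand_transitions: float) -> float:
    """Step Functions (Standard): pay per state transition, not per CPU second."""
    return state_transitions / 1000 * price_per_thousand_transitions


def batch_cost(instance_hours: float, price_per_instance_hour: float) -> float:
    """Batch: pay only for the underlying compute capacity."""
    return instance_hours * price_per_instance_hour
```

For example, with an assumed price of 0.00002 per GB-second and 0.20 per million requests, one million invocations of 0.1 seconds at 0.5 GB come to about 1.20. A loop that burns state transitions in Step Functions is priced on the orchestration meter, which is why compute belongs in Lambda or Batch.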
6. State, Workflow & Reliability
Lambda
- Stateless
- External state required
- Event retries depend on source
Step Functions
- Durable state
- Exactly-once transitions (Standard workflows)
- Built-in error handling
- Visual DAG
Batch
- Job state tracking
- Exit-code based retries
- Dependencies possible but limited
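The two retry styles look quite different in configuration. Below is a hedged sketch of both, written as Python dicts: a `Retry`/`Catch` pair as it would appear inside a Step Functions Task state, and a `retryStrategy` as it would appear in a Batch job definition or `submit_job` call. The intervals, attempt counts, exit code 137, and the `NotifyFailure` state name are illustrative assumptions.

```python
# Step Functions: retries and error routing are declared on the state.
SFN_RETRY = [
    {
        "ErrorEquals": ["States.TaskFailed"],
        "IntervalSeconds": 5,   # wait before the first retry
        "MaxAttempts": 3,
        "BackoffRate": 2.0,     # multiply the wait after each retry
    }
]
SFN_CATCH = [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}]

# Batch: retries are decided per job attempt, here based on the exit code.
BATCH_RETRY_STRATEGY = {
    "attempts": 3,
    "evaluateOnExit": [
        {"onExitCode": "137", "action": "RETRY"},  # assumed: retry when the container is killed
        {"onReason": "*", "action": "EXIT"},       # anything else: stop retrying
    ],
}


def backoff_delays(interval_seconds: float, max_attempts: int, backoff_rate: float) -> list:
    """Seconds waited before each retry under the Step Functions settings above."""
    return [interval_seconds * backoff_rate ** i for i in range(max_attempts)]
```

With the values shown, `backoff_delays(5, 3, 2.0)` gives waits of 5, 10, and 20 seconds before the three retries.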
7. When Each Service Wins (Clear Use Cases)
AWS Lambda – “Fast Logic”
Use when:
- Event-driven tasks
- API handlers
- Validation
- Fan-out / fan-in
- Lightweight ETL
❌ Don’t use for:
- Long loops
- Heavy compute
- File processing
Step Functions – “Control Plane”
Use when:
- Multi-step workflows
- Conditional logic
- Human approval steps
- Service coordination
- Long-running orchestration
❌ Don’t use for:
- CPU work
- Tight loops
- High-frequency polling
AWS Batch – “Execution Engine”
Use when:
- Long-running jobs
- CPU / memory heavy workloads
- Parallel processing
- Spot optimization
- ML / analytics
❌ Don’t use for:
- APIs
- Event glue
- Real-time responses
8. Canonical Architecture (Best Practice)
Lambda + Step Functions + Batch (Together)
Responsibilities
| Layer | Role |
|---|---|
| Lambda | Fast logic |
| Step Functions | Orchestration |
| Batch | Execution |
NB:
- This separation scales better, is easier to secure, and costs less.
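As a sketch of the "fast logic" layer handing off to the orchestration layer, the Lambda handler below validates an S3 event and starts a Step Functions execution with boto3's `start_execution`. The `STATE_MACHINE_ARN` environment variable and the payload shape are assumptions for illustration. The optional `sfn_client` argument is there only so the handler can be exercised without AWS access.

```python
import json
import os


def handler(event, context, sfn_client=None):
    """Validate the S3 event, then start the workflow. No heavy work happens here."""
    if sfn_client is None:
        import boto3  # imported lazily so the module loads without boto3 installed
        sfn_client = boto3.client("stepfunctions")

    records = event.get("Records", [])
    if not records:
        raise ValueError("no S3 records in event")

    s3 = records[0]["s3"]
    payload = {"bucket": s3["bucket"]["name"], "key": s3["object"]["key"]}

    response = sfn_client.start_execution(
        stateMachineArn=os.environ["STATE_MACHINE_ARN"],  # assumed env var name
        input=json.dumps(payload),
    )
    return {"executionArn": response["executionArn"]}
```

The handler returns as soon as the execution is started, so it stays far below the 15-minute limit no matter how long the downstream Batch jobs take.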
9. Real-World Sample – Data Pipeline
Scenario
- Daily analytics pipeline with retries and backfills.
1. Lambda
- Triggered by S3
- Validates input
2. Step Functions
- Determines dataset type
- Chooses transformation path
3. Batch
- Runs Spark-like container jobs
- Uses EC2 Spot
4. Step Functions
- Aggregates results
- Sends notification
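Steps 2 to 4 could be sketched as the state machine definition below: a Choice state picks the transformation path, a Batch job runs it through the `batch:submitJob.sync` integration (which waits for the job to finish), and an SNS publish sends the notification. The dataset type `clickstream`, the job and state names, and the `$.datasetType` input field are all hypothetical, and the aggregation step is left out to keep the sketch short.

```python
import json


def batch_task(job_name: str, job_definition: str, job_queue: str, next_state: str) -> dict:
    """Task state that submits a Batch job and waits for it to complete."""
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::batch:submitJob.sync",
        "Parameters": {
            "JobName": job_name,
            "JobDefinition": job_definition,
            "JobQueue": job_queue,
        },
        "Next": next_state,
    }


def build_pipeline(job_queue: str, topic_arn: str) -> dict:
    """Amazon States Language definition for the daily pipeline (illustrative names)."""
    return {
        "StartAt": "ChooseTransformation",
        "States": {
            "ChooseTransformation": {
                "Type": "Choice",
                "Choices": [
                    {
                        "Variable": "$.datasetType",
                        "StringEquals": "clickstream",
                        "Next": "TransformClickstream",
                    }
                ],
                "Default": "TransformGeneric",
            },
            "TransformClickstream": batch_task(
                "clickstream-transform", "clickstream-jobdef", job_queue, "Notify"),
            "TransformGeneric": batch_task(
                "generic-transform", "generic-jobdef", job_queue, "Notify"),
            "Notify": {
                "Type": "Task",
                "Resource": "arn:aws:states:::sns:publish",
                "Parameters": {"TopicArn": topic_arn, "Message": "Daily pipeline finished"},
                "End": True,
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_pipeline("analytics-queue", "arn:aws:sns:us-east-1:123456789012:pipeline"), indent=2))
```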
10. Anti-Patterns (Seen in Production)
❌ Step Functions calling Lambda in a tight loop
❌ Lambda chaining to bypass time limits
❌ Batch used for orchestration logic
❌ Step Functions used as a compute engine
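For the first two anti-patterns, one common alternative is to push the loop into Batch as an array job instead of iterating in Step Functions or chaining Lambdas. The helper below is a hypothetical sketch that builds the keyword arguments for boto3's `submit_job`; an array job accepts a size from 2 to 10,000, and each child job reads its index from the `AWS_BATCH_JOB_ARRAY_INDEX` environment variable.

```python
def array_job_request(job_name: str, job_queue: str, job_definition: str, item_count: int) -> dict:
    """Build submit_job arguments that fan one job out into item_count children."""
    if not 2 <= item_count <= 10_000:
        raise ValueError("array size must be between 2 and 10,000")
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        "arrayProperties": {"size": item_count},
    }
```

A year-long backfill then becomes a single call, for example `boto3.client("batch").submit_job(**array_job_request("backfill", "analytics-queue", "transform-jobdef", 365))`, with the queue and job definition names assumed here for illustration.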
11. Decision Matrix (Bookmark This)
| Requirement | Choose |
|---|---|
| Event-driven | Lambda |
| Workflow / DAG | Step Functions |
| Long-running compute | Batch |
| Retry logic | Step Functions |
| Cheapest CPU | Batch (Spot) |
| Zero ops | Lambda |
| Visual execution | Step Functions |
12. Final thoughts
- Lambda executes logic.
- Step Functions coordinates systems.
- AWS Batch runs heavy jobs.