Tuesday, December 23, 2025

AWS Batch vs Lambda vs Step Functions | Deep Dive.


Focus:

    • Tailored for:
      • DevOps
      • DevSecOps
      • Cloud Engineers
    • Aligned with:
      • Execution model
      • Limits
      • Cost
      • Scaling
      • Real production architectures

Scope:

  • One-Line Mental Models (Memorize This),
  • Execution Responsibility (Who Does What),
  • Runtime & Hard Limits (Deal Breakers),
  • Scaling Behavior (What Actually Scales),
  • Cost Model Comparison,
  • State, Workflow & Reliability,
  • When Each Service Wins (Clear Use Cases),
  • Canonical Architecture (Best Practice),
  • Real-World Sample – Data Pipeline,
  • Anti-Patterns (Seen in Production),
  • Decision Matrix (Bookmark This),
  • Final thoughts.

1. One-Line Mental Models (Memorize This)

| Service        | What It Is for Real                   |
|----------------|---------------------------------------|
| Lambda         | Event-driven function execution       |
| Step Functions | Managed workflow orchestrator         |
| AWS Batch      | Managed job scheduler + compute fleet |

NB:

    • These services do different jobs. 

2. Execution Responsibility (Who Does What)

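| Capability        | Lambda   | Step Functions          | Batch      |
|-------------------|----------|-------------------------|------------|
| Run code          | Yes      | No                      | Yes        |
| Orchestrate steps | No       | Yes                     | ⚠️ Limited |
| Manage compute    | No       | No                      | Yes        |
| Queue jobs        | No       | No                      | Yes        |
| Handle retries    | ⚠️ Basic | Advanced                | Job-level  |
| Long-running      | No       | ⚠️ (orchestration only) | Yes        |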

NB:

    • Step Functions never runs twtech code itself.
    • Step Functions coordinates the other services that do.

3. Runtime & Hard Limits (Deal Breakers)

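| Limit       | Lambda | Step Functions              | Batch          |
|-------------|--------|-----------------------------|----------------|
| Max runtime | 15 min | 1 year (Standard workflows) | Days/weeks     |
| Max memory  | 10 GB  | N/A                         | Instance limit |
| vCPU        | ~6     | N/A                         | Thousands      |
| State size  | N/A    | 256 KB                      | Unlimited      |
| GPU         | No     | No                          | Yes            |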
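A quick way to apply these limits is to encode them and check a workload against them before picking a service. The sketch below is illustrative only: the `Workload` class and `deal_breakers` function are hypothetical names, and the thresholds are simply the figures from the limits above, which may change over time.

```python
from dataclasses import dataclass

# Thresholds taken from the limits listed above (verify against current AWS quotas).
LAMBDA_MAX_RUNTIME_S = 15 * 60
LAMBDA_MAX_MEMORY_GB = 10
LAMBDA_MAX_VCPU = 6
SFN_MAX_STATE_KB = 256


@dataclass
class Workload:
    """Hypothetical description of one unit of work."""
    runtime_s: int
    memory_gb: float
    vcpus: int = 1
    needs_gpu: bool = False
    payload_kb: float = 0.0  # data passed between workflow states


def deal_breakers(w: Workload) -> dict[str, list[str]]:
    """Return, per service, the limits this workload would violate."""
    problems: dict[str, list[str]] = {"Lambda": [], "Step Functions": [], "Batch": []}
    if w.runtime_s > LAMBDA_MAX_RUNTIME_S:
        problems["Lambda"].append("runtime over 15 min")
    if w.memory_gb > LAMBDA_MAX_MEMORY_GB:
        problems["Lambda"].append("memory over 10 GB")
    if w.vcpus > LAMBDA_MAX_VCPU:
        problems["Lambda"].append("more than ~6 vCPU")
    if w.needs_gpu:
        problems["Lambda"].append("GPU not available")
    if w.payload_kb > SFN_MAX_STATE_KB:
        problems["Step Functions"].append("state payload over 256 KB")
    return problems


if __name__ == "__main__":
    job = Workload(runtime_s=3600, memory_gb=4, needs_gpu=True, payload_kb=300)
    for service, issues in deal_breakers(job).items():
        print(service, "->", issues or "no deal breakers")
```

An empty list for Batch does not mean Batch is the right choice, only that none of the hard limits above rule it out.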

4. Scaling Behavior (What Actually Scales)

Lambda

    • Scales per request
    • Near-instant
    • Throttling possible
    • Cold starts

Step Functions

    • Scales state transitions
    • Concurrency limits
    • Not compute bound

Batch

    • Scales infrastructure
    • Slower startup
    • Predictable throughput

| Scenario           | Best Fit       |
|--------------------|----------------|
| 10K API requests   | Lambda         |
| 10K workflow steps | Step Functions |
| 10K CPU-heavy jobs | Batch          |

5. Cost Model Comparison

| Service        | How twtech Pays      | Cost Risk          |
|----------------|----------------------|--------------------|
| Lambda         | ms + memory          | High for CPU-heavy |
| Step Functions | Per state transition | Explodes in loops  |
| Batch          | EC2 / Fargate        | Idle compute       |

Rule of thumb

    • Orchestration ≠ execution
    • Execution ≠ orchestration
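To see why per-transition pricing "explodes in loops", it helps to put rough numbers on it. The sketch below is a back-of-the-envelope estimate, not a billing tool: the prices are assumed list prices (they vary by region and change over time), the function names are hypothetical, and the three-states-per-poll figure assumes a Wait → check status → Choice loop.

```python
import math

# Assumed list prices in USD; check the current AWS pricing pages before relying on them.
LAMBDA_PRICE_PER_GB_SECOND = 0.0000166667
LAMBDA_PRICE_PER_REQUEST = 0.20 / 1_000_000
SFN_PRICE_PER_TRANSITION = 0.025 / 1_000  # Standard workflows


def lambda_cost(invocations: int, avg_ms: float, memory_gb: float) -> float:
    """Duration (GB-seconds) plus per-request charge."""
    gb_seconds = invocations * (avg_ms / 1000) * memory_gb
    return gb_seconds * LAMBDA_PRICE_PER_GB_SECOND + invocations * LAMBDA_PRICE_PER_REQUEST


def sfn_cost(executions: int, transitions_per_execution: int) -> float:
    """Step Functions charges per state transition, not per second."""
    return executions * transitions_per_execution * SFN_PRICE_PER_TRANSITION


def polling_transitions(job_minutes: float, poll_every_s: float, states_per_poll: int = 3) -> int:
    """Transitions burned by a Wait -> check -> Choice polling loop around one job."""
    polls = math.ceil(job_minutes * 60 / poll_every_s)
    return polls * states_per_poll


if __name__ == "__main__":
    loop = sfn_cost(1_000, polling_transitions(job_minutes=120, poll_every_s=10))
    wait = sfn_cost(1_000, 3)  # a workflow that simply waits for the job to finish
    print(f"polling loop: ${loop:.2f}  vs  waiting on the job: ${wait:.3f}")
```

Under these assumed prices, 1,000 executions that each poll a two-hour job every 10 seconds come to roughly $54 in transitions alone, while the same executions with three transitions each cost well under a dollar.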

6. State, Workflow & Reliability

Lambda

    • Stateless
    • External state required
    • Event retries depend on source

Step Functions

    • Durable state
    • Exactly-once transitions (Standard workflows)
    • Built-in error handling
    • Visual DAG

Batch

    • Job state tracking
    • Exit-code based retries
    • Dependencies possible but limited
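The "built-in error handling" in Step Functions is mostly the `Retry` block on a state. As a rough illustration, the sketch below computes the wait before each retry from such a block. The `retry_delays` helper is hypothetical, and the fallback values (1 second, 3 attempts, backoff rate 2.0) are the defaults as I understand the Amazon States Language; treat them as assumptions and confirm against the spec.

```python
def retry_delays(retry: dict) -> list[float]:
    """Seconds waited before each retry attempt for one ASL Retry rule.

    The first retry waits IntervalSeconds; each later retry multiplies
    the previous wait by BackoffRate.
    """
    interval = retry.get("IntervalSeconds", 1)   # assumed ASL default
    attempts = retry.get("MaxAttempts", 3)       # assumed ASL default
    rate = retry.get("BackoffRate", 2.0)         # assumed ASL default
    return [interval * rate ** n for n in range(attempts)]


if __name__ == "__main__":
    rule = {
        "ErrorEquals": ["States.TaskFailed"],
        "IntervalSeconds": 5,
        "MaxAttempts": 4,
        "BackoffRate": 2.0,
    }
    delays = retry_delays(rule)
    print("waits:", delays, "total:", sum(delays), "seconds")
```

Lambda and Batch have no equivalent per-step backoff policy of their own, which is why retry logic usually ends up in the state machine rather than in the job code.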

7. When Each Service Wins (Clear Use Cases)

AWS Lambda – “Fast Logic”

Use when:

    • Event-driven tasks
    • API handlers
    • Validation
    • Fan-out / fan-in
    • Lightweight ETL

Don’t use for:

    • Long loops
    • Heavy compute
    • Large-file processing

Step Functions – “Control Plane”

Use when:

    • Multi-step workflows
    • Conditional logic
    • Human approval steps
    • Service coordination
    • Long-running orchestration

Don’t use for:

    • CPU work
    • Tight loops
    • High-frequency polling

AWS Batch – “Execution Engine”

Use when:

    • Long-running jobs
    • CPU / memory heavy workloads
    • Parallel processing
    • Spot optimization
    • ML / analytics

Don’t use for:

    • APIs
    • Event glue
    • Real-time responses

8. Canonical Architecture (Best Practice)

Lambda + Step Functions + Batch (Together)

Responsibilities

| Layer          | Role          |
|----------------|---------------|
| Lambda         | Fast logic    |
| Step Functions | Orchestration |
| Batch          | Execution     |

NB:

  • This separation scales better, is easier to secure, and costs less.

9. Real-World Sample – Data Pipeline

Scenario

  • Daily analytics pipeline with retries and backfills.

  1. Lambda

    • Triggered by S3
    • Validates input

  2. Step Functions

    • Determines dataset type
    • Chooses transformation path

  3. Batch

    • Runs Spark-like container jobs
    • Uses EC2 Spot

  4. Step Functions

    • Aggregates results
    • Sends notification
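Steps 2 to 4 above could be expressed as a state machine definition along the following lines. This is a minimal sketch, not the pipeline's actual definition: every state name, job definition, queue, function name, and topic ARN is a hypothetical placeholder, and the validating Lambda from step 1 is assumed to start the execution with a `datasetType` field in its input. The `.sync` Batch integration lets the workflow wait for the job instead of polling it.

```python
import json


def batch_task(job_definition: str) -> dict:
    """A Task state that submits a Batch job and waits for it to finish."""
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::batch:submitJob.sync",
        "Parameters": {
            "JobName": job_definition,
            "JobQueue": "spot-queue",            # hypothetical queue backed by EC2 Spot
            "JobDefinition": job_definition,     # hypothetical job definition name
        },
        "Retry": [{"ErrorEquals": ["States.TaskFailed"],
                   "IntervalSeconds": 60, "MaxAttempts": 2, "BackoffRate": 2.0}],
        "Next": "AggregateResults",
    }


definition = {
    "Comment": "Daily analytics pipeline (sketch)",
    "StartAt": "ChooseTransform",
    "States": {
        "ChooseTransform": {
            "Type": "Choice",
            "Choices": [{"Variable": "$.datasetType",
                         "StringEquals": "clickstream",
                         "Next": "TransformClickstream"}],
            "Default": "TransformGeneric",
        },
        "TransformClickstream": batch_task("clickstream-transform"),
        "TransformGeneric": batch_task("generic-transform"),
        "AggregateResults": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "Parameters": {"FunctionName": "aggregate-results", "Payload.$": "$"},
            "Next": "Notify",
        },
        "Notify": {
            "Type": "Task",
            "Resource": "arn:aws:states:::sns:publish",
            "Parameters": {
                "TopicArn": "arn:aws:sns:us-east-1:123456789012:pipeline-events",
                "Message": "Daily pipeline finished",
            },
            "End": True,
        },
    },
}


def dangling_targets(defn: dict) -> set[str]:
    """State names that are referenced but never defined (a cheap sanity check)."""
    states = defn["States"]
    targets = {defn["StartAt"]}
    for state in states.values():
        targets.update(t for t in (state.get("Next"), state.get("Default")) if t)
        targets.update(choice["Next"] for choice in state.get("Choices", []))
    return targets - set(states)


if __name__ == "__main__":
    assert not dangling_targets(definition)
    print(json.dumps(definition, indent=2))
```

Note how the compute stays in Batch, the branching and retries stay in Step Functions, and Lambda only handles the short aggregation step.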

10. Anti-Patterns (Seen in Production)

❌   Step Functions calling Lambda in a tight loop
❌   Lambda chaining to bypass time limits
❌   Batch used for orchestration logic
❌   Step Functions used as a compute engine

11. Decision Matrix (Bookmark This)

| Requirement          | Choose         |
|----------------------|----------------|
| Event-driven         | Lambda         |
| Workflow / DAG       | Step Functions |
| Long-running compute | Batch          |
| Retry logic          | Step Functions |
| Cheapest CPU         | Batch (Spot)   |
| Zero ops             | Lambda         |
| Visual execution     | Step Functions |
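For teams that like to keep such rules next to their design docs, the matrix can be written down as a simple lookup. The sketch below is only a convenience wrapper around the rows above; the `plan` function name is hypothetical, and a result that spans several services is a hint to combine them rather than force one service to do everything.

```python
from collections import defaultdict

# One entry per row of the decision matrix (keys lower-cased for lookup).
DECISION_MATRIX = {
    "event-driven": "Lambda",
    "workflow / dag": "Step Functions",
    "long-running compute": "Batch",
    "retry logic": "Step Functions",
    "cheapest cpu": "Batch",  # on Spot capacity, per the matrix
    "zero ops": "Lambda",
    "visual execution": "Step Functions",
}


def plan(requirements: list[str]) -> dict[str, list[str]]:
    """Group requirements by the service the matrix recommends for each."""
    by_service: dict[str, list[str]] = defaultdict(list)
    for req in requirements:
        service = DECISION_MATRIX[req.strip().lower()]  # KeyError for unknown rows
        by_service[service].append(req)
    return dict(by_service)


if __name__ == "__main__":
    for service, reasons in plan(["Event-driven", "Retry logic", "Cheapest CPU"]).items():
        print(f"{service}: {', '.join(reasons)}")
```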

12. Final thoughts

    • Lambda executes logic.
    • Step Functions coordinates systems.
    • AWS Batch runs heavy jobs.



