A breakdown of DynamoDB Stream Processing: what it's all about, and how to design robust streaming architectures.
1. The Concept: DynamoDB Streams
- Change data capture (CDC) feature for DynamoDB.
- Emits a time-ordered sequence of item-level modifications in a table.
- Streams contain information about:
  - INSERT (new items)
  - MODIFY (updated items)
  - REMOVE (deleted items)
- Retention: 24 hours from the time of change.
- Can be enabled per table and configured to capture:
  - Keys only
  - New image
  - Old image
  - Both old & new images
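
As a quick illustration, here is a minimal boto3 sketch for enabling a stream on an existing table (the table name is a placeholder; pick the view type your consumers need):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Enable a stream on an existing table ("orders" is a placeholder name).
# StreamViewType options: KEYS_ONLY, NEW_IMAGE, OLD_IMAGE, NEW_AND_OLD_IMAGES.
response = dynamodb.update_table(
    TableName="orders",
    StreamSpecification={
        "StreamEnabled": True,
        "StreamViewType": "NEW_AND_OLD_IMAGES",
    },
)
print(response["TableDescription"]["LatestStreamArn"])
```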
2. How DynamoDB Stream Processing Works
- Change occurs in DynamoDB table.
- Change event is recorded in the stream.
- Consumers (Lambda, Kinesis, custom apps) read from the stream.
- Consumers process events for analytics, indexing, replication, or triggering workflows.
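
For illustration, a minimal Lambda consumer in Python that branches on the event type; the print calls stand in for real downstream work:

```python
def lambda_handler(event, context):
    # Lambda delivers stream records in batches under event["Records"].
    for record in event["Records"]:
        event_name = record["eventName"]  # INSERT | MODIFY | REMOVE
        keys = record["dynamodb"]["Keys"]

        if event_name == "INSERT":
            print(f"New item {keys}: {record['dynamodb']['NewImage']}")
        elif event_name == "MODIFY":
            # Old/new images are present only if the stream view type includes them.
            old = record["dynamodb"].get("OldImage")
            new = record["dynamodb"].get("NewImage")
            print(f"Item {keys} changed: {old} -> {new}")
        elif event_name == "REMOVE":
            print(f"Item {keys} deleted")
```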
3. Common Consumers
- AWS Lambda (serverless, fully managed)
  - Triggered automatically by new stream records.
  - At-least-once processing.
  - Batch size, batching window, and parallelism are configurable (see the sketch after this list).
- Amazon Kinesis Data Streams
  - For fan-out to multiple consumers.
  - Lower latency and higher throughput.
- Custom stream processing
  - Using the KCL (Kinesis Client Library) to build your own consumers.
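
As a sketch of the Lambda tuning knobs mentioned above, here is how an event source mapping might be created with boto3 (the stream ARN, function name, and values are assumptions):

```python
import boto3

lambda_client = boto3.client("lambda")

# Wire a Lambda function to the stream; Lambda polls the shards for us.
lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:dynamodb:us-east-1:123456789012:table/orders"
                   "/stream/2024-01-01T00:00:00.000",  # placeholder ARN
    FunctionName="process-orders-stream",  # hypothetical function
    StartingPosition="LATEST",             # or TRIM_HORIZON for the oldest records
    BatchSize=100,                         # max records per invocation
    MaximumBatchingWindowInSeconds=5,      # wait up to 5s to fill a batch
    ParallelizationFactor=2,               # concurrent batches per shard (1-10)
)
```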
4. Typical Processing Pattern
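The usual flow is: table write → stream record → Lambda → derived store (a materialized view, search index, or archive). Below is a sketch of that loop that mirrors changes into a hypothetical query-optimized table, `orders_by_customer`:

```python
import boto3
from boto3.dynamodb.types import TypeDeserializer

dynamodb = boto3.resource("dynamodb")
view = dynamodb.Table("orders_by_customer")  # hypothetical derived table
deserializer = TypeDeserializer()

def untype(image):
    # Stream images use DynamoDB's typed JSON ({"S": "abc"}); convert to plain values.
    return {k: deserializer.deserialize(v) for k, v in image.items()}

def lambda_handler(event, context):
    for record in event["Records"]:
        if record["eventName"] in ("INSERT", "MODIFY"):
            view.put_item(Item=untype(record["dynamodb"]["NewImage"]))
        elif record["eventName"] == "REMOVE":
            view.delete_item(Key=untype(record["dynamodb"]["Keys"]))
```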
5. Key Features
- Exactly-once delivery? Not guaranteed; it's at-least-once, so consumers must be idempotent.
- Ordering: guaranteed per partition key.
- Throughput: the stream shares the table's write capacity.
- Latency: typically under 1 second for Lambda triggers.
- Parallelism: scales with the number of shards (roughly 1 shard per 1 MB/s of write throughput or 1,000 writes/s).
6. Use Cases
- Event-driven microservices
- Materialized views for query optimization
- Full-text search with OpenSearch
- Real-time analytics pipelines
- Cross-region replication
- Data archiving to S3
7. Design Considerations
- Idempotency: make stream processors safe to re-run (see the sketch after this list).
- Error handling: use DLQs (dead-letter queues) with Lambda, or retry logic.
- Backpressure: high write volumes can overwhelm consumers; scale appropriately.
- Security: restrict IAM permissions for stream reading/writing.
- Fan-out: use DynamoDB Streams + Kinesis Data Streams for multiple independent consumers.
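
One common idempotency trick, sketched below, is a conditional write to a dedup table keyed by the stream record's unique eventID; the table name and the do_work stub are assumptions. For error handling with Lambda, an on-failure destination (e.g., an SQS DLQ) can be set on the event source mapping.

```python
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.resource("dynamodb")
# Hypothetical table that remembers which stream records were already handled.
processed = dynamodb.Table("processed_stream_events")

def do_work(record):
    print("processing", record["eventID"])  # stand-in for the real side effect

def handle_once(record):
    try:
        # The conditional put fails if this eventID was seen before, so
        # at-least-once redeliveries of the same record become no-ops.
        processed.put_item(
            Item={"eventID": record["eventID"]},
            ConditionExpression="attribute_not_exists(eventID)",
        )
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return  # duplicate delivery; already handled
        raise
    do_work(record)
```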
8. Advanced Pattern – Enhanced Fan-Out
If you have multiple consumers that need low-latency access without sharing throughput:
- Enable DynamoDB Streams → Kinesis Data Streams (EFO).
- Each consumer gets 2 MB/s per shard of dedicated throughput.
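
A sketch of both steps with boto3 (the table name, stream ARN, and consumer name are placeholders):

```python
import boto3

dynamodb = boto3.client("dynamodb")
kinesis = boto3.client("kinesis")

stream_arn = "arn:aws:kinesis:us-east-1:123456789012:stream/orders-changes"

# 1. Route the table's change records into a Kinesis data stream.
dynamodb.enable_kinesis_streaming_destination(
    TableName="orders",
    StreamArn=stream_arn,
)

# 2. Register an Enhanced Fan-Out consumer; each registered consumer
#    gets its own dedicated 2 MB/s per shard of read throughput.
kinesis.register_stream_consumer(
    StreamARN=stream_arn,
    ConsumerName="search-indexer",
)
```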
Here's twtech's deep comparison between DynamoDB Streams and Amazon Kinesis Data Streams (KDS), from the standpoint of:
- Architecture
- Capabilities
- Limits
- When to use each
1. Core Purpose
| Feature | DynamoDB Streams | Kinesis Data Streams (KDS) |
|---|---|---|
| What it is | Change Data Capture (CDC) stream for DynamoDB table changes only. | General-purpose, high-throughput, low-latency streaming data platform. |
| Data Source | Automatically generated by DynamoDB table writes. | Any producer (applications, IoT devices, logs, etc.). |
| Scope | Narrow: tied to a single DynamoDB table. | Broad: can ingest any kind of event data. |
2. Data Flow
DynamoDB Streams
- Item change in table.
- Event is added to the stream.
- Consumers (Lambda, KCL app, Kinesis) read events.
- Events expire after 24 hours.
Kinesis Data Streams
- Producers write arbitrary records (max 1MB/record).
- Records stored in shards.
- Consumers (Lambda, KCL apps, analytics services) read.
- Retention: 24 hours – 365 days (configurable).
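
The producer side is the key difference: DynamoDB writes stream records for you, whereas KDS producers call the API themselves. A minimal sketch (the stream name and payload are assumptions):

```python
import json
import boto3

kinesis = boto3.client("kinesis")

# Any application can be a producer; the partition key's hash picks the shard,
# and ordering is preserved within that shard.
kinesis.put_record(
    StreamName="sensor-events",  # hypothetical stream
    Data=json.dumps({"vehicle_id": "v-42", "speed_kmh": 88}).encode(),
    PartitionKey="v-42",
)
```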
3. Key Capabilities
| Capability | DynamoDB Streams | Kinesis Data Streams |
|---|---|---|
| Data Retention | Fixed 24 hours | Configurable (1 day – 365 days) |
| Ordering | Guaranteed per partition key | Guaranteed per shard |
| Data Payload | Item keys, old/new images, metadata | Arbitrary byte payload |
| Throughput Scaling | Matches table WCU | Scales by shard count |
| Fan-out | Standard read or Enhanced Fan-Out (via Kinesis integration) | Standard read or Enhanced Fan-Out |
| Trigger Lambda | Native | Native |
| Multi-Source Data | ❌ (table-specific) | ✅ |
| Cross-Region Replication | Via Global Tables | Via Kinesis Data Streams + Data Firehose |
4. Performance
| Metric | DynamoDB Streams | Kinesis Data Streams |
|---|---|---|
| Latency | Typically <1 s for Lambda triggers | As low as ~70 ms with Enhanced Fan-Out |
| Max Record Size | ~400 KB (DynamoDB item size limit) | 1 MB per record |
| Scaling Granularity | Per table (WCU-based shards) | Per shard (manual or auto-scaling) |
5. Integration Patterns
When to Use DynamoDB Streams
- Event-driven actions only from DynamoDB changes.
- Automatic integration with Lambda (no producers to manage).
- Building materialized views, audit logs, cross-region replication.
- Real-time search indexing (e.g., OpenSearch updates from table writes).
When to Use Kinesis Data Streams
- Multiple heterogeneous producers (IoT, logs, API events).
- Complex streaming analytics (via Kinesis Data Analytics, Flink, Spark).
- Longer retention or replay needs.
- Very high throughput ingestion (millions of records/sec).
- Multi-consumer pipelines (fan-out processing for different purposes).
6. Architecture Relationship
They're not mutually exclusive; in fact:
- DynamoDB Streams → Kinesis Data Streams is a common pattern.
- The DynamoDB table emits a change stream.
- Kinesis fans out to multiple independent consumers without competing for read throughput.
7. Decision Matrix
| Question | Use DynamoDB Streams | Use Kinesis Data Streams |
|---|---|---|
| Need changes from a DynamoDB table only? | ✅ | ❌ |
| Need multiple, unrelated data sources? | ❌ | ✅ |
| Need >24 h retention? | ❌ | ✅ |
| Need to replay old data easily? | ❌ | ✅ |
| Require exactly-once processing? | ❌ (at-least-once) | ❌ (at-least-once) |
| Want lowest operational overhead for DynamoDB CDC? | ✅ | ❌ |
Insights:
DynamoDB CDC: capturing and reacting to data changes
DynamoDB Change Data Capture (CDC) enables applications to capture item-level changes in a DynamoDB table in near real time as a stream of data records.
This allows applications to efficiently process and respond to data modifications, facilitating use cases like:
- Real-time data integration: keeping operational data in sync across systems, warehouses, and other applications.
- Event-driven architectures: building responsive applications that react instantly to data changes (e.g., sending welcome emails upon new user signups).
- Data replication: mirroring data between DynamoDB tables, potentially across regions.
- Auditing and compliance: maintaining an audit trail of data modifications for record-keeping and compliance purposes.
- Real-time analytics: feeding DynamoDB changes into analytics platforms for near real-time insights (e.g., updating leaderboards in a gaming application).
DynamoDB CDC (options)
DynamoDB offers two streaming models for CDC:
DynamoDB Streams: provides a time-ordered log of item-level modifications for a DynamoDB table, accessible for up to 24 hours.
Features:
- Preserves the exact order of modifications.
- Each modification generates exactly one stream record (no duplicates).
- Automatically ignores operations that do not modify data.
Considerations:
- Data is accessible for only 24 hours.
- Default limit of 2 simultaneous consumers per shard (more with enhanced fan-out in Kinesis).
Example use case: tracking inventory changes for an e-commerce platform.
Kinesis Data Streams for DynamoDB (options):
Replicates item-level modifications to an Amazon Kinesis data stream.
Features:
- Longer data retention (up to 365 days).
- Higher throughput and more simultaneous consumers (up to 20 with enhanced fan-out).
- Integration with other Kinesis services (Data Firehose, Data Analytics) for advanced analytics and delivery to destinations like Amazon S3, Redshift, or OpenSearch Service.
Considerations:
- Record ordering and deduplication may require client-side implementation.
Example use case: real-time analysis of sensor data from transportation vehicles.
Implementing DynamoDB CDC
- Enable streaming on your DynamoDB table (either DynamoDB Streams or Kinesis Data Streams) using the AWS Console, AWS SDK, or AWS CLI.
- Configure stream record content: choose to include only keys, the new item image, the old item image, or both.
- Consume the stream with one of the following:
  - AWS Lambda: automatically triggered by new stream records for processing and workflow invocation.
  - DynamoDB Streams Kinesis Adapter: lets applications use the Kinesis Client Library (KCL) to process records from DynamoDB Streams, offering more control over stream processing.
  - Kinesis Data Streams (if enabled): integrate with other Kinesis services and downstream destinations like S3 or Redshift.
  - Third-party tools like Estuary Flow, which offer simplified CDC implementation with features like backfill and deduplication.
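
For a feel of what the Kinesis Adapter wraps, here is a minimal read loop against the low-level DynamoDB Streams API (single shard, no checkpointing; a real consumer would use the KCL):

```python
import boto3

streams = boto3.client("dynamodbstreams")
stream_arn = ("arn:aws:dynamodb:us-east-1:123456789012:table/orders"
              "/stream/2024-01-01T00:00:00.000")  # placeholder ARN

# Walk the first shard from its oldest available record.
shard = streams.describe_stream(StreamArn=stream_arn)["StreamDescription"]["Shards"][0]
iterator = streams.get_shard_iterator(
    StreamArn=stream_arn,
    ShardId=shard["ShardId"],
    ShardIteratorType="TRIM_HORIZON",
)["ShardIterator"]

# Keep paging until the shard is closed (NextShardIterator becomes absent).
while iterator:
    page = streams.get_records(ShardIterator=iterator, Limit=100)
    for record in page["Records"]:
        print(record["eventName"], record["dynamodb"]["Keys"])
    iterator = page.get("NextShardIterator")
```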
Best practices
- twtech chooses the appropriate streaming model based on its needs for data retention, throughput, and consumer fan-out.
- Optimize KCL or Lambda consumers for efficient processing and to avoid issues like Lambda cold starts.
- Monitor the costs associated with stream processing.
- Implement proper error handling for stream processing to ensure data consistency.
By leveraging DynamoDB CDC, twtech can build robust and scalable applications that react to data changes in real time, integrate seamlessly with other services, and support use cases from real-time analytics to auditing and compliance.