Amazon DynamoDB Streams vs. Kinesis Data Streams (KDS) - Deep Dive.
Scope:
- Intro,
- The Concept: DynamoDB Streams,
- How DynamoDB Stream Processing Works,
- Common Consumers (target),
- Typical Processing Pattern (Diagram),
- Key Features,
- Use Cases,
- Design Considerations,
- Advanced Pattern – Enhanced Fan-Out,
- Deep comparison between DynamoDB Streams and Amazon Kinesis Data Streams (KDS),
- Core Purpose comparative table,
- Data Flow for DynamoDB Streams & Kinesis Data Streams,
- Key Capabilities table,
- Performance table,
- Integration Patterns When to Use DynamoDB Streams & Kinesis Data Streams,
- Architecture Relationship,
- Decision Matrix,
- DynamoDB CDC (capturing and reacting to data changes),
- Kinesis Data Streams for DynamoDB (options),
- Sample Real-time analysis of sensor data from transportation vehicles,
- Best practices.
Intro:
- A deep dive into DynamoDB Stream Processing & Kinesis Data Streams.
- Including:
- What it's all about,
- Steps to design a robust streaming architecture.
1. The Concept: DynamoDB Streams
- Change data capture (CDC) feature for DynamoDB.
- Emits a time-ordered sequence of item-level modifications in a table.
- Streams contain information about:
- INSERT (new items)
- MODIFY (updated items)
- REMOVE (deleted items)
- Retention: 24 hours from the time of change.
- Can be enabled per table and configured to capture:
- Keys only
- New image
- Old image
- Both old & new images
2. How DynamoDB Stream Processing Works
- Change occurs in DynamoDB table.
- Change event is recorded in the stream.
- Consumers (Lambda, Kinesis, custom apps) read from the stream.
- Consumer processes event for analytics, indexing, replication, or triggering workflows.
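The steps above can be sketched as a small Lambda-style consumer. The event shape (`Records`, `eventName`, and the `dynamodb` images) follows the standard DynamoDB Streams event format; the dispatch logic itself is illustrative, not a prescribed implementation.

```python
# Minimal sketch of a Lambda consumer for DynamoDB Streams.
# Record fields follow the DynamoDB Streams event format; what each
# branch does with the images is left illustrative.

def handler(event, context=None):
    """Dispatch each stream record on its eventName and count by type."""
    summary = {"INSERT": 0, "MODIFY": 0, "REMOVE": 0}
    for record in event.get("Records", []):
        name = record["eventName"]      # INSERT | MODIFY | REMOVE
        change = record["dynamodb"]
        if name == "INSERT":
            new_item = change.get("NewImage")   # present if view type includes new image
        elif name == "MODIFY":
            old_item, new_item = change.get("OldImage"), change.get("NewImage")
        elif name == "REMOVE":
            old_item = change.get("OldImage")
        # ...apply the change downstream (index, replicate, notify)...
        summary[name] += 1
    return summary

# Example event with one insert and one delete:
sample = {"Records": [
    {"eventName": "INSERT",
     "dynamodb": {"NewImage": {"pk": {"S": "user#1"}}}},
    {"eventName": "REMOVE",
     "dynamodb": {"OldImage": {"pk": {"S": "user#1"}}}},
]}
print(handler(sample))  # {'INSERT': 1, 'MODIFY': 0, 'REMOVE': 1}
```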
3. Common Consumers (target)
- AWS Lambda (serverless, fully managed)
- Triggered automatically by new stream records.
- At-least-once processing.
- Batch size, window, and parallelism configurable.
- Amazon Kinesis Data Streams
- For fan-out to multiple consumers.
- Lower latency and higher throughput.
- Custom Stream Processing
- Using the KCL (Kinesis Client Library) to build twtech's own consumers.
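The batch size, window, and parallelism settings mentioned above live on the Lambda event source mapping. A configuration sketch with the AWS CLI; the function name, account, and stream ARN are placeholders:

```shell
# Hypothetical function name and stream ARN; adjust region, account, and table.
aws lambda create-event-source-mapping \
  --function-name process-orders-stream \
  --event-source-arn arn:aws:dynamodb:us-east-1:123456789012:table/Orders/stream/2024-01-01T00:00:00.000 \
  --starting-position LATEST \
  --batch-size 100 \
  --maximum-batching-window-in-seconds 5 \
  --parallelization-factor 2
```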
4. Typical Processing Pattern (Diagram)
5. Key Features
- Delivery semantics: at-least-once, not exactly-once; consumers must be idempotent.
- Ordering: guaranteed per partition key.
- Throughput: the stream shares the table's write capacity.
- Latency: typically under 1 second for Lambda triggers.
- Parallelism: scales with the number of shards (roughly 1 shard per 1 MB/s or 1,000 writes/s of write throughput).
6. Use Cases
- Event-driven microservices
- Materialized views for query optimization
- Full-text search with OpenSearch
- Real-time analytics pipelines
- Cross-region replication
- Data archiving to S3
7. Design Considerations
- Idempotency: make stream processors safe to re-run.
- Error handling: use DLQs (Dead Letter Queues) with Lambda or retry logic.
- Backpressure: high write volumes can overwhelm consumers; scale appropriately.
- Security: restrict IAM permissions for stream reading/writing.
- Fan-out: use DynamoDB Streams + Kinesis Data Streams for multiple independent consumers.
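Because delivery is at-least-once, a processor can be made idempotent by keying work on each record's per-shard SequenceNumber. A minimal sketch, using an in-memory set; a real processor would persist the seen keys with a conditional write to a dedup table.

```python
# Idempotent processing sketch: skip records whose SequenceNumber was
# already handled. In production, the "seen" set would live in a
# durable store written with a conditional put, not process memory.

class IdempotentProcessor:
    def __init__(self):
        self.seen = set()
        self.applied = []

    def process(self, record):
        seq = record["dynamodb"]["SequenceNumber"]
        if seq in self.seen:        # redelivery under at-least-once semantics
            return False
        self.seen.add(seq)
        self.applied.append(record["eventName"])
        return True

p = IdempotentProcessor()
rec = {"eventName": "INSERT", "dynamodb": {"SequenceNumber": "111"}}
assert p.process(rec) is True    # first delivery is applied
assert p.process(rec) is False   # duplicate delivery is ignored
```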
8. Advanced Pattern – Enhanced Fan-Out
If twtech has multiple consumers that need low-latency access without sharing throughput:
- Enable DynamoDB Streams → Kinesis Data Streams Enhanced Fan-Out (EFO).
- Each consumer gets 2 MB/s per shard of dedicated throughput.
Deep comparison between DynamoDB Streams and Amazon Kinesis Data Streams (KDS)
- Architecture,
- Capabilities,
- Limits,
- When to use each.
1. Core Purpose comparative table
| Feature | DynamoDB Streams | Kinesis Data Streams (KDS) |
|---|---|---|
| What it is | Change Data Capture (CDC) stream for DynamoDB table changes only. | General-purpose, high-throughput, low-latency streaming data platform. |
| Data Source | Automatically generated by DynamoDB table writes. | Any producer (applications, IoT devices, logs, etc.). |
| Scope | Narrow; tied to a single DynamoDB table. | Broad; can ingest any kind of event data. |
2. Data Flow for DynamoDB Streams
- Item change in table.
- Event is added to the stream.
- Consumers (Lambda, KCL app, Kinesis) read events.
- Events expire after 24 hours.
Data Flow for Kinesis Data Streams
- Producers write arbitrary records (max 1MB/record).
- Records stored in shards.
- Consumers (Lambda, KCL apps, analytics services) read.
- Retention: 24 hours – 365 days (configurable).
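The per-shard ordering in the flow above comes from how Kinesis routes records: the MD5 hash of the partition key is interpreted as a 128-bit integer and matched against each shard's hash key range, so the same partition key always lands in the same shard. A sketch with two illustrative, evenly split shards:

```python
import hashlib

# Sketch of Kinesis record routing: MD5(partition key) as a 128-bit
# integer, matched against shard hash key ranges. The two shard ranges
# here are illustrative, not read from a real stream.

MAX_HASH = 2 ** 128 - 1
shards = [
    {"id": "shardId-000", "start": 0, "end": MAX_HASH // 2},
    {"id": "shardId-001", "start": MAX_HASH // 2 + 1, "end": MAX_HASH},
]

def shard_for(partition_key: str) -> str:
    h = int(hashlib.md5(partition_key.encode()).hexdigest(), 16)
    for s in shards:
        if s["start"] <= h <= s["end"]:
            return s["id"]
    raise RuntimeError("hash out of range")

# Same partition key always maps to the same shard, so per-key ordering holds:
assert shard_for("vehicle-42") == shard_for("vehicle-42")
```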
3. Key Capabilities table
| Capability | DynamoDB Streams | Kinesis Data Streams |
|---|---|---|
| Data Retention | Fixed 24 hours | Configurable (1 day – 365 days) |
| Ordering | Guaranteed per partition key | Guaranteed per shard |
| Data Payload | Item keys, old/new images, metadata | Arbitrary byte payload |
| Throughput Scaling | Matches table WCU | Scales by shard count |
| Fan-out | Standard read or Enhanced Fan-Out (via Kinesis integration) | Standard read or Enhanced Fan-Out |
| Trigger Lambda | Native | Native |
| Multi-Source Data | ❌ (table-specific) | ✅ |
| Cross-Region Replication | Via Global Tables | Via Kinesis Data Streams + Data Firehose |
4. Performance table
| Metric | DynamoDB Streams | Kinesis Data Streams |
|---|---|---|
| Latency | <1 s for Lambda triggers | ~70 ms for Enhanced Fan-Out |
| Max Record Size | ~400 KB (DynamoDB item size limit) | 1 MB per record |
| Scaling Granularity | Per table (WCU-based shards) | Per shard (manual or auto-scaling) |
5. Integration Patterns: When to Use DynamoDB Streams
- Event-driven actions only from DynamoDB changes.
- Automatic integration with Lambda (no producers to manage).
- Building materialized views, audit logs, cross-region replication.
- Real-time search indexing (e.g., OpenSearch updates from table writes).
Integration Patterns: When to Use Kinesis Data Streams
- Multiple heterogeneous producers (IoT, logs, API events).
- Complex streaming analytics (via Kinesis Data Analytics, Flink, Spark).
- Longer retention or replay needs.
- Very high throughput ingestion (millions of records/sec).
- Multi-consumer pipelines (fan-out processing for different purposes).
6. Architecture Relationship
NB:
They’re not mutually exclusive; in fact:
- DynamoDB Streams → Kinesis Data Streams is a common pattern.
- DynamoDB table emits a change stream.
- Kinesis fans out to multiple independent consumers without competing for read throughput.
7. Decision Matrix
| Question | Use DynamoDB Streams | Use Kinesis Data Streams |
|---|---|---|
| Need changes from a DynamoDB table only? | ✅ | ❌ |
| Need multiple, unrelated data sources? | ❌ | ✅ |
| Need >24h retention? | ❌ | ✅ |
| Need to replay old data easily? | ❌ | ✅ |
| Require exactly-once processing? | ❌ (at-least-once) | ❌ (at-least-once) |
| Want lowest operational overhead for DynamoDB CDC? | ✅ | ❌ |
Insights:
DynamoDB CDC (reacting to Change Data Capture)
- DynamoDB Change Data Capture (CDC) enables applications to capture item-level changes in a DynamoDB table in near real-time as a stream of data records.
- Change Data Capture allows applications to efficiently process and respond to data modifications, facilitating use cases like:
- Real-time data integration,
- Keeping operational data in sync across systems, data warehouses, and other applications.
- Building responsive applications that react instantly to data changes (e.g., sending welcome emails upon new user signups).
- Mirroring data between DynamoDB tables, potentially across regions,
- Maintaining an audit trail of data modifications for record-keeping and compliance purposes.
- Real-time analytics: feeding DynamoDB changes into analytics platforms for near real-time insights (e.g., updating leaderboards in a gaming application).
DynamoDB CDC (options)
DynamoDB offers two streaming models for CDC:
DynamoDB Streams:
- Provides a time-ordered log of item-level modifications for a DynamoDB table, accessible for up to 24 hours.
Features:
- Preserves the exact order of modifications.
- Each modification generates exactly one stream record (no duplicates).
- Automatically ignores operations that do not modify data.
- Data accessible for only 24 hours.
- Default limit of 2 simultaneous consumers per shard (more with enhanced fan-out in Kinesis).
Sample use case:
- Tracking inventory changes for an e-commerce platform.
Kinesis Data Streams for DynamoDB (options):
- Replicates item-level modifications to an Amazon Kinesis data stream.
Features:
- Longer data retention (up to 365 days).
- Higher throughput and more simultaneous consumers (up to 20 with enhanced fan-out).
- Integration with other Kinesis services (Data Firehose, Data Analytics) for advanced analytics and delivery to destinations like Amazon S3, Redshift, or OpenSearch Service.
Considerations:
- Record ordering and deduplication may require client-side implementation.
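Such client-side handling can be sketched as a small normalization step: sort a batch by the record's ApproximateCreationDateTime and drop duplicate eventIDs. The field names follow the DynamoDB change record shape; the whole-batch buffering policy is an assumption for illustration.

```python
# Client-side ordering + dedup sketch for DynamoDB change records read
# from Kinesis. Field names (eventID, ApproximateCreationDateTime) follow
# the DynamoDB change record shape; the batch policy is illustrative.

def normalize(records):
    seen, out = set(), []
    for r in sorted(records,
                    key=lambda r: r["dynamodb"]["ApproximateCreationDateTime"]):
        if r["eventID"] in seen:    # drop duplicate deliveries
            continue
        seen.add(r["eventID"])
        out.append(r)
    return out

batch = [
    {"eventID": "b", "dynamodb": {"ApproximateCreationDateTime": 1700000002}},
    {"eventID": "a", "dynamodb": {"ApproximateCreationDateTime": 1700000001}},
    {"eventID": "a", "dynamodb": {"ApproximateCreationDateTime": 1700000001}},
]
print([r["eventID"] for r in normalize(batch)])  # ['a', 'b']
```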
Sample use case:
- Real-time analysis of sensor data from transportation vehicles.
Implementing DynamoDB CDC
- Enable streaming on your DynamoDB table (either DynamoDB Streams or Kinesis Data Streams) using the AWS Console, AWS SDK, or AWS CLI.
- Configure stream record content: choose to include only keys, the new item image, the old item image, or both.
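Both steps can be done from the AWS CLI. A configuration sketch; the table name, view type, and stream ARN are placeholders:

```shell
# Enable DynamoDB Streams on a table, capturing both old and new images.
# Table name and view type are illustrative.
aws dynamodb update-table \
  --table-name Orders \
  --stream-specification StreamEnabled=true,StreamViewType=NEW_AND_OLD_IMAGES

# Or route item-level changes to a Kinesis data stream instead:
aws dynamodb enable-kinesis-streaming-destination \
  --table-name Orders \
  --stream-arn arn:aws:kinesis:us-east-1:123456789012:stream/orders-changes
```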
NB:
- AWS Lambda: automatically triggered by new stream records for processing and workflow invocation.
- DynamoDB Streams Kinesis Adapter: allows applications to use the Kinesis Client Library (KCL) to process records from DynamoDB Streams, offering more control over stream processing.
- Kinesis Data Streams (if enabled): integrate with other Kinesis services and downstream destinations like S3 or Redshift.
- Third-party tools like: Estuary Flow, which offer simplified CDC implementation with features like backfill and deduplication.
Best practices
- twtech chooses the appropriate streaming model based on its needs for data retention, throughput, and consumer fan-out.
- Optimize KCL or Lambda consumers for efficient processing and to avoid issues like Lambda cold starts.
- Monitor costs associated with stream processing.
- Implement proper error handling for stream processing to ensure data consistency.
- By leveraging DynamoDB CDC, twtech can build robust and scalable applications that react to data changes in real time, integrate seamlessly with other services, and support various use cases from real-time analytics to auditing and compliance.