Here’s twtech's focused overview of Amazon Kinesis Data Streams (KDS): its architecture, use cases, performance characteristics, and how to use it effectively.
The concept: Kinesis Data Streams
Amazon Kinesis Data Streams (KDS) is a real-time data streaming service that lets you ingest,
buffer, and process large volumes of data continuously in real time.
Use it to capture things like:
- Clickstream data
- Log files
- IoT telemetry
- Social media feeds
- Application events
Core Components to keep in mind
| Component | Description |
|---|---|
| Stream | The core object in KDS; holds a series of shards. |
| Shard | A unit of capacity: 1 MB/sec write and 2 MB/sec read throughput. |
| Data Record | The individual unit of data (up to 1 MB); includes a partition key, data blob, and sequence number. |
| Partition Key | Determines how records are distributed across shards. |
| Retention Period | Default 24 hours (can go up to 365 days). |
How Data Flows in KDS
- Producers (apps, IoT devices, AWS services) send records to a stream.
- KDS stores records in shards in ordered sequences.
- Consumers (like Lambda, KCL apps, or custom clients) read and process the data.
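To make the producer side concrete, here's a minimal boto3 sketch that sends one record; the stream name twtech-data-stream matches the CLI example later in this post, and the region is an assumption:
# python
import boto3

# Assumes twtech-data-stream already exists in this (assumed) region.
kinesis = boto3.client("kinesis", region_name="us-east-1")

response = kinesis.put_record(
    StreamName="twtech-data-stream",
    PartitionKey="user1",  # determines which shard receives the record
    Data=b"Hello twtech team from Kinesis!",
)

# Each record gets a per-shard sequence number that preserves ordering.
print(response["ShardId"], response["SequenceNumber"])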
Producer Options
- AWS SDK (PutRecord / PutRecords)
- AWS IoT Core
- Amazon CloudWatch Logs subscriptions
- AWS Kinesis Agent (for log files)
Consumer Options
| Type | Description |
|---|---|
| Kinesis Client Library (KCL) | Java-based library to build robust consumers. Handles load balancing and checkpointing. |
| AWS Lambda | Easiest option. Automatically scales, processes batches from the stream. |
| Enhanced Fan-Out | Each consumer gets a dedicated 2 MB/sec pipe per shard; ideal for low-latency parallel consumers. |
| GetRecords / GetShardIterator | Low-level APIs for custom consumers. |
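For the low-level GetShardIterator / GetRecords path, here's a hedged boto3 sketch that reads one batch from a single shard; a production consumer (KCL or Lambda) would handle multi-shard iteration and checkpointing for twtech:
# python
import boto3

kinesis = boto3.client("kinesis")
stream = "twtech-data-stream"  # assumed stream name from the examples above

# Take the first shard only; a real consumer iterates over all shards.
shard_id = kinesis.list_shards(StreamName=stream)["Shards"][0]["ShardId"]

iterator = kinesis.get_shard_iterator(
    StreamName=stream,
    ShardId=shard_id,
    ShardIteratorType="TRIM_HORIZON",  # start at the oldest available record
)["ShardIterator"]

batch = kinesis.get_records(ShardIterator=iterator, Limit=100)
for record in batch["Records"]:
    print(record["PartitionKey"], record["Data"])  # Data is raw bytes

# batch["NextShardIterator"] feeds the next get_records call in a loop.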
Configuration & Scaling
- Manual Scaling: Add/remove shards.
- Auto Scaling: Enable Kinesis scaling with CloudWatch and Application Auto Scaling.
- Provisioned Throughput: Based on shard count (1 MB/sec in, 2 MB/sec out per shard).
- Limits: Each record = max 1 MB; max 1,000 PUT records per second per shard.
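As a sketch of manual scaling, UpdateShardCount resizes a provisioned-mode stream; the target count below is just an example value:
# python
import boto3

kinesis = boto3.client("kinesis")

# Example: scale the (assumed) stream to 2 shards (2 MB/sec in, 4 MB/sec out).
kinesis.update_shard_count(
    StreamName="twtech-data-stream",
    TargetShardCount=2,
    ScalingType="UNIFORM_SCALING",  # the only supported scaling type
)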
Security
- IAM policies
for access control
- Encryption at rest
with AWS KMS
- TLS encryption in transit
- VPC endpoints for private access
Monitoring
twtech uses CloudWatch for monitoring and observability, tracking key metrics such as:
- IncomingBytes, IncomingRecords
- GetRecords.IteratorAgeMilliseconds
- ReadProvisionedThroughputExceeded
- WriteProvisionedThroughputExceeded
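GetRecords.IteratorAgeMilliseconds is the one to watch for consumer lag; here's a hedged boto3 sketch that pulls its hourly average (stream name assumed from the earlier examples):
# python
import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch")
now = datetime.now(timezone.utc)

# Average consumer lag over the last hour, in 5-minute buckets.
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/Kinesis",
    MetricName="GetRecords.IteratorAgeMilliseconds",
    Dimensions=[{"Name": "StreamName", "Value": "twtech-data-stream"}],
    StartTime=now - timedelta(hours=1),
    EndTime=now,
    Period=300,
    Statistics=["Average"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"])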
Common Use Cases
| Use Case | Description |
|---|---|
| Real-time analytics | Collect clickstream or game telemetry for live dashboards. |
| Log processing | Ingest app/server logs to process or archive. |
| IoT data | Collect sensor data for real-time processing. |
| Data lake ingestion | Stream into S3 (via Firehose or custom consumers). |
| Machine learning | Feed real-time features into ML models. |
Compared to Other Services
| Feature | KDS | Kinesis Firehose | Amazon MSK |
|---|---|---|---|
| Purpose | Real-time stream processing | Data delivery | Kafka-compatible event streaming |
| Control | High | Minimal | High (Kafka config) |
| Latency | Low (ms) | ~1 min | Low |
| Use Case | Real-time apps | Simple delivery pipelines | Kafka-based systems |
Getting Started
# bash
# Example: Using AWS CLI to put a record
# (On AWS CLI v2, add --cli-binary-format raw-in-base64-out to send plain text.)
aws kinesis put-record \
  --stream-name twtech-data-stream \
  --partition-key user1 \
  --data "Hello twtech team from Kinesis!"
Or configure AWS Lambda as a consumer directly from the console and link it to the Kinesis data stream.
Here's twtech's step-by-step guide to configure AWS Lambda as a consumer of a Kinesis Data Stream using the AWS Console, no code needed initially. This is one of the easiest and most powerful ways to process streaming data in real time.
Prerequisites
- A Kinesis Data Stream (e.g., twtech-kds-stream)
- A Lambda function (or twtech creates one in Step 1)
- Appropriate IAM permissions (Lambda needs to read from Kinesis and write logs to CloudWatch)
Step-by-Step: Link Lambda to Kinesis from the Console
Step 1: Go to the Lambda Console
- Open the AWS Lambda Console
- Click “Create function”
- Choose Author from scratch
- Give it a name (e.g., KinesisStreamConsumer)
- Choose a runtime (e.g., Python 3.x or Node.js)
- Create a new execution role or use an existing one (see IAM tip below)
- Click “Create function”
Step 2: Add Kinesis as a Trigger
- After the function is created, scroll to the Function overview
- Click “+ Add trigger”
- In the “Select a trigger” dropdown, choose Kinesis
- Configure:
  - Kinesis stream: choose the twtech stream (e.g., twtech-kds-stream)
  - Batch size: 100 (default; smaller if twtech wants lower latency)
  - Starting position:
    - Latest: only new records going forward (recommended for most use cases)
    - Trim Horizon: from the oldest available record
  - Enable trigger: Yes (check the box)
- Click “Add” (a scripted equivalent follows below)
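If twtech prefers to script this step, the console trigger is just a Lambda event source mapping; a minimal boto3 sketch, with a placeholder region/account in the stream ARN:
# python
import boto3

lambda_client = boto3.client("lambda")

# Equivalent of the console's "Add trigger" for Kinesis.
lambda_client.create_event_source_mapping(
    # Placeholder ARN: substitute twtech's real region and account ID.
    EventSourceArn="arn:aws:kinesis:us-east-1:123456789012:stream/twtech-kds-stream",
    FunctionName="KinesisStreamConsumer",
    StartingPosition="LATEST",  # or "TRIM_HORIZON" for the oldest records
    BatchSize=100,
    Enabled=True,
)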
Step 3: Write the Lambda Handler Code (Example)
Here's twtech's sample Python function to decode and log records:
# python
import base64

def lambda_handler(event, context):
    # Kinesis delivers each record's data base64-encoded inside the event.
    for record in event['Records']:
        payload = base64.b64decode(record['kinesis']['data'])
        print("Decoded record:", payload.decode('utf-8'))
    return {'statusCode': 200}
Click Deploy to save the function.
Step 4: Test It Out
Send a test record to the twtech Kinesis stream:
# bash
aws kinesis put-record \
  --stream-name twtech-kds-stream \
  --partition-key test-key \
  --data "Hello twtech team from Lambda consumer!"
Then check CloudWatch Logs:
- Go to CloudWatch Logs Console
- Look under /aws/lambda/KinesisStreamConsumer (or your function name)
- twtech should see the decoded payload
IAM Role Permissions for Lambda
The Lambda execution role should have a policy like this:
# json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "kinesis:GetRecords",
        "kinesis:GetShardIterator",
        "kinesis:DescribeStream",
        "kinesis:ListStreams",
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    }
  ]
}
Tips & Gotchas
| ✅ Tip | ⚠️ Gotcha (watch out) |
|---|---|
| Use a batch window to delay and group records (up to 5 mins). | Lambda timeout must cover the batch window plus batch processing time. |
| Use environment variables for stream names or custom params. | Avoid long processing; Lambda has a 15-min max duration. |
| Use on-failure destinations (DLQs) to catch failed records. | No retries once records age out of the stream's retention period (24 hours by default). |
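The batch window and on-failure destination from the table are settings on the same event source mapping; a hedged boto3 sketch, with a placeholder mapping UUID and an assumed SQS queue as the DLQ:
# python
import boto3

lambda_client = boto3.client("lambda")

# UUID comes from create_event_source_mapping or list_event_source_mappings.
lambda_client.update_event_source_mapping(
    UUID="00000000-0000-0000-0000-000000000000",  # placeholder
    MaximumBatchingWindowInSeconds=60,  # group records for up to 60 seconds
    DestinationConfig={
        "OnFailure": {
            # Placeholder SQS queue ARN acting as the dead-letter destination.
            "Destination": "arn:aws:sqs:us-east-1:123456789012:twtech-dlq"
        }
    },
)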
Walk through a hands-on lab to connect Kinesis Data Streams (KDS) with Kinesis Data Firehose, AWS Lambda, and Amazon S3: a classic real-time streaming pipeline.
Goal
Set up KDS → Lambda (process data) → Firehose → S3.
twtech will simulate real-time event publishing and storage.
Architecture Overview
# text
[Producer App / CLI]
        │
        ▼
[Kinesis Data Stream]
        │
        ├────▶ [Lambda Function] (transforms / logs data)
        │
        ▼
[Kinesis Data Firehose]
        │
        ▼
[twtech-S3Bucket]
Step-by-Step Hands-On Lab
1. Create an S3 Bucket (for Firehose destination)
# bash
aws s3 mb s3://twtech-kds-firehose-bucket
2. Create a Kinesis Data Stream
# bash
aws kinesis create-stream \
  --stream-name twtech-kds-stream \
  --shard-count 1
Wait until the stream becomes ACTIVE.
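Rather than polling manually, boto3 ships a stream_exists waiter that blocks until the stream is ACTIVE; a minimal sketch:
# python
import boto3

kinesis = boto3.client("kinesis")

# Polls DescribeStream until the stream reaches ACTIVE, then returns.
kinesis.get_waiter("stream_exists").wait(StreamName="twtech-kds-stream")
print("Stream is ACTIVE")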
3. Create a Lambda Function to Consume KDS
Sample Lambda function (Python):
# python
import base64

def lambda_handler(event, context):
    # Decode and log each record delivered by the Kinesis trigger.
    for record in event['Records']:
        payload = base64.b64decode(record["kinesis"]["data"])
        print("Decoded payload:", payload.decode('utf-8'))
    return {"statusCode": 200}
Deploy Lambda:
- Go to AWS Console → Lambda → Create Function
- Choose Python 3.x runtime
- Paste the code above
- Save and deploy
Add Kinesis Trigger:
- Configuration → Triggers → Add trigger → Kinesis → select twtech-kds-stream
- Set batch size (e.g., 100), enable trigger
4. Create Firehose Delivery Stream (from KDS to S3)
# bash
aws firehose create-delivery-stream \
  --delivery-stream-name twtech-kds-firehose \
  --delivery-stream-type KinesisStreamAsSource \
  --kinesis-stream-source-configuration RoleARN=<twtech-IAMRoleARN>,StreamARN=<twtech-StreamARN> \
  --extended-s3-destination-configuration file://s3-config.json
twtech would need:
- StreamARN → get via aws kinesis describe-stream (see the sketch below)
- IAMRoleARN → a role with permission to access KDS + S3
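A quick way to fetch the StreamARN from Python, assuming the stream created in step 2:
# python
import boto3

kinesis = boto3.client("kinesis")

# describe_stream_summary is a lighter call when only the ARN is needed.
summary = kinesis.describe_stream_summary(StreamName="twtech-kds-stream")
print(summary["StreamDescriptionSummary"]["StreamARN"])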
s3-config.json example:
# json
{
  "RoleARN": "arn:aws:iam::123456789xxx:role/twtech-firehose-role",
  "BucketARN": "arn:aws:s3:::twtech-kds-firehose-bucket",
  "Prefix": "firehose-data/",
  "BufferingHints": {
    "SizeInMBs": 1,
    "IntervalInSeconds": 60
  },
  "CompressionFormat": "UNCOMPRESSED"
}
5. Send Test Data to KDS
# bash
aws kinesis put-record \
  --stream-name twtech-kds-stream \
  --partition-key user1 \
  --data "Hello twtech team from Kinesis!"
twtech should see:
- Lambda logs in CloudWatch
- Processed records land in S3 (after 60s buffer)
Verify
Lambda
- Go to CloudWatch Logs
- Look for decoded payloads
S3
- Go to twtech-kds-firehose-bucket
- Check under /firehose-data/ for records (txt/json depending on setup)
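To check the bucket from Python instead of the console, a small sketch that lists the objects Firehose has delivered under the configured prefix:
# python
import boto3

s3 = boto3.client("s3")

# Objects appear here once the 60-second Firehose buffer flushes.
listing = s3.list_objects_v2(
    Bucket="twtech-kds-firehose-bucket",
    Prefix="firehose-data/",
)
for obj in listing.get("Contents", []):
    print(obj["Key"], obj["Size"])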
IAM Permissions (Important!)
Lambda execution role (broad for the lab; tighten for production):
# json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["logs:*", "kinesis:*"],
      "Resource": "*"
    }
  ]
}
Firehose role (broad for the lab; tighten for production):
# json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["kinesis:*", "s3:*", "logs:*"],
      "Resource": "*"
    }
  ]
}
twtech-Bonus Ideas:
- Use Lambda to transform the payload before sending it to Firehose
- Add Kinesis Data Analytics for real-time SQL queries
- Use Enhanced Fan-Out for parallel Lambda consumers