Tuesday, July 22, 2025

Amazon Kinesis Data Streams | Focused Overview.

 

Here’s twtech focused overview of Amazon Kinesis Data Streams (KDS), to help twtech understand its architecture, use cases, performance characteristics, and how to use it effectively.

 The concept: Kinesis Data Streams

Amazon Kinesis Data Streams (KDS) is a real-time data streaming service that lets you ingest, buffer, and process large volumes of data continuously in real time.

Use it to capture things like:

  • Clickstream data
  • Log files
  • IoT telemetry
  • Social media feeds
  • Application events

 Core Components to keep in mind

| Component | Description |
| --- | --- |
| Stream | The core object in KDS; holds a series of shards. |
| Shard | A unit of capacity: 1 MB/sec write and 2 MB/sec read throughput. |
| Data Record | The individual unit of data (up to 1 MB); includes a partition key, data blob, and sequence number. |
| Partition Key | Determines how records are distributed across shards. |
| Retention Period | Default 24 hours (can go up to 365 days). |

 How Data Flows in KDS

  1. Producers (apps, IoT devices, AWS services) send records to a stream.
  2. KDS stores records in shards in ordered sequences.
  3. Consumers (like Lambda, KCL apps, or custom clients) read and process the data.

 Producer Options

  • AWS SDK (PutRecord / PutRecords)
  • AWS IoT Core
  • Amazon CloudWatch Logs subscriptions
  • AWS Kinesis Agent (for log files)
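For a quick feel for the producer side, here is a minimal sketch using the AWS SDK for Python (boto3); the stream name matches the CLI example later in this post:

# python
import boto3

kinesis = boto3.client("kinesis")

# Records with the same partition key always land on the same shard.
response = kinesis.put_record(
    StreamName="twtech-data-stream",
    Data=b"Hello twtech team from Kinesis!",
    PartitionKey="user1",
)
print("Shard:", response["ShardId"], "Sequence:", response["SequenceNumber"])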

 Consumer Options

| Type | Description |
| --- | --- |
| Kinesis Client Library (KCL) | Java-based library to build robust consumers. Handles load balancing and checkpointing. |
| AWS Lambda | Easiest option. Automatically scales and processes batches from the stream. |
| Enhanced Fan-Out | Each consumer gets a dedicated 2 MB/sec pipe per shard; ideal for low-latency parallel consumers. |
| GetRecords / GetShardIterator | Low-level APIs for custom consumers. |
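As a rough sketch of that low-level route, a custom consumer might poll a single shard like this with boto3 (the stream name is an example; a real consumer would enumerate all shards and checkpoint its position):

# python
import time

import boto3

kinesis = boto3.client("kinesis")
stream = "twtech-kds-stream"  # example stream name

# Start at the oldest available record in the first shard.
shard_id = kinesis.describe_stream(StreamName=stream)["StreamDescription"]["Shards"][0]["ShardId"]
iterator = kinesis.get_shard_iterator(
    StreamName=stream, ShardId=shard_id, ShardIteratorType="TRIM_HORIZON"
)["ShardIterator"]

while iterator:
    result = kinesis.get_records(ShardIterator=iterator, Limit=100)
    for record in result["Records"]:
        print(record["Data"])  # raw bytes; decode as needed
    iterator = result.get("NextShardIterator")
    time.sleep(1)  # GetRecords is limited to 5 calls/sec per shard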

 Configuration & Scaling

  • Manual Scaling: Add/remove shards.
  • Auto Scaling: Enable Kinesis scaling with CloudWatch and Application Auto Scaling.
  • Provisioned Throughput: Based on shard count (1 MB/sec in, 2 MB/sec out per shard).
  • Limits: Each record = max 1 MB; max 1,000 PUT records per second per shard.
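Manual scaling from code is a one-liner; here is a sketch with boto3 that reshards the example stream from one shard to two:

# python
import boto3

kinesis = boto3.client("kinesis")

# UNIFORM_SCALING splits/merges shards evenly to hit the target count.
kinesis.update_shard_count(
    StreamName="twtech-kds-stream",  # example stream name
    TargetShardCount=2,
    ScalingType="UNIFORM_SCALING",
)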

 Security

  • IAM policies for access control
  • Encryption at rest with AWS KMS
  • TLS encryption in transit
  • VPC endpoints for private access

 Monitoring

twtech uses CloudWatch metrics for monitoring and observability. Key metrics to watch:

  • IncomingBytes, IncomingRecords
  • GetRecords.IteratorAgeMilliseconds
  • ReadProvisionedThroughputExceeded
  • WriteProvisionedThroughputExceeded
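For example, here is one way to read back the iterator-age metric with boto3 (the stream name is assumed to be twtech-kds-stream; a steadily growing iterator age means consumers are falling behind):

# python
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")
now = datetime.now(timezone.utc)

# Maximum iterator age over the last hour, in 5-minute buckets.
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/Kinesis",
    MetricName="GetRecords.IteratorAgeMilliseconds",
    Dimensions=[{"Name": "StreamName", "Value": "twtech-kds-stream"}],
    StartTime=now - timedelta(hours=1),
    EndTime=now,
    Period=300,
    Statistics=["Maximum"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Maximum"])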

 Common Use Cases

| Use Case | Description |
| --- | --- |
| Real-time analytics | Collect clickstream or game telemetry for live dashboards. |
| Log processing | Ingest app/server logs to process or archive. |
| IoT data | Collect sensor data for real-time processing. |
| Data lake ingestion | Stream into S3 (via Firehose or custom consumers). |
| Machine learning | Feed real-time features into ML models. |

 Compared to Other Services

| Feature | KDS | Kinesis Firehose | Amazon MSK |
| --- | --- | --- | --- |
| Purpose | Real-time stream processing | Data delivery | Kafka-compatible event streaming |
| Control | High | Minimal | High (Kafka config) |
| Latency | Low (ms) | ~1 min | Low |
| Use Case | Real-time apps | Simple delivery pipelines | Kafka-based systems |

 Getting Started

# bash
# Example: Using AWS CLI to put a record.
# --cli-binary-format raw-in-base64-out lets AWS CLI v2 accept plain-text --data.
aws kinesis put-record \
  --stream-name twtech-data-stream \
  --partition-key user1 \
  --cli-binary-format raw-in-base64-out \
  --data "Hello twtech team from Kinesis!"


Or configure AWS Lambda as a consumer directly from the console and link it to the Kinesis data stream.

Here's twtech step-by-step guide to configure AWS Lambda as a consumer of a Kinesis Data Stream using the AWS Console (no code needed initially).

This is one of the easiest and most powerful ways to process streaming data in real time.

Prerequisites

  • A Kinesis Data Stream (e.g., twtech-kds-stream)
  • A Lambda function (or twtech creates one)
  • Appropriate IAM permissions (Lambda needs to read from Kinesis and write logs to CloudWatch)

 Step-by-Step: Link Lambda to Kinesis from Console

Step 1: Go to Lambda Console

  1. Open the AWS Lambda Console
  2. Click “Create function”
    • Choose Author from scratch
    • Give it a name (e.g., KinesisStreamConsumer)
    • Choose a runtime (e.g., Python 3.x or Node.js)
    • Create a new execution role or use an existing one (see IAM tip below)
    • Click “Create function”

Step 2: Add Kinesis as a Trigger

  1. After the function is created, scroll to the Function overview
  2. Click “+ Add trigger”
  3. In the “Select a trigger” dropdown, choose Kinesis
  4. Configure:
    • Kinesis stream: Choose twtech stream (e.g., twtech-kds-stream)
    • Batch size: 100 (default; smaller if twtech wants lower latency)
    • Starting position:
      • Latest: only new records going forward (recommended for most use cases)
      • Trim Horizon: from the oldest available record
    • Enable trigger: Yes (check the box)
  5. Click “Add”

Step 3: Write the Lambda Handler Code (Example)

Here's twtech sample Python function to decode and log records:

# python
import base64

def lambda_handler(event, context):
    # Kinesis delivers record payloads base64-encoded; decode before use.
    for record in event['Records']:
        payload = base64.b64decode(record['kinesis']['data'])
        print("Decoded record:", payload.decode('utf-8'))
    return {'statusCode': 200}

Click Deploy to save the function.
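Before deploying, the handler can also be sanity-checked locally with a synthetic event shaped like what Lambda passes for a one-record Kinesis batch (the module name lambda_function is an assumption about where the handler is saved):

# python
import base64
from lambda_function import lambda_handler  # hypothetical module name for the handler above

# Minimal synthetic event mimicking a one-record Kinesis batch.
event = {
    "Records": [
        {"kinesis": {"data": base64.b64encode(b"Hello twtech team!").decode("utf-8")}}
    ]
}
print(lambda_handler(event, None))  # expect: {'statusCode': 200}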

Step 4: Test It Out

Send a test record to twtech Kinesis stream:

# bash
aws kinesis put-record \
  --stream-name twtech-kds-stream \
  --partition-key test-key \
  --cli-binary-format raw-in-base64-out \
  --data "Hello twtech team from Lambda consumer!"

Then check CloudWatch Logs:

  • Go to CloudWatch Logs Console
  • Look under /aws/lambda/KinesisStreamConsumer (or your function name)
  • twtech should see the decoded payload
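The same check can be scripted; a small sketch with boto3 (the log group name assumes the function is named KinesisStreamConsumer):

# python
import boto3

logs = boto3.client("logs")

# Pull recent log events containing the handler's "Decoded" print output.
response = logs.filter_log_events(
    logGroupName="/aws/lambda/KinesisStreamConsumer",  # assumes this function name
    filterPattern="Decoded",
)
for event in response["events"]:
    print(event["message"].strip())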

 IAM Role Permissions for Lambda

The Lambda execution role should include a policy like this:

# json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "kinesis:GetRecords",
        "kinesis:GetShardIterator",
        "kinesis:DescribeStream",
        "kinesis:ListStreams",
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    }
  ]
}

 Tips & Gotchas

| ✅ Tip | ⚠️ Gotcha (watch out) |
| --- | --- |
| Use a batch window to delay and group records (up to 5 mins). | Lambda timeout should exceed the time needed to process a full batch, batch window included. |
| Use environment variables for stream names or custom params. | Avoid long processing; Lambda has a 15-min max duration. |
| Use on-failure destinations (DLQs) to catch failed records. | Failed records can't be retried once they age out of the stream's retention period (24 hours by default, up to 365 days). |

Next, walk through a hands-on lab to connect Kinesis Data Streams (KDS) with Kinesis Data Firehose, AWS Lambda, and Amazon S3: a classic real-time streaming pipeline.

Goal

 Set up KDS → Lambda (process data) → Firehose → S3
 twtech will simulate real-time event publishing and storage

 Architecture Overview

# text

[Producer App / CLI]
        │
        ▼
[Kinesis Data Stream]
        │
        ├──── [Lambda Function] (transforms / logs data)
        │
        ▼
[Kinesis Data Firehose]
        │
        ▼
[twtech-S3Bucket]

 Step-by-Step Hands-On Lab

1. Create an S3 Bucket (for Firehose destination)

# bash
aws s3 mb s3://twtech-kds-firehose-bucket

2. Create a Kinesis Data Stream

# bash
aws kinesis create-stream \
  --stream-name twtech-kds-stream \
  --shard-count 1

Wait until the stream becomes ACTIVE
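One way to block until then is the boto3 stream_exists waiter, which polls DescribeStream until the stream reports ACTIVE:

# python
import boto3

kinesis = boto3.client("kinesis")

# Polls DescribeStream until the stream exists and is ACTIVE.
kinesis.get_waiter("stream_exists").wait(StreamName="twtech-kds-stream")
print("Stream is ACTIVE")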

 3. Create a Lambda Function to Consume KDS

Sample Lambda function (Python)

# python
import base64

def lambda_handler(event, context):
    # Each record's payload arrives base64-encoded; decode before use.
    for record in event['Records']:
        payload = base64.b64decode(record["kinesis"]["data"])
        print("Decoded payload:", payload.decode('utf-8'))
    return {"statusCode": 200}

Deploy Lambda:

  1. Go to AWS Console → Lambda → Create Function
  2. Choose Python 3.x runtime
  3. Paste the code above
  4. Save and deploy

Add Kinesis Trigger:

  • Configuration → Triggers → Add trigger → Kinesis → Select twtech-kds-stream
  • Set batch size (e.g., 100) and enable the trigger
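The same trigger can be created programmatically. A sketch with boto3 (the stream ARN below is a placeholder; fetch the real one as in step 4):

# python
import boto3

lambda_client = boto3.client("lambda")

# Wires the stream to the function; Lambda polls the shards and invokes with batches.
lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:kinesis:us-east-1:123456789012:stream/twtech-kds-stream",  # placeholder ARN
    FunctionName="KinesisStreamConsumer",
    BatchSize=100,
    StartingPosition="LATEST",
)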

 4. Create Firehose Delivery Stream (from KDS to S3)

# bash
aws firehose create-delivery-stream \
  --delivery-stream-name twtech-kds-firehose \
  --delivery-stream-type KinesisStreamAsSource \
  --kinesis-stream-source-configuration KinesisStreamARN=<twtech-StreamARN>,RoleARN=<twtech-IAMRoleARN> \
  --extended-s3-destination-configuration file://s3-config.json

twtech would need:

  • StreamARN → get via aws kinesis describe-stream
  • IAMRoleARN → role with permission to access KDS + S3
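For example, the stream ARN can be fetched with boto3:

# python
import boto3

kinesis = boto3.client("kinesis")

# The ARN that goes into KinesisStreamARN above.
arn = kinesis.describe_stream(StreamName="twtech-kds-stream")["StreamDescription"]["StreamARN"]
print(arn)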

s3-config.json example:

# json
{
  "RoleARN": "arn:aws:iam::123456789xxx:role/twtech-firehose-role",
  "BucketARN": "arn:aws:s3:::twtech-kds-firehose-bucket",
  "Prefix": "firehose-data/",
  "BufferingHints": {
    "SizeInMBs": 1,
    "IntervalInSeconds": 60
  },
  "CompressionFormat": "UNCOMPRESSED"
}

 5. Send Test Data to KDS

# bash
aws kinesis put-record \
  --stream-name twtech-kds-stream \
  --partition-key user1 \
  --cli-binary-format raw-in-base64-out \
  --data "Hello twtech team from Kinesis!"

twtech should see:

  • Lambda logs in CloudWatch
  • Processed records land in S3 (after 60s buffer)

 Verify

 Lambda

  • Go to CloudWatch Logs
  • Look for decoded payloads

 S3

  • Go to twtech-kds-firehose-bucket
  • Check under /firehose-data/ for records (txt/json depending on setup)
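Listing the delivered objects can also be scripted; a sketch with boto3 (bucket name and prefix are the lab's examples):

# python
import boto3

s3 = boto3.client("s3")

# List objects Firehose has delivered under the configured prefix.
response = s3.list_objects_v2(Bucket="twtech-kds-firehose-bucket", Prefix="firehose-data/")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])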

 IAM Permissions (Important!)

 Lambda execution role (broad permissions for the lab only; scope these down in production):

# json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["logs:*", "kinesis:*"],
      "Resource": "*"
    }
  ]
}

 Firehose role (likewise, lab-only breadth):

# json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["kinesis:*", "s3:*", "logs:*"],
      "Resource": "*"
    }
  ]
}

 twtech-Bonus Ideas:

  • Use Lambda to transform the payload before sending it to Firehose
  • Add Kinesis Data Analytics for real-time SQL queries
  • Use Enhanced Fan-Out for parallel Lambda consumers (see the sketch below)
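As a sketch of that last idea, an enhanced fan-out consumer is registered against the stream ARN (placeholder below); actually reading via SubscribeToShard needs an HTTP/2-capable client, so this shows only registration:

# python
import boto3

kinesis = boto3.client("kinesis")

# Each registered consumer gets its own dedicated 2 MB/sec/shard read throughput.
consumer = kinesis.register_stream_consumer(
    StreamARN="arn:aws:kinesis:us-east-1:123456789012:stream/twtech-kds-stream",  # placeholder ARN
    ConsumerName="twtech-efo-consumer",  # example name
)
print(consumer["Consumer"]["ConsumerARN"])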
