Tuesday, July 22, 2025

Amazon Kinesis Data Streams.

 

Amazon Kinesis Data Streams (KDS) is a fully managed, real-time data streaming service by AWS that allows twtech to collect, process, and analyze streaming data at scale.

 It's ideal for use cases like:

  • Log processing,
  • Event data collection,
  • Real-time analytics,
  • Machine learning,
  • Anomaly detection.

 Key Concepts

| Term | Description |
| --- | --- |
| Stream | A logical grouping of data records. Each stream can have multiple shards. |
| Shard | A unit of throughput in KDS. Each shard supports up to 1 MB/sec write and 2 MB/sec read. |
| Data Record | The basic unit of data in KDS. Contains a sequence number, partition key, and blob of data. |
| Partition Key | A string used to group data within shards. Determines how records are distributed. |
| Sequence Number | A unique identifier for each record within a shard. |
| Retention Period | The time that data is stored in a stream (default: 24 hours; up to 365 days with extended retention). |
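Retention can be raised after stream creation. A minimal sketch using the increase-stream-retention-period command (the stream name is illustrative):

# bash

# Extend retention from the 24-hour default to 7 days (168 hours).
aws kinesis increase-stream-retention-period \
  --stream-name twtech-kinesis-stream \
  --retention-period-hours 168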

 How Kinesis Data Streams Works

  1. Producers (apps, AWS services, IoT devices, etc.) send data to KDS.
  2. Data is distributed across shards in the stream using partition keys (the sketch below shows this routing).
  3. Consumers (e.g., Lambda, Kinesis Client Library [KCL], custom apps) read and process the data.
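The ShardId in the PutRecord response makes this routing visible. A minimal sketch, assuming a stream named twtech-kinesis-stream already exists:

# bash

# Records are routed by hashing the partition key; the response's
# ShardId shows which shard received the record.
aws kinesis put-record \
  --stream-name twtech-kinesis-stream \
  --partition-key userA \
  --data "event-1" \
  --cli-binary-format raw-in-base64-out
# Example response: { "ShardId": "shardId-000000000000", "SequenceNumber": "..." }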

 Producer Options

  • AWS SDKs (PutRecord / PutRecords API)
  • AWS IoT
  • Amazon Kinesis Agent
  • Amazon CloudWatch Logs / Firehose / IoT Core

 Consumer Options

  • Kinesis Data Analytics (now Amazon Managed Service for Apache Flink) – real-time analytics with SQL or Flink.
  • AWS Lambda – Serverless processing (via event source mapping; see the sketch after this list).
  • Custom Consumers – Using KCL or AWS SDK.
  • Enhanced Fan-Out (EFO) – 2 MB/sec per consumer, with dedicated throughput.
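For the Lambda option, wiring a stream to a function is one call. A minimal sketch with create-event-source-mapping (the function name and account ID are placeholders):

# bash

# Lambda polls the stream and invokes the function with batches of records.
aws lambda create-event-source-mapping \
  --function-name twtech-stream-processor \
  --event-source-arn arn:aws:kinesis:us-east-1:123456789012:stream/twtech-kinesis-stream \
  --starting-position LATEST \
  --batch-size 100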

 Security Features

  • Encryption at Rest – KMS integration (see the sketch after this list).
  • TLS in Transit
  • IAM-based Access Control
  • VPC Endpoints for private connectivity.
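Encryption at rest is enabled per stream. A minimal sketch using the AWS-managed key (a customer-managed KMS key ARN also works):

# bash

# Turn on server-side encryption with the AWS-managed Kinesis key.
aws kinesis start-stream-encryption \
  --stream-name twtech-kinesis-stream \
  --encryption-type KMS \
  --key-id alias/aws/kinesis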

 Monitoring & Scaling

  • CloudWatch Metrics: IncomingBytes, GetRecords.IteratorAgeMilliseconds, etc.
  • Scaling: on-demand capacity mode, custom auto-scaling, or manual shard management (see the sketch after this list).
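Manual scaling is a single call. A minimal sketch of doubling a one-shard stream with update-shard-count:

# bash

# UNIFORM_SCALING splits (or merges) shards evenly to reach the target count.
aws kinesis update-shard-count \
  --stream-name twtech-kinesis-stream \
  --target-shard-count 2 \
  --scaling-type UNIFORM_SCALING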

 Common Use Cases

  • Real-time application log streaming
  • Clickstream analytics (web/mobile apps)
  • IoT telemetry data
  • Fraud detection
  • Live metrics dashboards

twtech-Insights:

Comparison of Amazon Kinesis Data Streams with Kinesis Data Firehose and Amazon MSK (Kafka)

Here's twtech's clear and concise comparison of Amazon Kinesis Data Streams (KDS), Kinesis Data Firehose, and Amazon MSK (Managed Streaming for Apache Kafka) across several key areas:

 Purpose & Use Case

| Feature | Kinesis Data Streams (KDS) | Kinesis Data Firehose | Amazon MSK (Kafka) |
| --- | --- | --- | --- |
| Primary Use | Real-time custom app streaming and processing | Data delivery (ETL, buffering, delivery to S3, Redshift, etc.) | Open-source Kafka workloads, real-time pipelines |
| Ideal For | Real-time event processing, custom app logic | Simplified data ingestion to storage/analytics services | Kafka-compatible apps and ecosystems |
| Latency | Milliseconds | Near real-time (usually within 60 seconds) | Milliseconds |

 Management & Complexity

| Feature | KDS | Firehose | MSK |
| --- | --- | --- | --- |
| Fully Managed | Yes | Yes | Mostly (AWS manages brokers, but you manage clients and topics) |
| Setup Complexity | Moderate (requires shard planning and consumer management) | Low (easy config, automatic scaling) | High (Kafka expertise required) |
| Custom Processing | Yes (via Lambda, KCL, etc.) | Limited (can transform with Lambda) | Yes (Kafka Streams, Connect, etc.) |

 Performance & Scaling

| Feature | KDS | Firehose | MSK |
| --- | --- | --- | --- |
| Throughput Scaling | Manual (shards) or auto-scaling | Auto-scaling | Partition-based; managed scaling |
| Max Ingest Rate | ~1 MB/sec/shard | ~5 GB/hour/stream (scales up) | High, based on Kafka cluster size |
| Data Retention | 24 hrs to 365 days | 24 hrs max | Default 7 days (configurable) |

 Security & Access

| Feature | KDS | Firehose | MSK |
| --- | --- | --- | --- |
| Encryption | At rest (KMS), in transit (TLS) | At rest (KMS), in transit (TLS) | At rest (KMS), in transit (TLS) |
| VPC Support | Yes | Yes | Yes |
| IAM Integration | Yes | Yes | Limited (Kafka uses ACLs, IAM for AWS APIs) |

 Data Delivery & Destinations

| Feature | KDS | Firehose | MSK |
| --- | --- | --- | --- |
| Built-in Destinations | None (use consumers) | S3, Redshift, OpenSearch, Splunk, custom HTTP | None (use Kafka Connect or custom consumers) |
| Transformations | With Lambda (custom) | Inline with Lambda | With Kafka Streams / Connect |

 Pricing

| Feature | KDS | Firehose | MSK |
| --- | --- | --- | --- |
| Cost Model | Pay per shard, PUT payloads, and retrievals | Pay per data ingested and delivered | Pay per broker instance + storage + data transfer |
| Cheapest Option | For lightweight custom apps | For simple ETL and data lake ingestion | Expensive (enterprise-scale workloads) |

twtech Recommendation: When to Use What

  • twtech uses KDS if it:
    • Needs low-latency, real-time custom processing
    • Wants tight control over producers/consumers
    • Doesn’t need Kafka ecosystem support
  • twtech uses Firehose if it:
    • Wants to easily ingest data into S3, Redshift, etc.
    • Prefers fully managed, zero-maintenance pipelines
    • Can tolerate ~1-minute delay and limited transformation
  • twtech uses MSK if it:
    • Has existing Kafka workloads or Kafka skills
    • Needs advanced streaming patterns or Kafka ecosystem tools (e.g., Kafka Streams, ksqlDB)
    • Requires high throughput and fine-grained control over topics and partitions.

Project: Hands-on

How twtech creates and uses Amazon Kinesis for production and consumption.

Search for the AWS service: Kinesis

Create a data stream named: twtech-kinesis-data-streams

Configure: Data stream capacity (capacity mode), then create the data stream: twtech-kinesis-data-streams

Monitoring Kinesis data streams:

Under Configuration, the number of shards can be scaled from 1 (the default) to double that, e.g., 2 shards, then save the changes.

How twtech uses the Kinesis Data Streams CLI for production and consumption.

Open CloudShell (a command-line interface in AWS) to use the CLI.

Here’s twtech's practical cheat sheet of AWS CLI commands for Amazon Kinesis Data Streams, including operations for:

  • creating,
  • listing,
  • reading,
  • writing,
  • deleting streams.

 1. Create a Stream

# bash

aws kinesis create-stream \
  --stream-name twtech-kinesis-stream \
  --shard-count 1

 Note: Use --stream-mode-details for on-demand mode (see below).

 2. Create an On-Demand Stream

# bash

aws kinesis create-stream \
  --stream-name twtech-kinesis-stream \
  --stream-mode-details StreamMode=ON_DEMAND
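An existing provisioned stream can also be switched. A minimal sketch with update-stream-mode (note it takes the stream ARN; the account ID is a placeholder):

# bash

# Switch an existing stream to on-demand capacity.
aws kinesis update-stream-mode \
  --stream-arn arn:aws:kinesis:us-east-1:123456789012:stream/twtech-kinesis-stream \
  --stream-mode-details StreamMode=ON_DEMAND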

 3. List Streams

# bash

aws kinesis list-streams

4. Describe a Stream

# bash

aws kinesis describe-stream \
  --stream-name twtech-kinesis-stream

 5. Put a Record (Write to Stream)

# bash

aws kinesis put-record \
  --stream-name twtech-kinesis-stream \
  --partition-key user1 \
  --data "Hello twtech team from CLI!" \
  --cli-binary-format raw-in-base64-out

  • In AWS CLI v2, --data is expected to be base64-encoded by default.
  • Use --cli-binary-format raw-in-base64-out (as above) to pass plain text or raw JSON blobs.

 6. Put Multiple Records

# bash

aws kinesis put-records \
  --stream-name twtech-kinesis-stream \
  --records file://records.json

Example records.json:

# json

[
  {
    "Data": "SGVsbG8gd29ybGQ=",
    "PartitionKey": "twtech-user1"
  },
  {
    "Data": "VGVzdCBtZXNzYWdl",
    "PartitionKey": "twtech-user2"
  }
]
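The Data values above are base64-encoded. A minimal sketch of producing them in the shell:

# bash

# -n avoids encoding a trailing newline.
echo -n "Hello world" | base64      # SGVsbG8gd29ybGQ=
echo -n "Test message" | base64     # VGVzdCBtZXNzYWdl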

 7. Get Records from a Shard

Step 1: Get the Shard Iterator

# bash

aws kinesis get-shard-iterator \
  --stream-name twtech-kinesis-stream \
  --shard-id shardId-000000000000 \
  --shard-iterator-type TRIM_HORIZON

Step 2: Use the iterator to fetch records

# bash

aws kinesis get-records \
  --shard-iterator <iterator-value> \
  --limit 10
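The Data field in each returned record is base64-encoded. A minimal sketch of decoding every payload in one pipeline (jq is preinstalled in CloudShell):

# bash

# Fetch records, extract each Data field, and decode it to plain text.
aws kinesis get-records --shard-iterator <iterator-value> --limit 10 \
  | jq -r '.Records[].Data' \
  | while read -r d; do echo "$d" | base64 --decode; echo; done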

 8. Delete a Stream

# bash

aws kinesis delete-stream \
  --stream-name twtech-kinesis-stream

Use --enforce-consumer-deletion if consumers are attached.

 9. Split or Merge Shards

Split a shard

# bash

aws kinesis split-shard \
  --stream-name twtech-kinesis-stream \
  --shard-to-split shardId-000000000000 \
  --new-starting-hash-key 170141183460469231731687xxxx15884105728

Merge two shards

# bash

aws kinesis merge-shards \
  --stream-name twtech-kinesis-stream \
  --shard-id shardId-000000000000 \
  --adjacent-shard-id shardId-000000000001
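Picking a valid --new-starting-hash-key requires the shard's current hash-key range. A minimal sketch of listing the ranges:

# bash

# Show each shard's ID and the hash-key range it covers.
aws kinesis describe-stream \
  --stream-name twtech-kinesis-stream \
  --query "StreamDescription.Shards[].{Id:ShardId,Range:HashKeyRange}"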

 10. Enable Enhanced Monitoring

# bash

aws kinesis enable-enhanced-monitoring \
  --stream-name twtech-kinesis-stream \
  --shard-level-metrics IncomingBytes OutgoingBytes

Quick Tip: Check Stream Status

# bash

aws kinesis describe-stream \
  --stream-name twtech-kinesis-stream \
  --query "StreamDescription.StreamStatus"

Returns: CREATING, ACTIVE, DELETING, etc.
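Rather than polling the status, the CLI can block until the stream is ready:

# bash

# Returns once the stream reaches ACTIVE (polls describe-stream internally).
aws kinesis wait stream-exists \
  --stream-name twtech-kinesis-stream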

Bash script to automate Kinesis setup and testing

Here is twtech's Bash script that automates the full lifecycle of an Amazon Kinesis Data Stream, including:

  • Stream creation
  • Sending test data
  • Reading from the stream
  • Clean-up

 Bash Script: twtech-kinesis-setup-test.sh

# bash

#!/bin/bash

# CONFIGURATION
STREAM_NAME="twtech-kinesis-stream"
SHARD_COUNT=1
PARTITION_KEY="twtech-Key"
REGION="us-east-1"
TEST_DATA="Hello twtech team from Kinesis Bash Script!"
TMP_ITERATOR_FILE="shard_iterator.txt"
TMP_RECORD_FILE="record_output.json"

# CREATE STREAM
echo "Creating Kinesis stream: $STREAM_NAME"
aws kinesis create-stream \
  --stream-name "$STREAM_NAME" \
  --shard-count "$SHARD_COUNT" \
  --region "$REGION"

echo "Waiting for stream to become ACTIVE..."
aws kinesis wait stream-exists \
  --stream-name "$STREAM_NAME" \
  --region "$REGION"
echo "Stream $STREAM_NAME is ACTIVE."

# PUT A RECORD
echo "Putting record into the stream..."
# --cli-binary-format lets AWS CLI v2 accept plain-text --data.
aws kinesis put-record \
  --stream-name "$STREAM_NAME" \
  --partition-key "$PARTITION_KEY" \
  --data "$TEST_DATA" \
  --cli-binary-format raw-in-base64-out \
  --region "$REGION"

# GET SHARD ID
SHARD_ID=$(aws kinesis describe-stream \
  --stream-name "$STREAM_NAME" \
  --region "$REGION" \
  --query "StreamDescription.Shards[0].ShardId" \
  --output text)
echo "Shard ID: $SHARD_ID"

# GET SHARD ITERATOR
SHARD_ITERATOR=$(aws kinesis get-shard-iterator \
  --stream-name "$STREAM_NAME" \
  --shard-id "$SHARD_ID" \
  --shard-iterator-type TRIM_HORIZON \
  --region "$REGION" \
  --query 'ShardIterator' \
  --output text)
echo "Got shard iterator."

# GET RECORDS
echo "Waiting a few seconds for data to arrive..."
sleep 5
echo "Reading records from the stream..."
aws kinesis get-records \
  --shard-iterator "$SHARD_ITERATOR" \
  --limit 10 \
  --region "$REGION" \
  > "$TMP_RECORD_FILE"
cat "$TMP_RECORD_FILE"

# CLEAN-UP
echo "Cleaning up: deleting stream $STREAM_NAME"
aws kinesis delete-stream \
  --stream-name "$STREAM_NAME" \
  --region "$REGION" \
  --enforce-consumer-deletion
echo "Done."

# CLEAN TEMP FILES
rm -f "$TMP_ITERATOR_FILE" "$TMP_RECORD_FILE"

 How to Run the Script:

  1. Save the script as twtech-kinesis-setup-test.sh
  2. Make it executable (grant execute permission on the file):

# bash

chmod +x twtech-kinesis-setup-test.sh

  3. Run it:

# bash

./twtech-kinesis-setup-test.sh

# or simply

sh twtech-kinesis-setup-test.sh

 Requirements

  • AWS CLI installed and configured (aws configure)
  • IAM credentials with permissions (a sketch of a matching policy follows):
    • kinesis:*
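A minimal sketch of such a policy, broad for a sandbox (scope Action and Resource down in production):

# json

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "kinesis:*",
      "Resource": "*"
    }
  ]
}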

 What This Script Does

  • Creates a stream
  • Sends a sample record
  • Reads back the data
  • Deletes the stream after test
  • Cleans up temp files

Step 1: Verify the version of the AWS CLI in CloudShell:

aws --version

If the AWS CLI version shown is aws-cli/2.27.53 (i.e., version 2),

then, Step 2: twtech sends records to the created stream: twtech-kinesis-data-streams.

Send the twtech-user signup event with the following command (AWS CLI version 2):

aws kinesis put-record --stream-name twtech-kinesis-data-streams --partition-key twtech-user1 --data "twtech-user signup" --cli-binary-format raw-in-base64-out

Send the twtech-user login event with the following command (AWS CLI version 2):

aws kinesis put-record --stream-name twtech-kinesis-data-streams --partition-key twtech-user1 --data "twtech-user login" --cli-binary-format raw-in-base64-out

Send the twtech-user signout event with the following command (AWS CLI version 2):

aws kinesis put-record --stream-name twtech-kinesis-data-streams --partition-key twtech-user1 --data "twtech-user signout" --cli-binary-format raw-in-base64-out
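The three events can also be sent in one short loop; a minimal sketch:

# bash

# Send signup, login, and signout events with the same partition key.
for event in signup login signout; do
  aws kinesis put-record \
    --stream-name twtech-kinesis-data-streams \
    --partition-key twtech-user1 \
    --data "twtech-user $event" \
    --cli-binary-format raw-in-base64-out
done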

How twtech monitors Kinesis data stream metrics from the console (GUI):

NB:

It takes a couple of minutes for the CloudWatch metrics to update.

How twtech consumes resources from its Kinesis data streams:

Describe the stream to know what to consume from it: twtech-kinesis-data-streams

aws kinesis describe-stream --stream-name twtech-kinesis-data-streams

How twtech consumes data from its Kinesis data stream: twtech-kinesis-data-streams

# To get a ShardIterator string

aws kinesis get-shard-iterator --stream-name twtech-kinesis-data-streams --shard-id shardId-000000000000 --shard-iterator-type TRIM_HORIZON

How twtech gets records using the shard iterator:

aws kinesis get-records --shard-iterator <iterator-string generated above>

aws kinesis get-records  --shard-iterator "AAAAAAAAAAHDJ9j/PfGsozwtU6MhxneBZsHt+DwVu8gltmGCWomkDSWdVxT+jKTURYwcEpiMLdP4XXc5t1MxrsTB6DciGvoja6ifanQowuRG37ULlgYILeYODCsJ3sxxxxxtMKxxt8XxFPePb06ANJpIuRbTgetS81YgM0GBgaWlxlwP26hIyqGPcOF81xTvSsT6u+Xvv+Xa4+lWNbHXc3/M1vjtHUCUhKZG4RsLh2MIH4HMHdQVHl7aWrAjHfK25y5TA5ImOzbg="

How twtech reads the encoded data: Google-search for "base64 decode online".

Link: https://www.base64decode.org/

Copy and paste the encoded data to decode it.
