Tuesday, July 22, 2025

Amazon Kinesis Data Streams.

 

Amazon Kinesis Data Streams (KDS) is a fully managed, real-time data streaming service by AWS that allows twtech to collect, process, and analyze streaming data at scale.

 It's ideal for use cases like:

  • Log processing,
  • Event data collection,
  • Real-time analytics,
  • Machine learning,
  • Anomaly detection.

 Key Concepts

| Term | Description |
| --- | --- |
| Stream | A logical grouping of data records. Each stream can have multiple shards. |
| Shard | A unit of throughput in KDS. Each shard supports up to 1 MB/sec write and 2 MB/sec read. |
| Data Record | The basic unit of data in KDS. Contains a sequence number, partition key, and blob of data. |
| Partition Key | A string used to group data within shards. Determines how records are distributed. |
| Sequence Number | A unique identifier for each record within a shard. |
| Retention Period | The time that data is stored in a stream (default: 24 hours; up to 365 days with extended retention). |
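Retention can be raised after stream creation. A minimal sketch using the increase-stream-retention-period command (the stream name is illustrative):

# bash

# Extend retention from the 24-hour default to 7 days (168 hours).
aws kinesis increase-stream-retention-period \
  --stream-name twtech-kinesis-stream \
  --retention-period-hours 168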

 How Kinesis Data Streams Works

  1. Producers (apps, AWS services, IoT devices, etc.) send data to KDS.
  2. Data is distributed across shards in the stream using partition keys (the sketch below shows this routing).
  3. Consumers (e.g., Lambda, Kinesis Client Library [KCL], custom apps) read and process the data.
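The ShardId in the PutRecord response makes this routing visible. A minimal sketch, assuming a stream named twtech-kinesis-stream already exists:

# bash

# Records are routed by hashing the partition key; the response's
# ShardId shows which shard received the record.
aws kinesis put-record \
  --stream-name twtech-kinesis-stream \
  --partition-key userA \
  --data "event-1" \
  --cli-binary-format raw-in-base64-out
# Example response: { "ShardId": "shardId-000000000000", "SequenceNumber": "..." }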

 Producer Options

  • AWS SDKs (PutRecord / PutRecords API)
  • AWS IoT
  • Amazon Kinesis Agent
  • Amazon CloudWatch Logs / Firehose / IoT Core

 Consumer Options

  • Kinesis Data Analytics (now Amazon Managed Service for Apache Flink) – real-time analytics with SQL or Flink.
  • AWS Lambda – Serverless processing (via event source mapping; see the sketch after this list).
  • Custom Consumers – Using KCL or AWS SDK.
  • Enhanced Fan-Out (EFO) – 2 MB/sec per consumer, with dedicated throughput.
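For the Lambda option, wiring a stream to a function is one call. A minimal sketch with create-event-source-mapping (the function name and account ID are placeholders):

# bash

# Lambda polls the stream and invokes the function with batches of records.
aws lambda create-event-source-mapping \
  --function-name twtech-stream-processor \
  --event-source-arn arn:aws:kinesis:us-east-1:123456789012:stream/twtech-kinesis-stream \
  --starting-position LATEST \
  --batch-size 100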

 Security Features

  • Encryption at Rest – KMS integration (see the sketch after this list).
  • TLS in Transit
  • IAM-based Access Control
  • VPC Endpoints for private connectivity.
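Encryption at rest is enabled per stream. A minimal sketch using the AWS-managed key (a customer-managed KMS key ARN also works):

# bash

# Turn on server-side encryption with the AWS-managed Kinesis key.
aws kinesis start-stream-encryption \
  --stream-name twtech-kinesis-stream \
  --encryption-type KMS \
  --key-id alias/aws/kinesis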

 Monitoring & Scaling

  • CloudWatch Metrics: IncomingBytes, GetRecords.IteratorAgeMilliseconds, etc.
  • Scaling: on-demand capacity mode, custom auto-scaling, or manual shard management (see the sketch after this list).
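Manual scaling is a single call. A minimal sketch of doubling a one-shard stream with update-shard-count:

# bash

# UNIFORM_SCALING splits (or merges) shards evenly to reach the target count.
aws kinesis update-shard-count \
  --stream-name twtech-kinesis-stream \
  --target-shard-count 2 \
  --scaling-type UNIFORM_SCALING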

 Common Use Cases

  • Real-time application log streaming
  • Clickstream analytics (web/mobile apps)
  • IoT telemetry data
  • Fraud detection
  • Live metrics dashboards

twtech-Insights:

Comparison of Amazon Kinesis Data Streams with Kinesis Data Firehose and Amazon MSK (Kafka)

Here's twtech's clear and concise comparison of Amazon Kinesis Data Streams (KDS), Kinesis Data Firehose, and Amazon MSK (Managed Streaming for Apache Kafka) across several key areas:

 Purpose & Use Case

| Feature | Kinesis Data Streams (KDS) | Kinesis Data Firehose | Amazon MSK (Kafka) |
| --- | --- | --- | --- |
| Primary Use | Real-time custom app streaming and processing | Data delivery (ETL, buffering, delivery to S3, Redshift, etc.) | Open-source Kafka workloads, real-time pipelines |
| Ideal For | Real-time event processing, custom app logic | Simplified data ingestion to storage/analytics services | Kafka-compatible apps and ecosystems |
| Latency | Milliseconds | Near real-time (usually within 60 seconds) | Milliseconds |

 Management & Complexity

| Feature | KDS | Firehose | MSK |
| --- | --- | --- | --- |
| Fully Managed | Yes | Yes | Mostly (AWS manages brokers, but you manage clients and topics) |
| Setup Complexity | Moderate (requires shard planning and consumer management) | Low (easy config, automatic scaling) | High (Kafka expertise required) |
| Custom Processing | Yes (via Lambda, KCL, etc.) | Limited (can transform with Lambda) | Yes (Kafka Streams, Connect, etc.) |

 Performance & Scaling

| Feature | KDS | Firehose | MSK |
| --- | --- | --- | --- |
| Throughput Scaling | Manual (shards) or auto-scaling | Auto-scaling | Partition-based; managed scaling |
| Max Ingest Rate | ~1 MB/sec/shard | ~5 GB/hour/stream (scales up) | High, based on Kafka cluster size |
| Data Retention | 24 hrs to 365 days | 24 hrs max | Default 7 days (configurable) |

 Security & Access

| Feature | KDS | Firehose | MSK |
| --- | --- | --- | --- |
| Encryption | At rest (KMS), in transit (TLS) | At rest (KMS), in transit (TLS) | At rest (KMS), in transit (TLS) |
| VPC Support | Yes | Yes | Yes |
| IAM Integration | Yes | Yes | Limited (Kafka uses ACLs, IAM for AWS APIs) |

 Data Delivery & Destinations

| Feature | KDS | Firehose | MSK |
| --- | --- | --- | --- |
| Built-in Destinations | None (use consumers) | S3, Redshift, OpenSearch, Splunk, custom HTTP | None (use Kafka Connect or custom consumers) |
| Transformations | With Lambda (custom) | Inline with Lambda | With Kafka Streams / Connect |

 Pricing

| Feature | KDS | Firehose | MSK |
| --- | --- | --- | --- |
| Cost Model | Pay per shard, PUT payloads, and retrievals | Pay per data ingested and delivered | Pay per broker instance + storage + data transfer |
| Cheapest Option | For lightweight custom apps | For simple ETL and data lake ingestion | Expensive (enterprise-scale workloads) |

twtech Recommendation: When to Use What

  • twtech uses KDS if it:
    • Needs low-latency, real-time custom processing
    • Wants tight control over producers/consumers
    • Doesn’t need Kafka ecosystem support
  • twtech uses Firehose if it:
    • Wants to easily ingest data into S3, Redshift, etc.
    • Prefers fully managed, zero-maintenance pipelines
    • Can tolerate ~1-minute delay and limited transformation
  • twtech uses MSK if it:
    • Has existing Kafka workloads or Kafka skills
    • Needs advanced streaming patterns or Kafka ecosystem tools (e.g., Kafka Streams, ksqlDB)
    • Requires high throughput and fine-grained control over topics and partitions.

Project: Hands-on

How twtech creates and uses Amazon Kinesis for production and consumption.

Search for the AWS service: Kinesis

Create a data stream named: twtech-kinesis-data-streams

Configure: Data stream capacity (capacity mode), then create the data stream: twtech-kinesis-data-streams

Monitoring Kinesis data streams:

Under Configuration, the number of shards can be scaled from 1 (the default) to double that, e.g., 2 shards, then save the changes.

How twtech uses the Kinesis Data Streams CLI for production and consumption.

Open CloudShell (a command-line interface in AWS) to use the CLI.

Here’s twtech's practical cheat sheet of AWS CLI commands for Amazon Kinesis Data Streams, including operations for:

  • creating,
  • listing,
  • reading,
  • writing,
  • deleting streams.

 1. Create a Stream

# bash

aws kinesis create-stream \
  --stream-name twtech-kinesis-stream \
  --shard-count 1

 Note: Use --stream-mode-details for on-demand mode (see below).

 2. Create an On-Demand Stream

# bash

aws kinesis create-stream \
  --stream-name twtech-kinesis-stream \
  --stream-mode-details StreamMode=ON_DEMAND
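An existing provisioned stream can also be switched. A minimal sketch with update-stream-mode (note it takes the stream ARN; the account ID is a placeholder):

# bash

# Switch an existing stream to on-demand capacity.
aws kinesis update-stream-mode \
  --stream-arn arn:aws:kinesis:us-east-1:123456789012:stream/twtech-kinesis-stream \
  --stream-mode-details StreamMode=ON_DEMAND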

 3. List Streams

# bash

aws kinesis list-streams

4. Describe a Stream

# bash

aws kinesis describe-stream \
  --stream-name twtech-kinesis-stream

 5. Put a Record (Write to Stream)

# bash

aws kinesis put-record \
  --stream-name twtech-kinesis-stream \
  --partition-key user1 \
  --data "Hello twtech team from CLI!" \
  --cli-binary-format raw-in-base64-out

  • In AWS CLI v2, --data is expected to be base64-encoded by default.
  • Use --cli-binary-format raw-in-base64-out (as above) to pass plain text or raw JSON blobs.

 6. Put Multiple Records

# bash

aws kinesis put-records \
  --stream-name twtech-kinesis-stream \
  --records file://records.json

Example records.json:

# json

[
  {
    "Data": "SGVsbG8gd29ybGQ=",
    "PartitionKey": "twtech-user1"
  },
  {
    "Data": "VGVzdCBtZXNzYWdl",
    "PartitionKey": "twtech-user2"
  }
]
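The Data values above are base64-encoded. A minimal sketch of producing them in the shell:

# bash

# -n avoids encoding a trailing newline.
echo -n "Hello world" | base64      # SGVsbG8gd29ybGQ=
echo -n "Test message" | base64     # VGVzdCBtZXNzYWdl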

 7. Get Records from a Shard

Step 1: Get the Shard Iterator

# bash

aws kinesis get-shard-iterator \
  --stream-name twtech-kinesis-stream \
  --shard-id shardId-000000000000 \
  --shard-iterator-type TRIM_HORIZON

Step 2: Use the iterator to fetch records

# bash

aws kinesis get-records \
  --shard-iterator <iterator-value> \
  --limit 10
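The Data field in each returned record is base64-encoded. A minimal sketch of decoding every payload in one pipeline (jq is preinstalled in CloudShell):

# bash

# Fetch records, extract each Data field, and decode it to plain text.
aws kinesis get-records --shard-iterator <iterator-value> --limit 10 \
  | jq -r '.Records[].Data' \
  | while read -r d; do echo "$d" | base64 --decode; echo; done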

 8. Delete a Stream

# bash

aws kinesis delete-stream \
  --stream-name twtech-kinesis-stream

Use --enforce-consumer-deletion if consumers are attached.

 9. Split or Merge Shards

Split a shard

# bash

aws kinesis split-shard \
  --stream-name twtech-kinesis-stream \
  --shard-to-split shardId-000000000000 \
  --new-starting-hash-key 170141183460469231731687xxxx15884105728

Merge two shards

# bash

aws kinesis merge-shards \
  --stream-name twtech-kinesis-stream \
  --shard-id shardId-000000000000 \
  --adjacent-shard-id shardId-000000000001
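Picking a valid --new-starting-hash-key requires the shard's current hash-key range. A minimal sketch of listing the ranges:

# bash

# Show each shard's ID and the hash-key range it covers.
aws kinesis describe-stream \
  --stream-name twtech-kinesis-stream \
  --query "StreamDescription.Shards[].{Id:ShardId,Range:HashKeyRange}"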

 10. Enable Enhanced Monitoring

# bash

aws kinesis enable-enhanced-monitoring \
  --stream-name twtech-kinesis-stream \
  --shard-level-metrics IncomingBytes OutgoingBytes

Quick Tip: Check Stream Status

# bash

aws kinesis describe-stream \
  --stream-name twtech-kinesis-stream \
  --query "StreamDescription.StreamStatus"

Returns: CREATING, ACTIVE, DELETING, etc.
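Rather than polling the status, the CLI can block until the stream is ready:

# bash

# Returns once the stream reaches ACTIVE (polls describe-stream internally).
aws kinesis wait stream-exists \
  --stream-name twtech-kinesis-stream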

Bash script to automate Kinesis setup and testing

Here is twtech's Bash script that automates the full lifecycle of an Amazon Kinesis Data Stream, including:

  • Stream creation
  • Sending test data
  • Reading from the stream
  • Clean-up

 Bash Script: twtech-kinesis-setup-test.sh

# bash

#!/bin/bash

# CONFIGURATION
STREAM_NAME="twtech-kinesis-stream"
SHARD_COUNT=1
PARTITION_KEY="twtech-Key"
REGION="us-east-1"
TEST_DATA="Hello twtech team from Kinesis Bash Script!"
TMP_ITERATOR_FILE="shard_iterator.txt"
TMP_RECORD_FILE="record_output.json"

# CREATE STREAM
echo "Creating Kinesis stream: $STREAM_NAME"
aws kinesis create-stream \
  --stream-name "$STREAM_NAME" \
  --shard-count "$SHARD_COUNT" \
  --region "$REGION"

echo "Waiting for stream to become ACTIVE..."
aws kinesis wait stream-exists \
  --stream-name "$STREAM_NAME" \
  --region "$REGION"
echo "Stream $STREAM_NAME is ACTIVE."

# PUT A RECORD
echo "Putting record into the stream..."
# --cli-binary-format lets AWS CLI v2 accept plain-text --data.
aws kinesis put-record \
  --stream-name "$STREAM_NAME" \
  --partition-key "$PARTITION_KEY" \
  --data "$TEST_DATA" \
  --cli-binary-format raw-in-base64-out \
  --region "$REGION"

# GET SHARD ID
SHARD_ID=$(aws kinesis describe-stream \
  --stream-name "$STREAM_NAME" \
  --region "$REGION" \
  --query "StreamDescription.Shards[0].ShardId" \
  --output text)
echo "Shard ID: $SHARD_ID"

# GET SHARD ITERATOR
SHARD_ITERATOR=$(aws kinesis get-shard-iterator \
  --stream-name "$STREAM_NAME" \
  --shard-id "$SHARD_ID" \
  --shard-iterator-type TRIM_HORIZON \
  --region "$REGION" \
  --query 'ShardIterator' \
  --output text)
echo "Got shard iterator."

# GET RECORDS
echo "Waiting a few seconds for data to arrive..."
sleep 5
echo "Reading records from the stream..."
aws kinesis get-records \
  --shard-iterator "$SHARD_ITERATOR" \
  --limit 10 \
  --region "$REGION" \
  > "$TMP_RECORD_FILE"
cat "$TMP_RECORD_FILE"

# CLEAN-UP
echo "Cleaning up: deleting stream $STREAM_NAME"
aws kinesis delete-stream \
  --stream-name "$STREAM_NAME" \
  --region "$REGION" \
  --enforce-consumer-deletion
echo "Done."

# CLEAN TEMP FILES
rm -f "$TMP_ITERATOR_FILE" "$TMP_RECORD_FILE"

 How to Run the Script:

  1. Save the script as twtech-kinesis-setup-test.sh
  2. Make it executable (grant execute permission on the file):

# bash

chmod +x twtech-kinesis-setup-test.sh

  3. Run it:

# bash

./twtech-kinesis-setup-test.sh

# or simply

sh twtech-kinesis-setup-test.sh

 Requirements

  • AWS CLI installed and configured (aws configure)
  • IAM credentials with permissions (a sketch of a matching policy follows):
    • kinesis:*
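A minimal sketch of such a policy, broad for a sandbox (scope Action and Resource down in production):

# json

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "kinesis:*",
      "Resource": "*"
    }
  ]
}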

 What This Script Does

  • Creates a stream
  • Sends a sample record
  • Reads back the data
  • Deletes the stream after test
  • Cleans up temp files

Step 1: Verify the version of the AWS CLI in CloudShell:

aws --version

If the AWS CLI version shown is aws-cli/2.27.53 (i.e., version 2),

then, Step 2: twtech sends records to the created stream: twtech-kinesis-data-streams.

Send the twtech-user signup event with the following command (AWS CLI version 2):

aws kinesis put-record --stream-name twtech-kinesis-data-streams --partition-key twtech-user1 --data "twtech-user signup" --cli-binary-format raw-in-base64-out

Send the twtech-user login event with the following command (AWS CLI version 2):

aws kinesis put-record --stream-name twtech-kinesis-data-streams --partition-key twtech-user1 --data "twtech-user login" --cli-binary-format raw-in-base64-out

Send the twtech-user signout event with the following command (AWS CLI version 2):

aws kinesis put-record --stream-name twtech-kinesis-data-streams --partition-key twtech-user1 --data "twtech-user signout" --cli-binary-format raw-in-base64-out
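The three events can also be sent in one short loop; a minimal sketch:

# bash

# Send signup, login, and signout events with the same partition key.
for event in signup login signout; do
  aws kinesis put-record \
    --stream-name twtech-kinesis-data-streams \
    --partition-key twtech-user1 \
    --data "twtech-user $event" \
    --cli-binary-format raw-in-base64-out
done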

How twtech monitors Kinesis data stream metrics from the console (GUI):

NB:

It takes a couple of minutes for the CloudWatch metrics to update.

How twtech consumes resources from its Kinesis data streams:

Describe the stream to know what to consume from it: twtech-kinesis-data-streams

aws kinesis describe-stream --stream-name twtech-kinesis-data-streams

How twtech consumes data from its Kinesis data stream: twtech-kinesis-data-streams

# To get a ShardIterator string

aws kinesis get-shard-iterator --stream-name twtech-kinesis-data-streams --shard-id shardId-000000000000 --shard-iterator-type TRIM_HORIZON

How twtech gets records using the shard iterator:

aws kinesis get-records --shard-iterator <iterator-string generated above>

aws kinesis get-records  --shard-iterator "AAAAAAAAAAHDJ9j/PfGsozwtU6MhxneBZsHt+DwVu8gltmGCWomkDSWdVxT+jKTURYwcEpiMLdP4XXc5t1MxrsTB6DciGvoja6ifanQowuRG37ULlgYILeYODCsJ3sxxxxxtMKxxt8XxFPePb06ANJpIuRbTgetS81YgM0GBgaWlxlwP26hIyqGPcOF81xTvSsT6u+Xvv+Xa4+lWNbHXc3/M1vjtHUCUhKZG4RsLh2MIH4HMHdQVHl7aWrAjHfK25y5TA5ImOzbg="

How twtech reads the encoded data: Google-search for "base64 decode online".

Link: https://www.base64decode.org/

Copy and paste the encoded data to decode it.
