Tuesday, July 22, 2025

Amazon Kinesis Data Streams | Overview & Hands-On.



Scope:

  • Intro
  • Key Concepts
  • How Kinesis Data Streams Works
  • Architecture
  • Producer Options
  • Consumer Options
  • Security Features
  • Monitoring & Scaling
  • Common Use Cases
  • Comparison of Amazon Kinesis Data Streams with Firehose
  • Management & Complexity
  • Performance & Scaling
  • Data Delivery & Destinations
  • Pricing
  • Recommendation of when to use Kinesis Data Streams or Firehose
  • AWS CLI commands for Amazon Kinesis Data Streams (including operations)
  • Sample Bash script that automates the full lifecycle of an Amazon Kinesis Data Stream
  • Project: Hands-on

Intro:

  • Amazon Kinesis Data Streams (KDS) is a fully managed, real-time data streaming service from AWS.
  • It allows twtech to collect, process, and analyze streaming data at scale.

Ideal use cases:

  • Log processing,
  • Event data collection,
  • Real-time analytics,
  • Machine learning,
  • Anomaly detection.

 Key Concepts

  Term             | Description
  -----------------|------------------------------------------------------------
  Stream           | A logical grouping of data records. Each stream can have multiple shards.
  Shard            | A unit of throughput in KDS. Each shard supports up to 1 MB/sec write and 2 MB/sec read.
  Data Record      | The basic unit of data in KDS. Contains a sequence number, partition key, and a data blob.
  Partition Key    | A string used to group data within shards. Determines how records are distributed.
  Sequence Number  | A unique identifier for each record within a shard.
  Retention Period | The time that data is stored in a stream (default: 24 hours; up to 365 days with extended retention).

 How Kinesis Data Streams Works

  1. Producers (apps, AWS services, IoT devices, etc.) send data to KDS.
  2. Data is distributed across shards in the stream using partition keys.
  3. Consumers (e.g., Lambda, Kinesis Client Library [KCL], twtech-custom-apps) read and process the data.
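How a record lands on a shard (step 2) can be sketched in Python: KDS hashes the partition key with MD5 and routes the record to the shard whose hash-key range contains the result. The even split of the 128-bit hash space below is a simplifying assumption for illustration; real shard ranges come from describe-stream.

```python
import hashlib

def route_to_shard(partition_key: str, num_shards: int) -> int:
    """Simplified KDS-style routing: take MD5(partition_key) as a
    128-bit integer and map it onto evenly sized shard hash-key ranges."""
    hashed = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    range_size = 2 ** 128 // num_shards
    return min(hashed // range_size, num_shards - 1)

# The same partition key always routes to the same shard,
# which is what preserves per-key ordering in KDS.
print(route_to_shard("twtech-user1", 2))
```

Because routing is deterministic per key, all events for one user stay on one shard and are read back in order.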
Architecture

 Producer Options

  • AWS SDKs (PutRecord / PutRecords API)
  • AWS IoT
  • Amazon Kinesis Agent
  • Amazon CloudWatch Logs / Firehose / IoT Core

 Consumer Options

  • Kinesis Data Analytics – SQL for real-time analytics.
  • AWS Lambda – Serverless processing (via event source mapping).
  • Custom Consumers – Using KCL or AWS SDK.
  • Enhanced Fan-Out (EFO) – 2 MB/sec per consumer, with dedicated throughput.

 Security Features

  • Encryption at Rest – KMS integration.
  • TLS in Transit
  • IAM-based Access Control
  • VPC Endpoints for private connectivity.

 Monitoring & Scaling

  • CloudWatch Metrics: IncomingBytes, GetRecords.IteratorAgeMilliseconds, etc.
  • Auto-scaling: With AWS Application Auto Scaling or manual shard management.

 Common Use Cases

  • Real-time application log streaming
  • Clickstream analytics (web/mobile apps)
  • IoT telemetry data
  • Fraud detection
  • Live metrics dashboards

twtech-Insights:

Comparison of Amazon Kinesis Data Streams with Kinesis Data Firehose

 Purpose & Use Case

  Feature     | Kinesis Data Streams (KDS)                    | Kinesis Data Firehose
  ------------|-----------------------------------------------|---------------------------------------------------------------
  Primary Use | Real-time custom app streaming and processing | Data delivery (ETL, buffering, delivery to S3, Redshift, etc.)
  Ideal For   | Real-time event processing, custom app logic  | Simplified data ingestion to storage/analytics services
  Latency     | Milliseconds                                  | Near real-time (usually within 60 seconds)


 Management & Complexity

  Feature           | KDS                                                        | Firehose
  ------------------|------------------------------------------------------------|-------------------------------------
  Fully Managed     | Yes                                                        | Yes
  Setup Complexity  | Moderate (requires shard planning and consumer management) | Low (easy config, automatic scaling)
  Custom Processing | Yes (via Lambda, KCL, etc.)                                | Limited (can transform with Lambda)


 Performance & Scaling

  Feature            | KDS                             | Firehose
  -------------------|---------------------------------|------------------------------
  Throughput Scaling | Manual (shards) or auto-scaling | Auto-scaling
  Max Ingest Rate    | ~1 MB/sec/shard                 | ~5 GB/hour/stream (scales up)
  Data Retention     | 24 hrs to 365 days              | 24 hrs max


 Security & Access

  Feature         | KDS                             | Firehose
  ----------------|---------------------------------|--------------------------------
  Encryption      | At rest (KMS), in transit (TLS) | At rest (KMS), in transit (TLS)
  VPC Support     | Yes                             | Yes
  IAM Integration | Yes                             | Yes


 Data Delivery & Destinations

  Feature               | KDS                  | Firehose
  ----------------------|----------------------|-----------------------------------------------
  Built-in Destinations | None (use consumers) | S3, Redshift, OpenSearch, Splunk, custom HTTP
  Transformations       | With Lambda (custom) | Inline with Lambda

 Pricing

  Feature         | KDS                                         | Firehose
  ----------------|---------------------------------------------|----------------------------------------
  Cost Model      | Pay per shard, PUT payloads, and retrievals | Pay per data ingested and delivered
  Cheapest Option | For lightweight custom apps                 | For simple ETL and data lake ingestion


twtech Recommendation: When to Use What

  • Use KDS if twtech:
    • Needs low-latency, real-time custom processing
    • Wants tight control over producers/consumers
    • Doesn’t need Kafka ecosystem support
  • Use Firehose if twtech:
    • Wants to easily ingest data into S3, Redshift, etc.
    • Prefers fully managed, zero-maintenance pipelines
    • Can tolerate a ~1-minute delay and limited transformations


Project: Hands-on

  • How twtech creates and uses Amazon Kinesis for its production and consumption.
  • Search for the AWS service: Kinesis
  • Create a data stream with the name: twtech-kinesis-data-streams

Configure: Data stream capacity

  • Choose the capacity mode, then select Create data stream.
  • Monitor the Kinesis data stream from its Monitoring tab.
  • Under Configuration, set the number of shards; it can be scaled from 1 (the default) to double that, e.g. 2 shards, then save the changes.

  • How twtech uses the Kinesis data stream for production and consumption:
  • Open CloudShell (a command-line interface in the AWS console) to use the CLI.

AWS CLI commands for Amazon Kinesis Data Streams (including operations):

  • Creating, 
  • Listing, 
  • Reading, 
  • Writing,
  • Deleting streams.

 1. Create a Stream

# bash

aws kinesis create-stream \
  --stream-name twtech-kinesis-stream \
  --shard-count 1

 NB:

  •  Use --stream-mode-details for on-demand mode (see below).

 2. Create an On-Demand Stream

# bash

aws kinesis create-stream \
  --stream-name twtech-kinesis-stream \
  --stream-mode-details StreamMode=ON_DEMAND

 3. List Streams

# bash

aws kinesis list-streams

4. Describe a Stream

# bash

aws kinesis describe-stream \
  --stream-name twtech-kinesis-stream

 5. Put a Record (Write to Stream)

# bash

aws kinesis put-record \
  --stream-name twtech-kinesis-stream \
  --partition-key twtech-user1 \
  --data "Hello from twtech Kinesis Data Stream Team via CLI"


NB:
  • --data must be base64-safe or plaintext.
  • Use --cli-binary-format raw-in-base64-out if using JSON blobs.
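To illustrate the NB above, here is the base64 handling in Python: the AWS CLI v2 base64-encodes the record payload on the wire, and consumers get it back encoded and must decode it. This is an illustrative round trip, not CLI code.

```python
import base64

payload = "Hello from twtech Kinesis Data Stream Team via CLI"

# Encode as the CLI does before sending, then decode as a consumer would.
encoded = base64.b64encode(payload.encode("utf-8")).decode("ascii")
decoded = base64.b64decode(encoded).decode("utf-8")
assert decoded == payload
```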

 6. Put Multiple Records

# bash

aws kinesis put-records \
  --stream-name twtech-kinesis-stream \
  --records file://records.json

Sample records.json:

# json

[
  {
    "Data": "SGVsbG8gd29ybGQ=",
    "PartitionKey": "twtech-user1"
  },
  {
    "Data": "VGVzdCBtZXNzYWdl",
    "PartitionKey": "twtech-user2"
  }
]
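The Data values in records.json are base64 ("SGVsbG8gd29ybGQ=" decodes to "Hello world"). A small Python sketch can generate such a file from plaintext messages; the message/key pairs below mirror the sample above:

```python
import base64
import json

messages = [("Hello world", "twtech-user1"), ("Test message", "twtech-user2")]

# Build the put-records entries: base64-encode each payload.
records = [
    {"Data": base64.b64encode(text.encode("utf-8")).decode("ascii"),
     "PartitionKey": partition_key}
    for text, partition_key in messages
]

with open("records.json", "w") as f:
    json.dump(records, f, indent=2)
```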

 7. Get Records from a Shard

Step 1: Get the Shard Iterator

# bash

aws kinesis get-shard-iterator \
  --stream-name twtech-kinesis-stream \
  --shard-id shardId-000000000000 \
  --shard-iterator-type TRIM_HORIZON

Step 2: Use the iterator to fetch records

# bash

aws kinesis get-records \
  --shard-iterator <iterator-value> \
  --limit 10
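The get-records output is JSON whose Data fields are base64-encoded; a Python sketch of decoding it (the response below is a hypothetical, abbreviated sample, not real output):

```python
import base64
import json

# Hypothetical, abbreviated get-records response.
response_text = """
{
  "Records": [
    {"PartitionKey": "twtech-user1", "Data": "SGVsbG8gd29ybGQ="}
  ],
  "NextShardIterator": "AAAA-truncated"
}
"""

response = json.loads(response_text)
# Decode each record's payload back to text.
decoded = [
    (r["PartitionKey"], base64.b64decode(r["Data"]).decode("utf-8"))
    for r in response["Records"]
]
print(decoded)
```

NextShardIterator is what a real consumer would pass to the next get-records call to keep reading the shard.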

 8. Delete a Stream

# bash

aws kinesis delete-stream \
  --stream-name twtech-kinesis-stream

NB:

Use --enforce-consumer-deletion if consumers are attached.

 9. Split or Merge Shards

Split a shard

# bash

aws kinesis split-shard \
  --stream-name twtech-kinesis-stream \
  --shard-to-split shardId-000000000000 \
  --new-starting-hash-key 170141183460469231731687xxxx15884105728

Merge two shards

# bash

aws kinesis merge-shards \
  --stream-name twtech-kds-merge-shards \
  --shard-id shardId-000000000000 \
  --adjacent-shard-id shardId-000000000001
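The --new-starting-hash-key for an even split is commonly the midpoint of the parent shard's hash-key range (the range itself comes from describe-stream). A small helper, as an illustration:

```python
def split_point(starting_hash_key: int, ending_hash_key: int) -> int:
    """Midpoint of a shard's hash-key range; a common choice for
    --new-starting-hash-key when splitting a hot shard evenly."""
    return starting_hash_key + (ending_hash_key - starting_hash_key) // 2

# A single shard covers the full 128-bit MD5 hash-key space.
midpoint = split_point(0, 2 ** 128 - 1)
print(midpoint)
```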

 10. Enable Enhanced Monitoring

# bash

aws kinesis enable-enhanced-monitoring \
  --stream-name twtech-kinesis-stream \
  --shard-level-metrics IncomingBytes OutgoingBytes

11.  Check Stream Status

# bash

aws kinesis describe-stream \
  --stream-name twtech-kinesis-stream \
  --query "StreamDescription.StreamStatus"


twtech Sample Bash script that automates the full lifecycle of an Amazon Kinesis Data Stream

Including:

  • Stream creation
  • Sending test data
  • Reading from the stream
  • Clean-up

 # Bash Script: kinesis-setup-test.sh

#!/bin/bash

# CONFIGURATION
STREAM_NAME="twtech-kinesis-stream"
SHARD_COUNT=1
PARTITION_KEY="twtech-Key"
REGION="us-east-2"
TEST_DATA="Hello from twtech Kinesis Data Streams via Bash Script"
TMP_ITERATOR_FILE="shard_iterator.txt"
TMP_RECORD_FILE="record_output.json"

# CREATE STREAM
echo "Creating Kinesis stream: $STREAM_NAME"
aws kinesis create-stream \
  --stream-name "$STREAM_NAME" \
  --shard-count "$SHARD_COUNT" \
  --region "$REGION"

echo "Waiting for twtech kinesis data stream to become ACTIVE..."
aws kinesis wait stream-exists \
  --stream-name "$STREAM_NAME" \
  --region "$REGION"
echo "Stream $STREAM_NAME is now ACTIVE."

# PUT A RECORD
echo "Putting record into the stream..."
aws kinesis put-record \
  --stream-name "$STREAM_NAME" \
  --partition-key "$PARTITION_KEY" \
  --data "$TEST_DATA" \
  --region "$REGION"

# GET SHARD ID
SHARD_ID=$(aws kinesis describe-stream \
  --stream-name "$STREAM_NAME" \
  --region "$REGION" \
  --query "StreamDescription.Shards[0].ShardId" \
  --output text)
echo "Shard ID: $SHARD_ID"

# GET SHARD ITERATOR
SHARD_ITERATOR=$(aws kinesis get-shard-iterator \
  --stream-name "$STREAM_NAME" \
  --shard-id "$SHARD_ID" \
  --shard-iterator-type TRIM_HORIZON \
  --region "$REGION" \
  --query 'ShardIterator' \
  --output text)
echo "Got shard iterator."

# GET RECORDS
echo "Waiting a few seconds for data to arrive..."
sleep 5
echo "Reading records from the stream..."
aws kinesis get-records \
  --shard-iterator "$SHARD_ITERATOR" \
  --limit 10 \
  --region "$REGION" \
  > "$TMP_RECORD_FILE"
cat "$TMP_RECORD_FILE"

# CLEAN-UP
echo "Cleaning up: deleting stream $STREAM_NAME"
aws kinesis delete-stream \
  --stream-name "$STREAM_NAME" \
  --region "$REGION" \
  --enforce-consumer-deletion
echo "Done."

# CLEAN TEMP FILES
rm -f "$TMP_ITERATOR_FILE" "$TMP_RECORD_FILE"

 How to Run the script:

  1. Save the script as twtech-kinesis-setup-test.sh
  2. Make the script executable (grant execute permission on the file):

# bash

chmod +x twtech-kinesis-setup-test.sh

  3. Run it:

# bash

./twtech-kinesis-setup-test.sh

# or simply

sh twtech-kinesis-setup-test.sh

 Requirements

  • AWS CLI installed and configured (aws configure)
  • IAM credentials with permissions:
    • kinesis:*

 What the Above Script Does

  • Creates a stream
  • Sends a sample record
  • Reads back the data
  • Deletes the stream after test
  • Cleans up temp files

Step 1: Verify the version of the AWS CLI in CloudShell:

aws --version

  • If the AWS CLI version present is version 2 (e.g., aws-cli/2.27.53):
  • Then, Step 2: twtech sends records to the created stream: twtech-kinesis-data-streams.
  • twtech-user signup command (AWS CLI version 2):

aws kinesis put-record --stream-name twtech-kinesis-data-stream --partition-key twtech-user1 --data "twtech-user signup" --cli-binary-format raw-in-base64-out

  • twtech-user login command (AWS CLI version 2):

aws kinesis put-record --stream-name twtech-kinesis-data-stream --partition-key twtech-user1 --data "twtech-user login" --cli-binary-format raw-in-base64-out

  • twtech-user signout command (AWS CLI version 2):

aws kinesis put-record --stream-name twtech-kinesis-data-stream --partition-key twtech-user1 --data "twtech-user signout" --cli-binary-format raw-in-base64-out

  • How twtech Kinesis data stream metrics are monitored from the console (UI):
NB:
  • It takes a couple of minutes for the CloudWatch metrics to update.

  • How twtech consumes resources from its Kinesis data streams:
  • Describe the stream to know what to consume from it: twtech-kinesis-data-stream

aws kinesis describe-stream --stream-name twtech-kinesis-data-stream

  • How twtech consumes data from its Kinesis data stream:
  • Get a ShardIterator string:

aws kinesis get-shard-iterator --stream-name twtech-kinesis-data-stream --shard-id shardId-000000000000 --shard-iterator-type TRIM_HORIZON

  • Get the records for that shard iterator:

aws kinesis get-records --shard-iterator <iterator-string>

aws kinesis get-records --shard-iterator "AAAAAAAAAAHDJ9j/PfGsozwtU6MhxneBZsHt+DwVu8gltmGCWomkDSWdVxT+jKTURYwcEpiMLdP4XXc5t1MxrsTB6DciGvoja6ifanQowuRG37ULlgYILeYODCsJ3sxxxxxtMKxxt8XxFPePb06ANJpIuRbTgetS81YgM0GBgaWlxlwP26hIyqGPcOF81xTvSsT6u+Xvv+Xa4+lWNbHXc3/M1vjtHUCUhKZG4RsLh2MIH4HMHdQVHl7aWrAjHfK25y5TA5ImOzbg="

  • How twtech reads the Base64-encoded data:
  • Search online for a base64 decode tool

Link:

https://www.base64decode.org/

  • Copy and paste the encoded data to decode it
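Alternatively, the data can be decoded locally with Python instead of an online tool (the encoded string below is the base64 of "twtech-user signup", shown as an example):

```python
import base64

# Example Data field as returned by get-records (base64 of "twtech-user signup").
encoded = "dHd0ZWNoLXVzZXIgc2lnbnVw"
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)
```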




