Amazon Kinesis Data Streams (KDS) is a fully
managed, real-time data streaming service by AWS that allows twtech to collect, process, and analyze streaming data at
scale.
It's ideal for use cases like:
- Log processing,
- Event data collection,
- Real-time analytics,
- Machine learning,
- Anomaly detection.
Key Concepts
| Term | Description |
|---|---|
| Stream | A logical grouping of data records. Each stream can have multiple shards. |
| Shard | A unit of throughput in KDS. Each shard supports up to 1 MB/sec write and 2 MB/sec read. |
| Data Record | The basic unit of data in KDS. Contains a sequence number, partition key, and blob of data. |
| Partition Key | A string used to group data within shards. Determines how records are distributed. |
| Sequence Number | A unique identifier for each record within a shard. |
| Retention Period | The time that data is stored in a stream (default: 24 hours; up to 365 days with extended retention). |
How Kinesis Data Streams Works
- Producers (apps, AWS services, IoT devices, etc.)
send data to KDS.
- Data is distributed across shards in the stream using partition keys.
- Consumers (e.g., Lambda, Kinesis Client Library [KCL], custom apps) read and process the data.
Producer Options
- AWS SDKs (PutRecord / PutRecords API)
- AWS IoT
- Amazon Kinesis Agent
- Amazon CloudWatch Logs / Firehose / IoT Core
Consumer Options
- Kinesis Data Analytics – SQL for real-time analytics.
- AWS Lambda – Serverless processing (via event
source mapping).
- Custom Consumers – Using KCL or AWS SDK.
- Enhanced Fan-Out (EFO) – dedicated throughput of 2 MB/sec per shard, per consumer.
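A minimal sketch of registering an Enhanced Fan-Out consumer from the CLI. It assumes the twtech-kinesis-stream used in the hands-on section below, and twtech-efo-consumer is a hypothetical consumer name:
# bash
# Look up the stream ARN (register-stream-consumer requires an ARN, not a name).
STREAM_ARN=$(aws kinesis describe-stream-summary \
--stream-name twtech-kinesis-stream \
--query "StreamDescriptionSummary.StreamARN" \
--output text)
# Register an Enhanced Fan-Out consumer with its own dedicated throughput.
aws kinesis register-stream-consumer \
--stream-arn "$STREAM_ARN" \
--consumer-name twtech-efo-consumer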
Security Features
- Encryption at Rest – KMS integration (see the sketch after this list).
- TLS in Transit
- IAM-based Access Control
- VPC Endpoints for private connectivity.
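As a hedged example of the Encryption at Rest item above, server-side encryption can be switched on for an existing stream. The AWS-managed key alias/aws/kinesis is assumed here; a customer-managed KMS key ARN works as well:
# bash
# Enable server-side encryption (SSE) on the stream using a KMS key.
aws kinesis start-stream-encryption \
--stream-name twtech-kinesis-stream \
--encryption-type KMS \
--key-id alias/aws/kinesis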
Monitoring & Scaling
- CloudWatch Metrics: IncomingBytes,
GetRecords.IteratorAgeMilliseconds, etc.
- Auto-scaling: With AWS Application Auto Scaling or
manual shard management.
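For the manual shard management mentioned above, a minimal sketch (stream name assumed from the hands-on section below) that doubles a provisioned stream from 1 shard to 2:
# bash
# Resplit the stream evenly across the new target shard count.
aws kinesis update-shard-count \
--stream-name twtech-kinesis-stream \
--target-shard-count 2 \
--scaling-type UNIFORM_SCALING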
Common Use Cases
- Real-time application log streaming
- Clickstream analytics (web/mobile apps)
- IoT telemetry data
- Fraud detection
- Live metrics dashboards
twtech-Insights:
Comparison of Amazon Kinesis Data Streams with Firehose and MSK (Kafka)
Here's twtech's clear and concise comparison of Amazon Kinesis Data Streams (KDS), Kinesis Data Firehose, and Amazon MSK (Managed Streaming for Apache Kafka) across several key areas:
Purpose & Use Case
| Feature | Kinesis Data Streams (KDS) | Kinesis Data Firehose | Amazon MSK (Kafka) |
|---|---|---|---|
| Primary Use | Real-time custom app streaming and processing | Data delivery (ETL, buffering, delivery to S3, Redshift, etc.) | Open-source Kafka workloads, real-time pipelines |
| Ideal For | Real-time event processing, custom app logic | Simplified data ingestion to storage/analytics services | Kafka-compatible apps and ecosystems |
| Latency | Milliseconds | Near real-time (usually within 60 seconds) | Milliseconds |
Management & Complexity
| Feature | KDS | Firehose | MSK |
|---|---|---|---|
| Fully Managed | Yes | Yes | Mostly (AWS manages brokers, but you manage clients and topics) |
| Setup Complexity | Moderate (requires shard planning and consumer management) | Low (easy config, automatic scaling) | High (Kafka expertise required) |
| Custom Processing | Yes (via Lambda, KCL, etc.) | Limited (can transform with Lambda) | Yes (Kafka Streams, Connect, etc.) |
Performance & Scaling
| Feature | KDS | Firehose | MSK |
|---|---|---|---|
| Throughput Scaling | Manual (shards) or auto-scaling | Auto-scaling | Partition-based; managed scaling |
| Max Ingest Rate | ~1 MB/sec/shard | ~5 GB/hour/stream (scales up) | High, based on Kafka cluster size |
| Data Retention | 24 hrs to 365 days | 24 hrs max | Default 7 days (configurable) |
Security & Access
| Feature | KDS | Firehose | MSK |
|---|---|---|---|
| Encryption | At rest (KMS), in transit (TLS) | At rest (KMS), in transit (TLS) | At rest (KMS), in transit (TLS) |
| VPC Support | Yes | Yes | Yes |
| IAM Integration | Yes | Yes | Limited (Kafka uses ACLs, IAM for AWS APIs) |
Data Delivery & Destinations
| Feature | KDS | Firehose | MSK |
|---|---|---|---|
| Built-in Destinations | None (use consumers) | S3, Redshift, OpenSearch, Splunk, custom HTTP | None (use Kafka Connect or custom consumers) |
| Transformations | With Lambda (custom) | Inline with Lambda | With Kafka Streams / Connect |
Pricing
| Feature | KDS | Firehose | MSK |
|---|---|---|---|
| Cost Model | Pay per shard, PUT payloads, and retrievals | Pay per data ingested and delivered | Pay per broker instance + storage + data transfer |
| Cheapest Option | For lightweight custom apps | For simple ETL and data lake ingestion | Expensive (enterprise-scale workloads) |
twtech Recommendation of When to Use What:
- twtech uses KDS if it:
  - Needs low-latency, real-time custom processing
  - Wants tight control over producers/consumers
  - Doesn't need Kafka ecosystem support
- twtech uses Firehose if it:
  - Wants to easily ingest data into S3, Redshift, etc.
  - Prefers fully managed, zero-maintenance pipelines
  - Can tolerate ~1-minute delay and limited transformation
- twtech uses MSK if it:
  - Has existing Kafka workloads or Kafka skills
  - Needs advanced streaming patterns or Kafka ecosystem tools (e.g., Kafka Streams, ksqlDB)
  - Requires high throughput and fine-grained control over topics and partitions.
Project: Hands-on
How twtech creates and uses Amazon Kinesis for its production and consumption.
Search for the AWS service: Kinesis
Create the data stream: twtech-kinesis-data-streams
Application: Monitoring Kinesis data streams.
Under Configuration, the number of shards can be scaled from 1 (the default) to double that number, e.g. from 1 shard to 2 shards.
The AWS CLI commands below cover:
- creating,
- listing,
- reading,
- writing,
- deleting streams.
1. Create a Stream
# bash
aws kinesis create-stream \
--stream-name twtech-kinesis-stream \
--shard-count 1
Note: Use --stream-mode-details for on-demand
mode (see below).
2. Create an On-Demand Stream
# bash
aws kinesis create-stream \
--stream-name twtech-kinesis-stream \
--stream-mode-details StreamMode=ON_DEMAND
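An existing provisioned stream can also be switched to on-demand later. A hedged sketch, noting that update-stream-mode takes the stream ARN rather than the name:
# bash
# Resolve the stream ARN, then flip the capacity mode to on-demand.
STREAM_ARN=$(aws kinesis describe-stream-summary \
--stream-name twtech-kinesis-stream \
--query "StreamDescriptionSummary.StreamARN" \
--output text)
aws kinesis update-stream-mode \
--stream-arn "$STREAM_ARN" \
--stream-mode-details StreamMode=ON_DEMAND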
3. List Streams
# bash
aws kinesis list-streams
4. Describe a Stream
# bash
aws kinesis describe-stream \
--stream-name twtech-kinesis-stream
5. Put a Record (Write to Stream)
# bash
aws kinesis put-record \
--stream-name twtech-kinesis-stream \
--partition-key user1 \
--data "Hello twtech team from CLI!"
- --data must be base64-safe or plaintext.
- With AWS CLI v2, add --cli-binary-format raw-in-base64-out when --data is plaintext or a JSON blob rather than base64.
6. Put Multiple Records
# bash
aws kinesis put-records \
--stream-name twtech-kinesis-stream \
--records file://records.json
Example records.json:
# json
[
{
"Data": "SGVsbG8gd29ybGQ=",
"PartitionKey": "twtech-user1"
},
{
"Data": "VGVzdCBtZXNzYWdl",
"PartitionKey": "twtech-user2"
}
]
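The Data values in records.json are base64-encoded; a quick sketch of producing them from the shell (the two values above decode to "Hello world" and "Test message"):
# bash
# Base64-encode a payload for use in records.json (-n avoids a trailing newline).
echo -n "Hello world" | base64      # SGVsbG8gd29ybGQ=
echo -n "Test message" | base64     # VGVzdCBtZXNzYWdl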
7. Get Records from a Shard
Step 1: Get the Shard Iterator
# bash
aws kinesis get-shard-iterator \
--stream-name twtech-kinesis-stream \
--shard-id shardId-000000000000 \
--shard-iterator-type TRIM_HORIZON
Step 2: Use the iterator to fetch records
# bash
aws kinesis get-records \
--shard-iterator <iterator-value> \
--limit 10
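The Data field in each returned record is base64-encoded. A hedged sketch of decoding all payloads in one pipeline (assumes GNU base64, as found in CloudShell):
# bash
# Extract every Data blob, one per line, and decode each back to plaintext.
aws kinesis get-records \
--shard-iterator <iterator-value> \
--limit 10 \
--query "Records[].Data" \
--output text \
| tr '\t' '\n' \
| while read -r blob; do echo "$blob" | base64 --decode; echo; done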
8. Delete a Stream
# bash
aws kinesis delete-stream \
--stream-name twtech-kinesis-stream
Use --enforce-consumer-deletion if consumers are attached.
9. Split or Merge Shards
Split a shard
# bash
aws kinesis split-shard \
--stream-name twtech-kinesis-stream \
--shard-to-split shardId-000000000000 \
--new-starting-hash-key 170141183460469231731687xxxx15884105728
Merge two shards
# bash
aws kinesis merge-shards \
--stream-name twtech-kinesis-stream \
--shard-id shardId-000000000000 \
--adjacent-shard-id shardId-000000000001
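Both operations depend on the current shard layout: split-shard needs a new-starting-hash-key inside the target shard's hash range (often the midpoint), and merge-shards needs two adjacent shards. A minimal sketch to inspect that layout:
# bash
# List each shard's ID and hash key range to pick split points or adjacent pairs.
aws kinesis describe-stream \
--stream-name twtech-kinesis-stream \
--query "StreamDescription.Shards[].[ShardId,HashKeyRange.StartingHashKey,HashKeyRange.EndingHashKey]" \
--output table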
10. Enable Enhanced Monitoring
# bash
aws kinesis enable-enhanced-monitoring \
--stream-name twtech-kinesis-stream \
--shard-level-metrics IncomingBytes OutgoingBytes
Quick Tip: Check Stream Status
# bash
aws kinesis describe-stream \
--stream-name twtech-kinesis-stream \
--query "StreamDescription.StreamStatus"
Returns: CREATING, ACTIVE, DELETING,
etc.
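Instead of polling describe-stream until the status reads ACTIVE, the CLI can block on a waiter; the setup script below relies on the same command:
# bash
# Poll until the stream is available before producing to it.
aws kinesis wait stream-exists \
--stream-name twtech-kinesis-stream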
Bash script to automate Kinesis setup and testing
Here is a twtech Bash script that automates the full lifecycle of an Amazon Kinesis Data Stream, including:
- Stream creation
- Sending test data
- Reading from the stream
- Clean-up
Bash Script: twtech-kinesis-setup-test.sh
# bash
#!/bin/bash
# CONFIGURATION
STREAM_NAME="twtech-kinesis-stream"
SHARD_COUNT=1
PARTITION_KEY="twtech-Key"
REGION="us-east-1"
TEST_DATA="Hello twtech team from Kinesis Bash Script!"
TMP_ITERATOR_FILE="shard_iterator.txt"
TMP_RECORD_FILE="record_output.json"
# CREATE STREAM
echo "Creating Kinesis stream: $STREAM_NAME"
aws kinesis create-stream \
--stream-name "$STREAM_NAME" \
--shard-count "$SHARD_COUNT" \
--region "$REGION"
echo "Waiting for stream to become ACTIVE..."
aws kinesis wait stream-exists \
--stream-name "$STREAM_NAME" \
--region "$REGION"
echo "Stream $STREAM_NAME is ACTIVE."
# PUT A RECORD
# Note: --cli-binary-format raw-in-base64-out is required for plaintext data with AWS CLI v2.
echo "Putting record into the stream..."
aws kinesis put-record \
--stream-name "$STREAM_NAME" \
--partition-key "$PARTITION_KEY" \
--data "$TEST_DATA" \
--cli-binary-format raw-in-base64-out \
--region "$REGION"
# GET SHARD ID
SHARD_ID=$(aws kinesis describe-stream \
--stream-name "$STREAM_NAME" \
--region "$REGION" \
--query "StreamDescription.Shards[0].ShardId" \
--output text)
echo "Shard ID: $SHARD_ID"
# GET SHARD ITERATOR
SHARD_ITERATOR=$(aws kinesis get-shard-iterator \
--stream-name "$STREAM_NAME" \
--shard-id "$SHARD_ID" \
--shard-iterator-type TRIM_HORIZON \
--region "$REGION" \
--query 'ShardIterator' \
--output text)
echo "Got shard iterator."
# GET RECORDS
echo "Waiting a few seconds for data to arrive..."
sleep 5
echo "Reading records from the stream..."
aws kinesis get-records \
--shard-iterator "$SHARD_ITERATOR" \
--limit 10 \
--region "$REGION" \
> "$TMP_RECORD_FILE"
cat "$TMP_RECORD_FILE"
# CLEAN-UP
echo "Cleaning up: deleting stream $STREAM_NAME"
aws kinesis delete-stream \
--stream-name "$STREAM_NAME" \
--region "$REGION" \
--enforce-consumer-deletion
echo "Done."
# CLEAN TEMP FILES
rm -f "$TMP_ITERATOR_FILE" "$TMP_RECORD_FILE"
How to Run the script:
- Save the script as twtech-kinesis-setup-test.sh
- Make it executable (grant execute permission on the file):
# bash
chmod +x twtech-kinesis-setup-test.sh
- Run it:
# bash
./twtech-kinesis-setup-test.sh
# or simply
sh twtech-kinesis-setup-test.sh
Requirements
- AWS CLI installed and configured (aws configure)
- IAM credentials with permissions:
- kinesis:*
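kinesis:* is fine for a lab, but a narrower policy covering only the calls the script makes is usually preferable. A hedged sketch; the policy name twtech-kinesis-minimal is an assumption:
# bash
# Create a customer-managed policy scoped to the Kinesis actions the script uses.
aws iam create-policy \
--policy-name twtech-kinesis-minimal \
--policy-document '{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "kinesis:CreateStream",
      "kinesis:DescribeStream",
      "kinesis:PutRecord",
      "kinesis:GetShardIterator",
      "kinesis:GetRecords",
      "kinesis:DeleteStream"
    ],
    "Resource": "*"
  }]
}'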
What This Script Does
- Creates a stream
- Sends a sample record
- Reads back the data
- Deletes the stream after test
- Cleans up temp files
Step 1: Verify the version of the AWS CLI in CloudShell:
# bash
aws --version
If the AWS CLI present is version 2 (for example aws-cli/2.27.53), then:
Step 2: twtech sends records to the stream created above.
twtech-user signup command (AWS CLI version 2):
# bash
aws kinesis put-record \
--stream-name twtech-kinesis-data-stream \
--partition-key twtech-user1 \
--data "twtech-user signup" \
--cli-binary-format raw-in-base64-out
twtech-user login command (AWS CLI version 2):
# bash
aws kinesis put-record \
--stream-name twtech-kinesis-data-stream \
--partition-key twtech-user1 \
--data "twtech-user login" \
--cli-binary-format raw-in-base64-out
twtech-user signout command (AWS CLI version 2):
# bash
aws kinesis put-record \
--stream-name twtech-kinesis-data-stream \
--partition-key twtech-user1 \
--data "twtech-user signout" \
--cli-binary-format raw-in-base64-out
How twtech monitors its Kinesis data stream metrics from the console (GUI):
NB: It takes a couple of minutes for the CloudWatch metrics to update.
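While the console graphs catch up, the same stream metrics can be pulled from the CLI. A hedged sketch for the last 30 minutes of IncomingRecords, assuming GNU date (as in CloudShell) and the stream name used in the put-record commands above:
# bash
# Sum of records ingested per minute over the last 30 minutes.
aws cloudwatch get-metric-statistics \
--namespace AWS/Kinesis \
--metric-name IncomingRecords \
--dimensions Name=StreamName,Value=twtech-kinesis-data-stream \
--statistics Sum \
--period 60 \
--start-time "$(date -u -d '30 minutes ago' +%Y-%m-%dT%H:%M:%SZ)" \
--end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)"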
How twtech consumes resources from its Kinesis data streams:
Describe the stream to know what to consume from it: twtech-kinesis-data-stream
# bash
aws kinesis describe-stream --stream-name twtech-kinesis-data-stream
How twtech consumes data from its Kinesis data stream: twtech-kinesis-data-stream
# bash
# To get a ShardIterator string
aws kinesis get-shard-iterator \
--stream-name twtech-kinesis-data-stream \
--shard-id shardId-000000000000 \
--shard-iterator-type TRIM_HORIZON
How twtech gets records using its shard iterator:
# bash
aws kinesis get-records --shard-iterator <iterator-string>
aws kinesis get-records --shard-iterator \
"AAAAAAAAAAHDJ9j/PfGsozwtU6MhxneBZsHt+DwVu8gltmGCWomkDSWdVxT+jKTURYwcEpiMLdP4XXc5t1MxrsTB6DciGvoja6ifanQowuRG37ULlgYILeYODCsJ3sxxxxxtMKxxt8XxFPePb06ANJpIuRbTgetS81YgM0GBgaWlxlwP26hIyqGPcOF81xTvSsT6u+Xvv+Xa4+lWNbHXc3/M1vjtHUCUhKZG4RsLh2MIH4HMHdQVHl7aWrAjHfK25y5TA5ImOzbg="
How twtech reads the encoded (base64) data: search for a base64 decode tool online.
Link: https://www.base64decode.org/
Copy and paste the encoded data into the tool to decode it.
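The same decoding can also be done locally without the website; a quick sketch using the sample value from records.json:
# bash
# Decode a base64 payload straight from the shell.
echo "SGVsbG8gd29ybGQ=" | base64 --decode; echo
# Prints: Hello world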