Amazon Kinesis Data Streams - Overview & Hands-On.
Scope:
- Intro
- Key Concepts
- How Kinesis Data Streams Works
- Architecture
- Producer Options
- Consumer Options
- Security Features
- Monitoring & Scaling
- Common Use Cases
- Comparison of Amazon Kinesis Data Streams with Firehose
- Management & Complexity
- Performance & Scaling
- Data Delivery & Destinations
- Pricing
- Recommendation of When to Use Kinesis Data Streams or Firehose
- AWS CLI commands for Amazon Kinesis Data Streams (including operations)
- Sample Bash script that automates the full lifecycle of an Amazon Kinesis Data Stream
- Project: Hands-on
Intro:
- Amazon Kinesis Data Streams (KDS) is a fully managed, real-time data streaming service by AWS.
- Amazon Kinesis Data Streams (KDS) allows twtech to collect, process, and analyze streaming data at scale.
Ideal use cases:
- Log processing,
- Event data collection,
- Real-time analytics,
- Machine learning,
- Anomaly detection.
Key Concepts

| Term | Description |
|------|-------------|
| Stream | A logical grouping of data records. Each stream can have multiple shards. |
| Shard | A unit of throughput in KDS. Each shard supports up to 1 MB/sec write and 2 MB/sec read. |
| Data Record | The basic unit of data in KDS. Contains a sequence number, partition key, and blob of data. |
| Partition Key | A string used to group data within shards. Determines how records are distributed. |
| Sequence Number | A unique identifier for each record within a shard. |
| Retention Period | The time that data is stored in a stream (default: 24 hours; up to 365 days with extended retention). |
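A quick illustration of how a partition key maps to a shard: Kinesis takes the MD5 hash of the key, reads it as a 128-bit integer, and routes the record to the shard whose hash key range contains that value. A minimal sketch (assumes python3 is available, as in CloudShell):
# bash
# Hash the partition key the way Kinesis does (MD5 as a 128-bit integer).
python3 -c 'import hashlib; print(int(hashlib.md5(b"twtech-user1").hexdigest(), 16))'
# Compare the printed value against each shard's HashKeyRange:
aws kinesis describe-stream --stream-name twtech-kinesis-stream \
  --query "StreamDescription.Shards[].HashKeyRange"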
How Kinesis Data Streams Works
- Producers (apps, AWS services, IoT devices, etc.) send data to KDS.
- Data is distributed across shards in the stream using partition keys.
- Consumers (e.g., Lambda, Kinesis Client Library [KCL], twtech-custom-apps) read and process the data.
Producer Options
- AWS SDKs (PutRecord / PutRecords API)
- AWS IoT
- Amazon Kinesis Agent
- Amazon CloudWatch Logs / Firehose / IoT Core
Consumer Options
- Kinesis Data Analytics – SQL for real-time analytics.
- AWS Lambda – Serverless processing (via event source mapping; see the sketch after this list).
- Custom Consumers – Using KCL or AWS SDK.
- Enhanced Fan-Out (EFO) – 2 MB/sec per consumer, with dedicated throughput.
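For the AWS Lambda option, the consumer is attached through an event source mapping. A minimal sketch (the function name twtech-consumer and the account ID in the ARN are hypothetical placeholders):
# bash
# Attach a Lambda function as a consumer of the stream.
aws lambda create-event-source-mapping \
  --function-name twtech-consumer \
  --event-source-arn arn:aws:kinesis:us-east-2:111122223333:stream/twtech-kinesis-stream \
  --starting-position LATEST \
  --batch-size 100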
Security Features
- Encryption at Rest – KMS integration.
- Encryption in Transit – TLS.
- IAM-based Access Control.
- VPC Endpoints – for private connectivity.
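Encryption at rest can be enabled per stream; a minimal sketch using the AWS-managed Kinesis key (a customer-managed KMS key ID also works):
# bash
# Turn on server-side encryption for the stream.
aws kinesis start-stream-encryption \
  --stream-name twtech-kinesis-stream \
  --encryption-type KMS \
  --key-id alias/aws/kinesis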
Monitoring & Scaling
- CloudWatch Metrics: IncomingBytes, GetRecords.IteratorAgeMilliseconds, etc. (see the query sketch below).
- Auto-scaling: with AWS Application Auto Scaling or manual shard management.
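For example, iterator age (how far consumers lag behind the stream tip) can be queried from CloudWatch. A sketch, assuming GNU date as found in CloudShell:
# bash
# Maximum GetRecords.IteratorAgeMilliseconds over the last hour.
aws cloudwatch get-metric-statistics \
  --namespace AWS/Kinesis \
  --metric-name GetRecords.IteratorAgeMilliseconds \
  --dimensions Name=StreamName,Value=twtech-kinesis-stream \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 300 \
  --statistics Maximum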
Common Use Cases
- Real-time application log streaming
- Clickstream analytics (web/mobile apps)
- IoT telemetry data
- Fraud detection
- Live metrics dashboards
twtech-Insights:
Comparison of Amazon Kinesis Data Streams with Kinesis Data Firehose
Purpose & Use Case

| Feature | Kinesis Data Streams (KDS) | Kinesis Data Firehose |
|---------|----------------------------|-----------------------|
| Primary Use | Real-time custom app streaming and processing | Data delivery (ETL, buffering, delivery to S3, Redshift, etc.) |
| Ideal For | Real-time event processing, custom app logic | Simplified data ingestion to storage/analytics services |
| Latency | Milliseconds | Near real-time (usually within 60 seconds) |
Management & Complexity

| Feature | KDS | Firehose |
|---------|-----|----------|
| Fully Managed | Yes | Yes |
| Setup Complexity | Moderate (requires shard planning and consumer management) | Low (easy config, automatic scaling) |
| Custom Processing | Yes (via Lambda, KCL, etc.) | Limited (can transform with Lambda) |
Performance & Scaling

| Feature | KDS | Firehose |
|---------|-----|----------|
| Throughput Scaling | Manual (shards) or auto-scaling | Auto-scaling |
| Max Ingest Rate | ~1 MB/sec/shard | ~5 GB/hour/stream (scales up) |
| Data Retention | 24 hrs to 365 days | 24 hrs max |
Security & Access

| Feature | KDS | Firehose |
|---------|-----|----------|
| Encryption | At rest (KMS), in transit (TLS) | At rest (KMS), in transit (TLS) |
| VPC Support | Yes | Yes |
| IAM Integration | Yes | Yes |
Data Delivery & Destinations

| Feature | KDS | Firehose |
|---------|-----|----------|
| Built-in Destinations | None (use consumers) | S3, Redshift, OpenSearch, Splunk, custom HTTP |
| Transformations | With Lambda (custom) | Inline with Lambda |
Pricing

| Feature | KDS | Firehose |
|---------|-----|----------|
| Cost Model | Pay per shard, PUT payloads, and retrievals | Pay per data ingested and delivered |
| Cheapest Option | For lightweight custom apps | For simple ETL and data lake ingestion |
twtech Recommendation of When to Use What:
- twtech uses KDS if it:
- Needs low-latency, real-time custom processing
- Wants tight control over producers/consumers
- Doesn't need Kafka ecosystem support
- twtech uses Firehose if it:
- Wants to easily ingest data into S3, Redshift, etc.
- Prefers fully managed, zero-maintenance pipelines
- Can tolerate ~1-minute delay and limited transformation
Project: Hands-on
- How twtech creates and uses Amazon Kinesis for its production and consumption.
- Search for the AWS service: Kinesis
- Create a data stream: twtech-kinesis-data-stream
- Application:
- Monitoring Kinesis data streams:
- Under Configuration, the number of shards can be scaled from 1 (default) to double that number, e.g., 2 shards.
The AWS CLI operations below cover:
- Creating,
- Listing,
- Reading,
- Writing,
- Deleting streams.
1. Create a Stream
# bash
aws kinesis create-stream \
--stream-name twtech-kinesis-stream \
--shard-count 1
NB:
- Use --stream-mode-details for on-demand mode (see below).
2. Create an On-Demand Stream
# bash
aws kinesis create-stream \
--stream-name twtech-kinesis-stream \
--stream-mode-details StreamMode=ON_DEMAND
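An existing stream can also be switched between capacity modes later. Note that this call takes the stream ARN rather than the name (the account ID below is a placeholder):
# bash
# Switch an existing stream back to provisioned mode.
aws kinesis update-stream-mode \
  --stream-arn arn:aws:kinesis:us-east-2:111122223333:stream/twtech-kinesis-stream \
  --stream-mode-details StreamMode=PROVISIONED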
3. List Streams
# bash
aws kinesis list-streams
4. Describe a Stream
# bash
aws kinesis describe-stream \
--stream-name twtech-kinesis-stream
5. Put a Record (Write to Stream)
# bash
aws kinesis put-record \
--stream-name twtech-kinesis-stream \
--partition-key twtech-user1 \
--data "Hello from twtech Kinesis Data Stream Team via CLI"
- --data must be base64-safe or plaintext.
- Use --cli-binary-format raw-in-base64-out if using JSON blobs.
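For example, sending a JSON blob with AWS CLI v2, which otherwise expects base64 input for --data:
# bash
# raw-in-base64-out lets the CLI accept plain text and encode it itself.
aws kinesis put-record \
  --stream-name twtech-kinesis-stream \
  --partition-key twtech-user1 \
  --cli-binary-format raw-in-base64-out \
  --data '{"event": "signup", "user": "twtech-user1"}'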
6. Put Multiple Records
# bash
aws kinesis put-records \
--stream-name twtech-kinesis-stream \
--records file://records.json
Sample records.json:
# json
[
{
"Data": "SGVsbG8gd29ybGQ=",
"PartitionKey": "twtech-user1"
},
{
"Data": "VGVzdCBtZXNzYWdl",
"PartitionKey": "twtech-user2"
}
]
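The Data values must be base64-encoded (the two above decode to "Hello world" and "Test message"). A quick way to encode a payload for records.json:
# bash
# Encode a plain-text payload as base64.
printf '%s' 'Hello world' | base64   # -> SGVsbG8gd29ybGQ=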
7. Get Records from a Shard
Step 1: Get the Shard Iterator
# bash
aws kinesis get-shard-iterator \
--stream-name twtech-kinesis-stream \
--shard-id shardId-000000000000 \
--shard-iterator-type TRIM_HORIZON
Step 2: Use the iterator to fetch records
# bash
aws kinesis get-records \
--shard-iterator <iterator-value> \
--limit 10
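The two steps chain naturally in a single shell session; a sketch using the same stream and shard as above:
# bash
# Capture the iterator, then read up to 10 records with it.
ITERATOR=$(aws kinesis get-shard-iterator \
  --stream-name twtech-kinesis-stream \
  --shard-id shardId-000000000000 \
  --shard-iterator-type TRIM_HORIZON \
  --query 'ShardIterator' --output text)
aws kinesis get-records --shard-iterator "$ITERATOR" --limit 10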
8. Delete a Stream
# bash
aws kinesis delete-stream \
--stream-name twtech-kinesis-stream
NB:
Use --enforce-consumer-deletion if consumers are attached.
9. Split or Merge Shards
Split a shard
# bash
aws kinesis split-shard \
--stream-name twtech-kinesis-stream \
--shard-to-split shardId-000000000000 \
--new-starting-hash-key 170141183460469231731687xxxx15884105728
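The --new-starting-hash-key is typically the midpoint of the parent shard's hash key range; a sketch that computes it (assumes python3 for the 128-bit arithmetic):
# bash
# Read the parent shard's hash key range, then print its midpoint.
read START END <<< "$(aws kinesis describe-stream \
  --stream-name twtech-kinesis-stream \
  --query "StreamDescription.Shards[0].HashKeyRange.[StartingHashKey,EndingHashKey]" \
  --output text)"
python3 -c "print(($START + $END) // 2)"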
Merge two shards
# bash
aws kinesis merge-shards \
--stream-name twtech-kds-merge-shards \
--shard-id shardId-000000000000 \
--adjacent-shard-id shardId-000000000001
10. Enable Enhanced Monitoring
# bash
aws kinesis enable-enhanced-monitoring \
--stream-name twtech-kinesis-stream \
--shard-level-metrics IncomingBytes OutgoingBytes
11. Check Stream Status
# bash
aws kinesis describe-stream \
--stream-name twtech-kinesis-stream \
--query "StreamDescription.StreamStatus"
twtech Sample Bash script that automates the full lifecycle of an Amazon Kinesis Data Stream, including:
- Stream creation
- Sending test data
- Reading from the stream
- Clean-up
# Bash Script: twtech-kinesis-setup-test.sh
#!/bin/bash

# CONFIGURATION
STREAM_NAME="twtech-kinesis-stream"
SHARD_COUNT=1
PARTITION_KEY="twtech-Key"
REGION="us-east-2"
TEST_DATA="Hello from twtech Kinesis Data Streams via Bash Script"
TMP_ITERATOR_FILE="shard_iterator.txt"
TMP_RECORD_FILE="record_output.json"

# CREATE STREAM
echo "Creating Kinesis stream: $STREAM_NAME"
aws kinesis create-stream \
  --stream-name "$STREAM_NAME" \
  --shard-count "$SHARD_COUNT" \
  --region "$REGION"

echo "Waiting for twtech kinesis data stream to become ACTIVE..."
aws kinesis wait stream-exists \
  --stream-name "$STREAM_NAME" \
  --region "$REGION"
echo "Stream $STREAM_NAME is now ACTIVE."

# PUT A RECORD
echo "Putting record into the stream..."
# CLI v2 expects base64 input for --data by default; this flag accepts plain text.
aws kinesis put-record \
  --stream-name "$STREAM_NAME" \
  --partition-key "$PARTITION_KEY" \
  --cli-binary-format raw-in-base64-out \
  --data "$TEST_DATA" \
  --region "$REGION"

# GET SHARD ID
SHARD_ID=$(aws kinesis describe-stream \
  --stream-name "$STREAM_NAME" \
  --region "$REGION" \
  --query "StreamDescription.Shards[0].ShardId" \
  --output text)
echo "Shard ID: $SHARD_ID"

# GET SHARD ITERATOR
SHARD_ITERATOR=$(aws kinesis get-shard-iterator \
  --stream-name "$STREAM_NAME" \
  --shard-id "$SHARD_ID" \
  --shard-iterator-type TRIM_HORIZON \
  --region "$REGION" \
  --query 'ShardIterator' \
  --output text)
echo "Got shard iterator."

# GET RECORDS
echo "Waiting a few seconds for data to arrive..."
sleep 5
echo "Reading records from the stream..."
aws kinesis get-records \
  --shard-iterator "$SHARD_ITERATOR" \
  --limit 10 \
  --region "$REGION" \
  > "$TMP_RECORD_FILE"
cat "$TMP_RECORD_FILE"

# CLEAN-UP
echo "Cleaning up: deleting stream $STREAM_NAME"
aws kinesis delete-stream \
  --stream-name "$STREAM_NAME" \
  --region "$REGION" \
  --enforce-consumer-deletion

# CLEAN TEMP FILES
rm -f "$TMP_ITERATOR_FILE" "$TMP_RECORD_FILE"
echo "Done."
How to Run the script:
- Save the script as twtech-kinesis-setup-test.sh
- Make the script executable (grant executable permissions for the file):
# bash
chmod +x twtech-kinesis-setup-test.sh
- Run it:
# bash
./twtech-kinesis-setup-test.sh
# or simply
sh twtech-kinesis-setup-test.sh
Requirements
- AWS CLI installed and configured (aws configure)
- IAM credentials with permissions: kinesis:*
What the above script does
- Creates a stream
- Sends a sample record
- Reads back the data
- Deletes the stream after the test
- Cleans up temp files
Step 1: Verify the version of the AWS CLI in CloudShell:
# bash
aws --version
- If the AWS CLI version present is aws-cli/2.27.53 (i.e., version 2), then:
Step 2: twtech sends records to the created stream: twtech-kinesis-data-stream.
- Send a twtech-user signup record with the following command (AWS CLI version 2):
# bash
aws kinesis put-record --stream-name twtech-kinesis-data-stream --partition-key twtech-user1 --data "twtech-user signup" --cli-binary-format raw-in-base64-out
- Send a twtech-user login record with the following command (AWS CLI version 2):
# bash
aws kinesis put-record --stream-name twtech-kinesis-data-stream --partition-key twtech-user1 --data "twtech-user login" --cli-binary-format raw-in-base64-out
- Send a twtech-user signout record with the following command (AWS CLI version 2):
# bash
aws kinesis put-record --stream-name twtech-kinesis-data-stream --partition-key twtech-user1 --data "twtech-user signout" --cli-binary-format raw-in-base64-out
- How twtech kinesis data stream metrics are monitored from the console (UI):
- It takes a couple of minutes for the CloudWatch records to be updated.
- How twtech consumes resources from its kinesis data streams:
- How twtech describes the stream to know what to consume from it: twtech-kinesis-data-stream
# bash
aws kinesis describe-stream --stream-name twtech-kinesis-data-stream
- How twtech consumes data from its kinesis data streams: twtech-kinesis-data-stream
- How twtech gets a ShardIterator string:
# bash
aws kinesis get-shard-iterator --stream-name twtech-kinesis-data-stream --shard-id shardId-000000000000 --shard-iterator-type TRIM_HORIZON
- How twtech gets records with its shard iterator:
# bash
aws kinesis get-records --shard-iterator <iterator-string>
For example, with the iterator value returned above:
# bash
aws kinesis get-records --shard-iterator "AAAAAAAAAAHDJ9j/PfGsozwtU6MhxneBZsHt+DwVu8gltmGCWomkDSWdVxT+jKTURYwcEpiMLdP4XXc5t1MxrsTB6DciGvoja6ifanQowuRG37ULlgYILeYODCsJ3sxxxxxtMKxxt8XxFPePb06ANJpIuRbTgetS81YgM0GBgaWlxlwP26hIyqGPcOF81xTvSsT6u+Xvv+Xa4+lWNbHXc3/M1vjtHUCUhKZG4RsLh2MIH4HMHdQVHl7aWrAjHfK25y5TA5ImOzbg="
- How twtech reads the Base64-encoded data:
- Google-search for "base64 decode online", then copy and paste the encoded data to decode it.
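Alternatively, the payloads can be decoded directly in CloudShell without an online tool; a minimal sketch:
# bash
# Decode a single Base64 payload locally.
echo "dHd0ZWNoLXVzZXIgc2lnbnVw" | base64 --decode   # prints: twtech-user signup
# Decode every record's Data field from a get-records call.
aws kinesis get-records --shard-iterator "<iterator-string>" \
  --query 'Records[].Data' --output text | tr '\t' '\n' \
  | while read -r blob; do printf '%s' "$blob" | base64 --decode; echo; done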