Tuesday, July 22, 2025

Amazon Kinesis Data Streams | Focused Overview.

 

Here’s twtech focused overview of Amazon Kinesis Data Streams (KDS), to help twtech understand its architecture, use cases, performance characteristics, and how to use it effectively.

 The concept: Kinesis Data Streams

Amazon Kinesis Data Streams (KDS) is a real-time data streaming service that lets you ingest, buffer, and process large volumes of data continuously in real time.

Use it to capture things like:

  • Clickstream data
  • Log files
  • IoT telemetry
  • Social media feeds
  • Application events

 Core Components to keep in mind

| Component | Description |
| --- | --- |
| Stream | The core object in KDS; holds a series of shards. |
| Shard | A unit of capacity: 1 MB/sec write and 2 MB/sec read throughput. |
| Data Record | The individual unit of data (up to 1 MB); includes a partition key, data blob, and sequence number. |
| Partition Key | Determines how records are distributed across shards. |
| Retention Period | Default 24 hours (can go up to 365 days). |

 How Data Flows in KDS

  1. Producers (apps, IoT devices, AWS services) send records to a stream.
  2. KDS stores records in shards in ordered sequences.
  3. Consumers (like Lambda, KCL apps, or custom clients) read and process the data.

 Producer Options

  • AWS SDK (PutRecord / PutRecords)
  • AWS IoT Core
  • Amazon CloudWatch Logs subscriptions
  • AWS Kinesis Agent (for log files)
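For a quick feel for the producer side, here is a minimal sketch using the AWS SDK for Python (boto3); the stream name matches the CLI example later in this post:

# python
import boto3

kinesis = boto3.client("kinesis")

# Records with the same partition key always land on the same shard.
response = kinesis.put_record(
    StreamName="twtech-data-stream",
    Data=b"Hello twtech team from Kinesis!",
    PartitionKey="user1",
)
print("Shard:", response["ShardId"], "Sequence:", response["SequenceNumber"])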

 Consumer Options

| Type | Description |
| --- | --- |
| Kinesis Client Library (KCL) | Java-based library to build robust consumers. Handles load balancing and checkpointing. |
| AWS Lambda | Easiest option. Automatically scales and processes batches from the stream. |
| Enhanced Fan-Out | Each consumer gets a dedicated 2 MB/sec pipe per shard; ideal for low-latency parallel consumers. |
| GetRecords / GetShardIterator | Low-level APIs for custom consumers. |
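As a rough sketch of that low-level route, a custom consumer might poll a single shard like this with boto3 (the stream name is an example; a real consumer would enumerate all shards and checkpoint its position):

# python
import time

import boto3

kinesis = boto3.client("kinesis")
stream = "twtech-kds-stream"  # example stream name

# Start at the oldest available record in the first shard.
shard_id = kinesis.describe_stream(StreamName=stream)["StreamDescription"]["Shards"][0]["ShardId"]
iterator = kinesis.get_shard_iterator(
    StreamName=stream, ShardId=shard_id, ShardIteratorType="TRIM_HORIZON"
)["ShardIterator"]

while iterator:
    result = kinesis.get_records(ShardIterator=iterator, Limit=100)
    for record in result["Records"]:
        print(record["Data"])  # raw bytes; decode as needed
    iterator = result.get("NextShardIterator")
    time.sleep(1)  # GetRecords is limited to 5 calls/sec per shard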

 Configuration & Scaling

  • Manual Scaling: Add/remove shards.
  • Auto Scaling: Enable Kinesis scaling with CloudWatch and Application Auto Scaling.
  • Provisioned Throughput: Based on shard count (1 MB/sec in, 2 MB/sec out per shard).
  • Limits: Each record = max 1 MB; max 1,000 PUT records per second per shard.
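Manual scaling from code is a one-liner; here is a sketch with boto3 that reshards the example stream from one shard to two:

# python
import boto3

kinesis = boto3.client("kinesis")

# UNIFORM_SCALING splits/merges shards evenly to hit the target count.
kinesis.update_shard_count(
    StreamName="twtech-kds-stream",  # example stream name
    TargetShardCount=2,
    ScalingType="UNIFORM_SCALING",
)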

 Security

  • IAM policies for access control
  • Encryption at rest with AWS KMS
  • TLS encryption in transit
  • VPC endpoints for private access

 Monitoring

twtech uses CloudWatch metrics for monitoring and observability. Key metrics to watch:

  • IncomingBytes, IncomingRecords
  • GetRecords.IteratorAgeMilliseconds
  • ReadProvisionedThroughputExceeded
  • WriteProvisionedThroughputExceeded
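For example, here is one way to read back the iterator-age metric with boto3 (the stream name is assumed to be twtech-kds-stream; a steadily growing iterator age means consumers are falling behind):

# python
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")
now = datetime.now(timezone.utc)

# Maximum iterator age over the last hour, in 5-minute buckets.
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/Kinesis",
    MetricName="GetRecords.IteratorAgeMilliseconds",
    Dimensions=[{"Name": "StreamName", "Value": "twtech-kds-stream"}],
    StartTime=now - timedelta(hours=1),
    EndTime=now,
    Period=300,
    Statistics=["Maximum"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Maximum"])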

 Common Use Cases

| Use Case | Description |
| --- | --- |
| Real-time analytics | Collect clickstream or game telemetry for live dashboards. |
| Log processing | Ingest app/server logs to process or archive. |
| IoT data | Collect sensor data for real-time processing. |
| Data lake ingestion | Stream into S3 (via Firehose or custom consumers). |
| Machine learning | Feed real-time features into ML models. |

 Compared to Other Services

| Feature | KDS | Kinesis Firehose | Amazon MSK |
| --- | --- | --- | --- |
| Purpose | Real-time stream processing | Data delivery | Kafka-compatible event streaming |
| Control | High | Minimal | High (Kafka config) |
| Latency | Low (ms) | ~1 min | Low |
| Use Case | Real-time apps | Simple delivery pipelines | Kafka-based systems |

 Getting Started

# bash
# Example: Using AWS CLI to put a record.
# --cli-binary-format raw-in-base64-out lets AWS CLI v2 accept plain-text --data.
aws kinesis put-record \
  --stream-name twtech-data-stream \
  --partition-key user1 \
  --cli-binary-format raw-in-base64-out \
  --data "Hello twtech team from Kinesis!"


Or configure AWS Lambda as a consumer directly from the console and link it to the Kinesis data stream.

Here's twtech step-by-step guide to configure AWS Lambda as a consumer of a Kinesis Data Stream using the AWS Console (no code needed initially).

This is one of the easiest and most powerful ways to process streaming data in real time.

Prerequisites

  • A Kinesis Data Stream (e.g., twtech-kds-stream)
  • A Lambda function (or twtech creates one)
  • Appropriate IAM permissions (Lambda needs to read from Kinesis and write logs to CloudWatch)

 Step-by-Step: Link Lambda to Kinesis from Console

Step 1: Go to Lambda Console

  1. Open the AWS Lambda Console
  2. Click “Create function”
    • Choose Author from scratch
    • Give it a name (e.g., KinesisStreamConsumer)
    • Choose a runtime (e.g., Python 3.x or Node.js)
    • Create a new execution role or use an existing one (see IAM tip below)
    • Click “Create function”

Step 2: Add Kinesis as a Trigger

  1. After the function is created, scroll to the Function overview
  2. Click “+ Add trigger”
  3. In the “Select a trigger” dropdown, choose Kinesis
  4. Configure:
    • Kinesis stream: Choose twtech stream (e.g., twtech-kds-stream)
    • Batch size: 100 (default; smaller if twtech wants lower latency)
    • Starting position:
      • Latest: only new records going forward (recommended for most use cases)
      • Trim Horizon: from the oldest available record
    • Enable trigger: Yes (check the box)
  5. Click “Add”

Step 3: Write the Lambda Handler Code (Example)

Here's twtech sample Python function to decode and log records:

# python
import base64

def lambda_handler(event, context):
    # Kinesis delivers record payloads base64-encoded; decode before use.
    for record in event['Records']:
        payload = base64.b64decode(record['kinesis']['data'])
        print("Decoded record:", payload.decode('utf-8'))
    return {'statusCode': 200}

Click Deploy to save the function.
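Before deploying, the handler can also be sanity-checked locally with a synthetic event shaped like what Lambda passes for a one-record Kinesis batch (the module name lambda_function is an assumption about where the handler is saved):

# python
import base64
from lambda_function import lambda_handler  # hypothetical module name for the handler above

# Minimal synthetic event mimicking a one-record Kinesis batch.
event = {
    "Records": [
        {"kinesis": {"data": base64.b64encode(b"Hello twtech team!").decode("utf-8")}}
    ]
}
print(lambda_handler(event, None))  # expect: {'statusCode': 200}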

Step 4: Test It Out

Send a test record to twtech Kinesis stream:

# bash
aws kinesis put-record \
  --stream-name twtech-kds-stream \
  --partition-key test-key \
  --cli-binary-format raw-in-base64-out \
  --data "Hello twtech team from Lambda consumer!"

Then check CloudWatch Logs:

  • Go to CloudWatch Logs Console
  • Look under /aws/lambda/KinesisStreamConsumer (or your function name)
  • twtech should see the decoded payload
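The same check can be scripted; a small sketch with boto3 (the log group name assumes the function is named KinesisStreamConsumer):

# python
import boto3

logs = boto3.client("logs")

# Pull recent log events containing the handler's "Decoded" print output.
response = logs.filter_log_events(
    logGroupName="/aws/lambda/KinesisStreamConsumer",  # assumes this function name
    filterPattern="Decoded",
)
for event in response["events"]:
    print(event["message"].strip())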

 IAM Role Permissions for Lambda

The Lambda execution role should include a policy like this:

# json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "kinesis:GetRecords",
        "kinesis:GetShardIterator",
        "kinesis:DescribeStream",
        "kinesis:ListStreams",
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    }
  ]
}

 Tips & Gotchas

| ✅ Tip | ⚠️ Gotcha (watch out) |
| --- | --- |
| Use a batch window to delay and group records (up to 5 mins). | Lambda timeout should exceed the time needed to process a full batch, batch window included. |
| Use environment variables for stream names or custom params. | Avoid long processing; Lambda has a 15-min max duration. |
| Use on-failure destinations (DLQs) to catch failed records. | Failed records can't be retried once they age out of the stream's retention period (24 hours by default, up to 365 days). |

Next, walk through a hands-on lab to connect Kinesis Data Streams (KDS) with Kinesis Data Firehose, AWS Lambda, and Amazon S3: a classic real-time streaming pipeline.

Goal

 Set up KDS → Lambda (process data) → Firehose → S3
 twtech will simulate real-time event publishing and storage

 Architecture Overview

# text

[Producer App / CLI]
        │
        ▼
[Kinesis Data Stream]
        │
        ├──── [Lambda Function] (transforms / logs data)
        │
        ▼
[Kinesis Data Firehose]
        │
        ▼
[twtech-S3Bucket]

 Step-by-Step Hands-On Lab

1. Create an S3 Bucket (for Firehose destination)

# bash
aws s3 mb s3://twtech-kds-firehose-bucket

2. Create a Kinesis Data Stream

# bash
aws kinesis create-stream \
  --stream-name twtech-kds-stream \
  --shard-count 1

Wait until the stream becomes ACTIVE
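One way to block until then is the boto3 stream_exists waiter, which polls DescribeStream until the stream reports ACTIVE:

# python
import boto3

kinesis = boto3.client("kinesis")

# Polls DescribeStream until the stream exists and is ACTIVE.
kinesis.get_waiter("stream_exists").wait(StreamName="twtech-kds-stream")
print("Stream is ACTIVE")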

 3. Create a Lambda Function to Consume KDS

Sample Lambda function (Python)

# python
import base64

def lambda_handler(event, context):
    # Each record's payload arrives base64-encoded; decode before use.
    for record in event['Records']:
        payload = base64.b64decode(record["kinesis"]["data"])
        print("Decoded payload:", payload.decode('utf-8'))
    return {"statusCode": 200}

Deploy Lambda:

  1. Go to AWS Console → Lambda → Create Function
  2. Choose Python 3.x runtime
  3. Paste the code above
  4. Save and deploy

Add Kinesis Trigger:

  • Configuration → Triggers → Add trigger → Kinesis → Select twtech-kds-stream
  • Set batch size (e.g., 100) and enable the trigger
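The same trigger can be created programmatically. A sketch with boto3 (the stream ARN below is a placeholder; fetch the real one as in step 4):

# python
import boto3

lambda_client = boto3.client("lambda")

# Wires the stream to the function; Lambda polls the shards and invokes with batches.
lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:kinesis:us-east-1:123456789012:stream/twtech-kds-stream",  # placeholder ARN
    FunctionName="KinesisStreamConsumer",
    BatchSize=100,
    StartingPosition="LATEST",
)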

 4. Create Firehose Delivery Stream (from KDS to S3)

# bash
aws firehose create-delivery-stream \
  --delivery-stream-name twtech-kds-firehose \
  --delivery-stream-type KinesisStreamAsSource \
  --kinesis-stream-source-configuration KinesisStreamARN=<twtech-StreamARN>,RoleARN=<twtech-IAMRoleARN> \
  --extended-s3-destination-configuration file://s3-config.json

twtech would need:

  • StreamARN → get via aws kinesis describe-stream
  • IAMRoleARN → role with permission to access KDS + S3
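For example, the stream ARN can be fetched with boto3:

# python
import boto3

kinesis = boto3.client("kinesis")

# The ARN that goes into KinesisStreamARN above.
arn = kinesis.describe_stream(StreamName="twtech-kds-stream")["StreamDescription"]["StreamARN"]
print(arn)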

s3-config.json example:

# json
{
  "RoleARN": "arn:aws:iam::123456789xxx:role/twtech-firehose-role",
  "BucketARN": "arn:aws:s3:::twtech-kds-firehose-bucket",
  "Prefix": "firehose-data/",
  "BufferingHints": {
    "SizeInMBs": 1,
    "IntervalInSeconds": 60
  },
  "CompressionFormat": "UNCOMPRESSED"
}

 5. Send Test Data to KDS

# bash
aws kinesis put-record \
  --stream-name twtech-kds-stream \
  --partition-key user1 \
  --cli-binary-format raw-in-base64-out \
  --data "Hello twtech team from Kinesis!"

twtech should see:

  • Lambda logs in CloudWatch
  • Processed records land in S3 (after 60s buffer)

 Verify

 Lambda

  • Go to CloudWatch Logs
  • Look for decoded payloads

 S3

  • Go to twtech-kds-firehose-bucket
  • Check under /firehose-data/ for records (txt/json depending on setup)
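Listing the delivered objects can also be scripted; a sketch with boto3 (bucket name and prefix are the lab's examples):

# python
import boto3

s3 = boto3.client("s3")

# List objects Firehose has delivered under the configured prefix.
response = s3.list_objects_v2(Bucket="twtech-kds-firehose-bucket", Prefix="firehose-data/")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])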

 IAM Permissions (Important!)

 Lambda execution role (broad permissions for the lab only; scope these down in production):

# json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["logs:*", "kinesis:*"],
      "Resource": "*"
    }
  ]
}

 Firehose role (likewise, lab-only breadth):

# json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["kinesis:*", "s3:*", "logs:*"],
      "Resource": "*"
    }
  ]
}

 twtech-Bonus Ideas:

  • Use Lambda to transform the payload before sending it to Firehose
  • Add Kinesis Data Analytics for real-time SQL queries
  • Use Enhanced Fan-Out for parallel Lambda consumers (see the sketch below)
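As a sketch of that last idea, an enhanced fan-out consumer is registered against the stream ARN (placeholder below); actually reading via SubscribeToShard needs an HTTP/2-capable client, so this shows only registration:

# python
import boto3

kinesis = boto3.client("kinesis")

# Each registered consumer gets its own dedicated 2 MB/sec/shard read throughput.
consumer = kinesis.register_stream_consumer(
    StreamARN="arn:aws:kinesis:us-east-1:123456789012:stream/twtech-kds-stream",  # placeholder ARN
    ConsumerName="twtech-efo-consumer",  # example name
)
print(consumer["Consumer"]["ConsumerARN"])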
