Tuesday, July 22, 2025

Amazon Kinesis Data Firehose (also known as Amazon Data Firehose)

 

Amazon Kinesis Data Firehose is now officially known as Amazon Data Firehose. It is a fully managed AWS service that allows twtech to reliably load streaming data into data lakes, data stores, and analytics services.

 The concept: Amazon Kinesis Data Firehose

It's part of the Amazon Kinesis family and is specifically designed for real-time data delivery. It automatically scales to match your data throughput and requires no ongoing administration.

 Key Features

  • Fully Managed: No need to manage infrastructure or write custom data ingestion code.
  • Real-Time: Ingests streaming data with minimal latency.
  • Multiple Destinations: Supports Amazon S3, Amazon Redshift, Amazon OpenSearch Service, Splunk, Datadog, and HTTP endpoints.
  • Data Transformation: Integrates with AWS Lambda to transform data on the fly before delivery.
  • Data Format Conversion: Can convert JSON to Parquet/ORC and compress data (e.g., GZIP).
  • Automatic Scaling: Automatically adjusts to match incoming data volume.
  • Retries and Error Handling: Automatically retries failed data delivery.


Typical Data Flow

twtech apps, twtech clients, and other data producers

                     ↓

Amazon Kinesis Data Firehose

                     ↓

(Optional: AWS Lambda for transformation)

                     ↓

Destinations: AWS services (e.g., S3, Redshift, OpenSearch, and many more)
              or third-party endpoints (Splunk, Datadog, New Relic, MongoDB)

 Common Use Cases

  • Real-time log analytics (web server logs, IoT data, etc.)
  • Streaming ETL pipelines
  • Security and compliance monitoring
  • Application telemetry ingestion
  • Machine learning model input pipelines

 Example Use Case

twtech wants to stream web server logs to Amazon S3 for later analysis in Athena:

  1. Configure a Kinesis Firehose delivery stream.
  2. Send logs (e.g., from Apache or Nginx) to the Firehose stream.
  3. Firehose buffers and optionally transforms data (e.g., adds timestamps, formats JSON).
  4. It delivers the data to an S3 bucket whenever the buffer interval (in seconds) or buffer size (in MB) threshold is reached.
  5. Query the data using Amazon Athena.
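
A minimal CLI sketch of step 1, assuming a hypothetical stream name, account ID, and IAM role (the bucket is the one used later in this post; the buffering and compression values are illustrative defaults):

aws firehose create-delivery-stream \
  --delivery-stream-name twtech-web-logs \
  --delivery-stream-type DirectPut \
  --extended-s3-destination-configuration \
    'RoleARN=arn:aws:iam::111122223333:role/twtech-firehose-role,BucketARN=arn:aws:s3:::twtechs3,BufferingHints={SizeInMBs=5,IntervalInSeconds=300},CompressionFormat=GZIP'

With these hints, Firehose flushes to S3 as soon as either threshold (5 MB or 300 seconds here) is reached first.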

 Security Posture:

  • Supports IAM roles and policies.
  • Can use KMS for encryption at rest.
  • Supports HTTPS for data in transit.
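
As a hedged sketch, server-side encryption can be enabled on a Direct PUT stream from the CLI (the key ARN and account ID below are placeholders; a stream whose source is a Kinesis data stream instead inherits encryption from that source stream):

aws firehose start-delivery-stream-encryption \
  --delivery-stream-name twtech-web-logs \
  --delivery-stream-encryption-configuration-input \
    KeyType=CUSTOMER_MANAGED_CMK,KeyARN=arn:aws:kms:us-east-2:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab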

Project: Hands-on

How twtech uses Amazon Kinesis Data Firehose with delivery streams to ingest and process its streaming data.

Search for the service: Kinesis

Create a stream with: Amazon Data Firehose

Select: Amazon Data Firehose

Create a delivery stream: Kinesis Data Firehose stream

Amazon Kinesis Data Firehose: how it works

Choose source and destination

Specify the source and the destination for your Firehose stream. You cannot change the source and destination of your Firehose stream once it has been created.
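
For reference, the same source-and-destination choice made in the console could be expressed with the CLI roughly as follows (the account ID is a placeholder; the role name is the one generated later in this walkthrough):

aws firehose create-delivery-stream \
  --delivery-stream-name KDS-S3-zF2LM \
  --delivery-stream-type KinesisStreamAsSource \
  --kinesis-stream-source-configuration \
    'KinesisStreamARN=arn:aws:kinesis:us-east-2:111122223333:stream/twtech-kinesis-data-stream,RoleARN=arn:aws:iam::111122223333:role/KinesisFirehoseServiceRole-KDS-S3-us-east-2-1753316140xxx' \
  --extended-s3-destination-configuration \
    'RoleARN=arn:aws:iam::111122223333:role/KinesisFirehoseServiceRole-KDS-S3-us-east-2-1753316140xxx,BucketARN=arn:aws:s3:::twtechs3'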


Firehose stream name (auto-generated): KDS-S3-zF2LM

Source settings: twtech-kinesis-data-stream



Destination settings

Specify the destination settings for your Firehose stream


Buffer hints, compression, file extension and encryption

The fields below are pre-populated with the recommended default values for S3. Pricing may vary depending on storage and request costs.


Advanced settings

Server-side encryption not enabled; error logging enabled; IAM role KinesisFirehoseServiceRole-KDS-S3-us-east-2-1753316140xxx; no tags.


Create the Kinesis Firehose stream: KDS-S3-zF2LM

Populated metrics can be monitored under: Monitoring and observability.
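
For example, delivery success can also be pulled from CloudWatch with the CLI (the time window below is illustrative; DeliveryToS3.Success lives in the AWS/Firehose namespace):

aws cloudwatch get-metric-statistics \
  --namespace AWS/Firehose \
  --metric-name DeliveryToS3.Success \
  --dimensions Name=DeliveryStreamName,Value=KDS-S3-zF2LM \
  --start-time 2025-07-22T00:00:00Z \
  --end-time 2025-07-23T00:00:00Z \
  --period 300 \
  --statistics Average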


The configuration can be edited to match twtech's desired use cases: Configuration
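
One hedged sketch of such an edit, lowering the S3 buffering hints from the CLI: update-destination needs the stream's current version ID and destination ID, which describe-delivery-stream returns.

VERSION_ID=$(aws firehose describe-delivery-stream \
  --delivery-stream-name KDS-S3-zF2LM \
  --query 'DeliveryStreamDescription.VersionId' --output text)

DEST_ID=$(aws firehose describe-delivery-stream \
  --delivery-stream-name KDS-S3-zF2LM \
  --query 'DeliveryStreamDescription.Destinations[0].DestinationId' --output text)

aws firehose update-destination \
  --delivery-stream-name KDS-S3-zF2LM \
  --current-delivery-stream-version-id "$VERSION_ID" \
  --destination-id "$DEST_ID" \
  --extended-s3-destination-update 'BufferingHints={SizeInMBs=1,IntervalInSeconds=60}'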





How twtech tests Kinesis Data Firehose: by putting data

Test with demo data

Ingest simulated data to test the configuration of your Firehose stream. Standard Amazon Data Firehose charges apply.

This sends test data to S3: twtechs3


twtech may also decide to put data to the Firehose stream using the CloudShell terminal or any configured terminal: CloudShell

Step 1: Verify the version of the AWS CLI in CloudShell:

aws --version 


If the AWS CLI version present is: aws-cli/2.27.53

Then, Step 2: twtech needs to send records to the created stream: twtech-kinesis-data-stream.

Send the twtech-user signup record with the following command (AWS CLI version 2):

aws kinesis put-record --stream-name twtech-kinesis-data-stream --partition-key twtech-user1 --data "twtech-user signup" --cli-binary-format raw-in-base64-out

Send the twtech-user login record with the following command (AWS CLI version 2):

aws kinesis put-record --stream-name twtech-kinesis-data-stream --partition-key twtech-user1 --data "twtech-user login" --cli-binary-format raw-in-base64-out

Send the twtech-user signout record with the following command (AWS CLI version 2):

aws kinesis put-record --stream-name twtech-kinesis-data-stream --partition-key twtech-user1 --data "twtech-user signout" --cli-binary-format raw-in-base64-out
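
Note that these commands write to the source Kinesis data stream. If the Firehose stream had instead been created with Direct PUT as its source, the equivalent record could be sent straight to Firehose, sketched as:

aws firehose put-record \
  --delivery-stream-name KDS-S3-zF2LM \
  --record '{"Data":"twtech-user signup"}' \
  --cli-binary-format raw-in-base64-out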

twtech needs to verify that these three generated records are found in the assigned S3 bucket: twtechs3


Clicking on the folder shows the partitions created: 2025


twtech can click on each record created and open it with a text editor to read the content.

Go into the Downloads folder to get the downloaded data files: Downloads

Open the downloaded files with a text editor to read their content: twtech-user login

And: twtech-user signout
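
The same verification can be done from the CLI; Firehose writes objects under a YYYY/MM/DD/HH prefix by default (the object key in the second command is illustrative):

# list the objects Firehose delivered
aws s3 ls s3://twtechs3/ --recursive

# print one delivered object to the terminal
aws s3 cp s3://twtechs3/2025/07/22/18/KDS-S3-zF2LM-1-example-object -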

