Amazon Kinesis Data Firehose is now officially known as Amazon Data Firehose, and it's a fully managed service provided by AWS that allows twtech to reliably load streaming data into data lakes, data stores, and analytics services.
The concept: Amazon Kinesis Data Firehose
It's part of the Amazon Kinesis family and is specifically designed for real-time data delivery. It automatically scales to match your data throughput and requires no ongoing administration.
Key Features
Feature | Description
Fully Managed | No need to manage infrastructure or write custom data ingestion code.
Real-Time | Ingests streaming data with minimal latency.
Multiple Destinations | Supports Amazon S3, Amazon Redshift, Amazon OpenSearch Service, Splunk, Datadog, and HTTP endpoints.
Data Transformation | Integrates with AWS Lambda to transform data on the fly before delivery.
Data Format Conversion | Can convert JSON to Parquet/ORC and compress data (e.g., GZIP).
Automatic Scaling | Automatically adjusts to match incoming data volume.
Retries and Error Handling | Automatically retries failed data delivery.
Typical Data Flow
twtech apps, twtech clients, other data producers
↓
Amazon Kinesis Data Firehose
↓
(Optional: AWS Lambda for transformation)
↓
Destinations: AWS services (e.g., S3, Redshift, OpenSearch) or third-party endpoints (Splunk, Datadog, New Relic, MongoDB)
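For producers that write straight to Firehose (the DirectPut source type) instead of going through a Kinesis data stream, a record can be pushed with a single CLI call. A minimal sketch, assuming a delivery stream named twtech-firehose-stream (a hypothetical name, not the stream created later in this post):

# Send one record directly to a Firehose delivery stream (AWS CLI v2)
aws firehose put-record \
  --delivery-stream-name twtech-firehose-stream \
  --record '{"Data":"twtech sample log line"}' \
  --cli-binary-format raw-in-base64-out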
Common Use Cases
- Real-time log analytics (web server logs, IoT data,
etc.)
- Streaming ETL pipelines
- Security and compliance monitoring
- Application telemetry ingestion
- Machine learning model input pipelines
Example Use Case
twtech wants to stream web server logs
to Amazon S3 for later analysis in Athena:
- Configure a Kinesis Firehose delivery stream.
- Send logs (e.g., from Apache or Nginx) to the Firehose
stream.
- Firehose buffers and optionally transforms data (e.g.,
adds timestamps, formats JSON).
- It delivers the data to an S3 bucket after every N seconds or every N MB of buffered data, whichever comes first.
- Query the data using Amazon Athena.
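Once the data lands in S3, the last step can also be run from the CLI. A minimal sketch, assuming an Athena/Glue database named twtech_logs_db and a table named web_logs have already been defined over the bucket (both names are hypothetical):

# Run an ad-hoc Athena query over the delivered log data
aws athena start-query-execution \
  --query-string "SELECT status, COUNT(*) AS hits FROM web_logs GROUP BY status" \
  --query-execution-context Database=twtech_logs_db \
  --result-configuration OutputLocation=s3://twtechs3/athena-results/

# Fetch the results once the query completes, using the QueryExecutionId returned above
aws athena get-query-results --query-execution-id <query-execution-id>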
Security Posture:
- Supports IAM roles and policies.
- Can use KMS for encryption at rest.
- Supports HTTPS for data in transit.
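For encryption at rest, server-side encryption can also be enabled on an existing stream from the CLI. A minimal sketch, assuming a DirectPut stream (streams sourced from a Kinesis data stream inherit encryption from the source stream) and a customer-managed KMS key; the stream name and key ARN are placeholders:

# Enable server-side encryption with a customer-managed KMS key
aws firehose start-delivery-stream-encryption \
  --delivery-stream-name twtech-firehose-stream \
  --delivery-stream-encryption-configuration-input KeyType=CUSTOMER_MANAGED_CMK,KeyARN=arn:aws:kms:us-east-2:<account-id>:key/<key-id>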
Project: Hands-on
How twtech uses Amazon Kinesis Data Firehose with delivery streams to ingest and process its streaming data.
Search for service: kinesis
Create data stream: Amazon Data Firehose
Select: Amazon Data Firehose
Create a delivery stream: Kinesis Data Firehose Stream
Choose source and destination
Specify the source and the destination for your Firehose stream. You cannot change the source and destination of your Firehose stream once it has been created.
Firehose stream name (auto-generated): KDS-S3-zF2LM
Source settings: twtech-kinesis-data-stream
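The Kinesis data stream selected as the source has to exist before the Firehose stream can point at it. A minimal sketch of creating it from the CLI, assuming a single provisioned shard is enough for this test:

# Create the source Kinesis data stream with one shard
aws kinesis create-stream --stream-name twtech-kinesis-data-stream --shard-count 1

# Confirm the stream is ACTIVE before wiring it into Firehose
aws kinesis describe-stream-summary --stream-name twtech-kinesis-data-stream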
Destination settings
Specify the destination settings for your Firehose stream.
Buffer hints, compression, file extension and encryption
The fields below are pre-populated with the recommended default values for S3. Pricing may vary depending on storage and request costs.
Advanced settings
Server-side encryption not enabled; error logging enabled; IAM role KinesisFirehoseServiceRole-KDS-S3-us-east-2-1753316140xxx; no tags.
Create Kinesis Firehose stream: KDS-S3-zF2LM
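The same stream can also be created without the console. A minimal sketch of the CLI equivalent, assuming the service role above, the twtechs3 bucket, and typical S3 buffering defaults; the account ID is a placeholder:

# Create a Firehose stream that reads from the Kinesis data stream and delivers to S3
aws firehose create-delivery-stream \
  --delivery-stream-name KDS-S3-zF2LM \
  --delivery-stream-type KinesisStreamAsSource \
  --kinesis-stream-source-configuration 'KinesisStreamARN=arn:aws:kinesis:us-east-2:<account-id>:stream/twtech-kinesis-data-stream,RoleARN=arn:aws:iam::<account-id>:role/KinesisFirehoseServiceRole-KDS-S3-us-east-2-1753316140xxx' \
  --extended-s3-destination-configuration 'RoleARN=arn:aws:iam::<account-id>:role/KinesisFirehoseServiceRole-KDS-S3-us-east-2-1753316140xxx,BucketARN=arn:aws:s3:::twtechs3,BufferingHints={SizeInMBs=5,IntervalInSeconds=300},CompressionFormat=UNCOMPRESSED'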
Populated metrics can be monitored under: Monitoring and observability.
The configuration can be edited later to match twtech's desired use cases: Configuration
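For example, the S3 buffering hints can be changed after creation from the CLI. A minimal sketch, assuming the stream is still on version 1 and has a single destination (both IDs come from describe-delivery-stream):

# Look up the current version ID and destination ID of the stream
aws firehose describe-delivery-stream --delivery-stream-name KDS-S3-zF2LM

# Tighten the S3 buffering hints on the existing destination
aws firehose update-destination \
  --delivery-stream-name KDS-S3-zF2LM \
  --current-delivery-stream-version-id 1 \
  --destination-id destinationId-000000000001 \
  --extended-s3-destination-update 'BufferingHints={SizeInMBs=1,IntervalInSeconds=60}'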
How twtech tests Kinesis Data Firehose: by putting data
Test with demo data
Ingest simulated data to test the configuration of your Firehose stream. Standard Amazon Data Firehose charges apply.
This sends test data to S3: twtechs3
twtech may also decide to put data into the Firehose stream using the CloudShell terminal, or any terminal with the AWS CLI configured: CloudShell
Step 1: Verify the version of the AWS CLI in CloudShell:
aws --version
If the AWS CLI version present is aws-cli/2.27.53, then:
Step 2: twtech needs to send records to the created stream: twtech-kinesis-data-stream.
Send the twtech-user signup record with the following command (AWS CLI version 2):
aws kinesis put-record --stream-name twtech-kinesis-data-stream --partition-key twtech-user1 --data "twtech-user signup" --cli-binary-format raw-in-base64-out
Send the twtech-user login record with the following command (AWS CLI version 2):
aws kinesis put-record --stream-name twtech-kinesis-data-stream --partition-key twtech-user1 --data "twtech-user login" --cli-binary-format raw-in-base64-out
Send the twtech-user signout record with the following command (AWS CLI version 2):
aws kinesis put-record --stream-name twtech-kinesis-data-stream --partition-key twtech-user1 --data "twtech-user signout" --cli-binary-format raw-in-base64-out
twtech needs to verify that these three records are delivered to the assigned S3 bucket: twtechs3
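Delivery is not instantaneous; objects appear only after the buffering interval elapses. A quick check from the CLI (Firehose writes year/month/day/hour key prefixes by default):

# List the objects Firehose has delivered so far
aws s3 ls s3://twtechs3/ --recursive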
By clicking on the folder, twtech can see the partitions created: 2025
twtech can click on each record created and open it with a text editor to read the content.
Go into the Downloads folder to get the data files created: Downloads
Open the downloaded files with a text editor to read the content: twtech-user login
And: twtech-user signout
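The same verification can be done without the console. A minimal sketch, assuming one of the object keys listed earlier is pasted in place of the placeholders:

# Download one delivered object and print its contents (the key is a placeholder)
aws s3 cp s3://twtechs3/<year>/<month>/<day>/<hour>/<object-key> ./firehose-record.txt
cat ./firehose-record.txt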