Amazon Kinesis Data Firehose (now Amazon Data Firehose) - Overview & Hands-On
Scope:
- Intro,
- The concept: Amazon Kinesis Data Firehose,
- Key Features,
- Architecture,
- Typical Data Flow,
- Common Use Cases,
- Sample Use Case,
- Security Posture,
- Project: Hands-on
Intro:
- Amazon Kinesis Data Firehose is now officially known as Amazon Data Firehose.
- Amazon Kinesis Data Firehose is a fully managed service provided by AWS that allows twtech to reliably load streaming data into data lakes, data stores, and analytics services.
The concept: Amazon Kinesis Data Firehose
- Amazon Kinesis Data Firehose is part of the Amazon Kinesis family.
- Amazon Kinesis Data Firehose is specifically designed for real-time data delivery.
- Amazon Kinesis Data Firehose automatically scales to match twtech data throughput and requires no ongoing administration.
Key Features
Feature | Description
Fully Managed | No need to manage infrastructure or write custom data ingestion code.
Real-Time | Ingests streaming data with minimal latency.
Multiple Destinations | Supports Amazon S3, Amazon Redshift, Amazon OpenSearch Service, Splunk, Datadog, and HTTP endpoints.
Data Transformation | Integrates with AWS Lambda to transform data on the fly before delivery.
Data Format Conversion | Can convert JSON to Parquet/ORC and compress data (e.g., GZIP).
Automatic Scaling | Automatically adjusts to match incoming data volume.
Retries and Error Handling | Automatically retries failed data delivery.
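- A minimal AWS CLI sketch of how several of these features come together in one stream definition; the stream name, role ARN, Lambda ARN, and account ID below are placeholders, not values from this project:
# Create a Direct PUT Firehose stream that buffers, compresses (GZIP), transforms via Lambda, and delivers to S3:
aws firehose create-delivery-stream \
  --delivery-stream-name twtech-demo-firehose \
  --delivery-stream-type DirectPut \
  --extended-s3-destination-configuration '{
    "RoleARN": "arn:aws:iam::111122223333:role/firehose-delivery-role",
    "BucketARN": "arn:aws:s3:::twtechs3",
    "BufferingHints": {"IntervalInSeconds": 300, "SizeInMBs": 5},
    "CompressionFormat": "GZIP",
    "ProcessingConfiguration": {
      "Enabled": true,
      "Processors": [{"Type": "Lambda", "Parameters": [
        {"ParameterName": "LambdaArn",
         "ParameterValue": "arn:aws:lambda:us-east-2:111122223333:function:twtech-transform"}]}]
    }
  }'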
Typical Data Flow
- Producers (applications, servers, IoT devices) send records to a Firehose stream, either directly (Direct PUT) or via a Kinesis data stream; Firehose buffers the records, optionally transforms them with Lambda, and delivers them to the configured destination (e.g., Amazon S3).
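- For a Direct PUT source, the ingestion step of this flow is a single CLI call; the stream name below is a placeholder, and this does not apply to streams fed from a Kinesis data stream (as in the project later in this post):
# Send one raw record straight into a Direct PUT Firehose stream (AWS CLI v2):
aws firehose put-record \
  --delivery-stream-name twtech-demo-firehose \
  --record '{"Data":"twtech sample event"}' \
  --cli-binary-format raw-in-base64-out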
Common Use Cases
- Real-time log analytics (web server logs, IoT data, etc.)
- Streaming ETL pipelines
- Security and compliance monitoring
- Application telemetry ingestion
- Machine learning model input pipelines
Sample Use Case
twtech wants to stream web server logs to Amazon S3 for later analysis in Athena:
- Configure a Kinesis Firehose delivery stream.
- Send logs (e.g., from Apache or Nginx) to the Firehose stream.
- Firehose buffers and optionally transforms the data (e.g., adds timestamps, formats JSON).
- It delivers the data to the S3 bucket whenever the buffer interval (seconds) or buffer size (MB) threshold is reached.
- Query the data using Amazon Athena (see the sketch after this list).
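- A sketch of that final query step, assuming a table (here called web_logs, in a database twtech_logs) has already been defined over the S3 prefix, e.g., with a Glue crawler or a CREATE EXTERNAL TABLE statement; both names and the results location are illustrative:
# Run an Athena query against the delivered log data, writing results to S3:
aws athena start-query-execution \
  --query-string "SELECT * FROM web_logs LIMIT 10" \
  --query-execution-context Database=twtech_logs \
  --result-configuration OutputLocation=s3://twtechs3/athena-results/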
Security Posture:
- Supports IAM roles and policies.
- Can use KMS for encryption at rest.
- Supports HTTPS for data in transit.
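- As an illustration of encryption at rest, SSE can be switched on for a Direct PUT stream with one call (the stream name is a placeholder); note that for streams sourced from a Kinesis data stream, like the one in this project, encryption is instead inherited from the source stream:
# Enable server-side encryption with an AWS-owned key (use KeyType=CUSTOMER_MANAGED_CMK plus a KeyARN for a KMS key):
aws firehose start-delivery-stream-encryption \
  --delivery-stream-name twtech-demo-firehose \
  --delivery-stream-encryption-configuration-input KeyType=AWS_OWNED_CMK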
Project: Hands-on
- How twtech deploys and uses Amazon Kinesis Data Firehose streams to ingest streaming data, then reliably load it into data lakes, data stores, and analytics services.
Step-1:
- Search for AWS service: kinesis
Step-2:
Create data stream: Amazon Data Firehose
- Select: Amazon Data Firehose
- Create a delivery stream for: Kinesis Data Firehose Stream
Step-3:
Choose source and destination
- Specify the source and the destination for twtech Firehose stream.
- twtech cannot change the source and destination of its Firehose stream once it has been created.
- Firehose stream name (auto-generated): KDS-S3-zF2LM
Source settings
- Kinesis data stream: twtech-kinesis-data-stream
Destination settings
- Specify the destination settings for twtech Firehose stream (S3 bucket: twtechs3).
Buffer hints, compression, file extension and encryption
- The fields below are pre-populated with the recommended default values for S3.
- Pricing may vary depending on storage and request costs.
Advanced settings
- Server-side encryption: not enabled; error logging: enabled; IAM role: KinesisFirehoseServiceRole-KDS-S3-us-east-2-1753316140xxx; no tags.
- Create Kinesis Firehose stream: KDS-S3-zF2LM
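- Once created, the stream's status can also be confirmed from the CLI before sending any data (this assumes the stream name shown above):
# Poll the stream until its status reads "ACTIVE":
aws firehose describe-delivery-stream \
  --delivery-stream-name KDS-S3-zF2LM \
  --query 'DeliveryStreamDescription.DeliveryStreamStatus'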
Step-4:
- twtech accesses the metrics populated for monitoring and observability.
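- The same delivery metrics can be pulled from CloudWatch with the CLI; the time window below is purely illustrative:
# Sum successful S3 deliveries for the stream over a one-hour window, in 5-minute periods:
aws cloudwatch get-metric-statistics \
  --namespace AWS/Firehose \
  --metric-name DeliveryToS3.Success \
  --dimensions Name=DeliveryStreamName,Value=KDS-S3-zF2LM \
  --start-time 2025-07-24T00:00:00Z \
  --end-time 2025-07-24T01:00:00Z \
  --period 300 \
  --statistics Sum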
Step-5:
- New configuration twtech can make to match its desired use cases: Configuration
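- A sketch of how such a configuration change looks from the CLI; the version ID must match the value returned by describe-delivery-stream, and the buffering values here are just examples:
# Fetch the current version ID, then lower the buffer interval to 60 seconds:
aws firehose describe-delivery-stream \
  --delivery-stream-name KDS-S3-zF2LM \
  --query 'DeliveryStreamDescription.VersionId'
aws firehose update-destination \
  --delivery-stream-name KDS-S3-zF2LM \
  --current-delivery-stream-version-id 1 \
  --destination-id destinationId-000000000001 \
  --extended-s3-destination-update '{"BufferingHints": {"IntervalInSeconds": 60, "SizeInMBs": 5}}'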
Step-6:
- twtech tests Kinesis Data Firehose by putting data.
- Test with demo data (as shown below).
- The simulated data is ingested to test the configuration of twtech Firehose stream. Standard Amazon Data Firehose charges apply.
- For this project, the ingested (streaming) data is delivered to a datastore (S3): twtechs3
- How twtech streams data into Kinesis Data Firehose using CloudShell or any configured terminal: CloudShell
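- Before putting data, the source stream itself can be checked from CloudShell (stream name taken from the Source settings above):
# Confirm the source Kinesis data stream exists and is active:
aws kinesis describe-stream-summary --stream-name twtech-kinesis-data-stream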
Step-7:
- First, twtech verifies the version of the AWS CLI in CloudShell:
aws --version
- If the version shown is 2.x (e.g., aws-cli/2.27.53), proceed to the next step; the --cli-binary-format flag used below requires AWS CLI version 2.
Step-8:
- twtech sends records to the created Kinesis data stream: twtech-kinesis-data-stream.
- twtech-user1 signs up, with the following command for AWS CLI version 2:
aws kinesis put-record --stream-name twtech-kinesis-data-stream --partition-key twtech-user1 --data "twtech-user1 signup" --cli-binary-format raw-in-base64-out
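- On success, put-record returns the shard and sequence number assigned to the record; the values below are illustrative, not captured from this project:
{
    "ShardId": "shardId-000000000000",
    "SequenceNumber": "49500000000000000000000000000000000000000000000000000000"
}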
Step-9:
- twtech-user1 logs in, with the following command for AWS CLI version 2:
aws kinesis put-record --stream-name twtech-kinesis-data-stream --partition-key twtech-user1 --data "twtech-user1 login" --cli-binary-format raw-in-base64-out
Step-10:
- twtech-user1 signs out, with the following command for AWS CLI version 2:
aws kinesis put-record --stream-name twtech-kinesis-data-stream --partition-key twtech-user1 --data "twtech-user1 signout" --cli-binary-format raw-in-base64-out
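- The three calls above can also be sent in one small bash loop, a sketch assuming the same stream, partition key, and AWS CLI v2 in CloudShell:
# Replay the signup/login/signout sequence for twtech-user1:
for event in signup login signout; do
  aws kinesis put-record \
    --stream-name twtech-kinesis-data-stream \
    --partition-key twtech-user1 \
    --data "twtech-user1 ${event}" \
    --cli-binary-format raw-in-base64-out
done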
Step-11:
- twtech verifies that the three simulated records were:
- streamed through Kinesis Data Streams,
- ingested by Kinesis Data Firehose and delivered to the assigned datastore (S3 bucket): twtechs3
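- Delivery can be verified from the CLI as well; objects appear only after the buffer interval elapses, and by default Firehose writes them under YYYY/MM/dd/HH prefixes, which matches the 2025 folder seen in the next step:
# List the delivered objects in the destination bucket:
aws s3 ls s3://twtechs3/ --recursive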
Step-12:
- Clicking on the folder shows the partitions created: 2025
- twtech can click on each record created to open it with a text editor and read the content.
Step-13:
- Go into the Downloads folder to get the data files created: Downloads
Step-14:
- twtech opens the downloaded files with a text editor to read the content: the twtech-user1 login message
and
- the twtech-user1 signout message.
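- The same files can be read without the console; <object-key> below is a placeholder for an actual key from the listing in Step-11:
# Stream a delivered object straight to the terminal instead of downloading it:
aws s3 cp s3://twtechs3/<object-key> -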