Here is twtech's deep dive into API Gateway integration with Kinesis Data Streams, Kinesis Data Firehose, and Amazon S3.
Scope:
- Data flow,
- Components,
- Use cases, and
- Best practices.
Breakdown:
- Overview,
- Components & Flow,
- Amazon API Gateway,
- Kinesis Data Streams (Real-Time, Low-Latency Stream Processing),
- Kinesis Data Firehose (Fully Managed Ingestion → Storage/Analytics),
- Amazon S3 (Long-Term Durable Storage),
- End-to-End Architecture,
- API Gateway → Kinesis Integration Details,
- Choosing Between Data Streams vs Firehose,
- Using both Data Streams & Firehose,
- Security Considerations,
- Best Practices.
Overview
- API Gateway integration with Kinesis Data Streams, Kinesis Data Firehose, and Amazon S3 is used when twtech wants to capture high-volume, real-time API data (events, logs, telemetry, metrics, IoT payloads, analytics data) and then store or stream it with low latency, high durability, and real-time processing capabilities.
Components & Flow
1. Amazon API Gateway
API Gateway provides:
- A fully managed public REST/HTTP endpoint
- Authentication (IAM, Cognito, Lambda authorizer, API keys)
- Request validation & throttling
- Payload transformation (VTL/Passthrough)
- Direct service integration with:
- Kinesis Data Streams
- Kinesis Data Firehose
Two integration modes:
- Direct AWS Service Integration (no Lambda)
- Lambda Proxy Integration → then Kinesis
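In the Lambda proxy mode, the handler forwards the API request body into Kinesis itself. A minimal sketch (the stream name `twtech-stream` is an assumption; the actual `boto3` call is shown as a comment so the mapping logic stands alone):

```python
import json


def build_put_record(event, stream_name="twtech-stream"):
    """Map an API Gateway Lambda-proxy event to kinesis:PutRecord parameters.

    In a deployed handler these parameters would be passed to
    boto3.client("kinesis").put_record(**params).
    """
    return {
        "StreamName": stream_name,                # assumed stream name
        "Data": event["body"].encode("utf-8"),    # raw request payload
        # The request ID spreads records across shards, mirroring the
        # $context.requestId partition key used in direct integration.
        "PartitionKey": event["requestContext"]["requestId"],
    }


def handler(event, context):
    params = build_put_record(event)
    # boto3.client("kinesis").put_record(**params)  # actual AWS call
    return {"statusCode": 200, "body": json.dumps({"accepted": True})}
```

The same `Data`/`PartitionKey` shape appears in the direct-integration request template later in this post; the Lambda route simply trades VTL mapping for Python.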
2. Kinesis Data Streams (Real-Time, Low-Latency Stream Processing)
NB:
API Gateway can push requests directly into a Kinesis Data Stream using AWS service integration.
Flow:
- API Gateway receives request
- API Gateway signs the request using SigV4
- Record is inserted into a specific Kinesis Stream shard
- Consumers process records in real time:
- AWS Lambda consumer
- Kinesis Client Library apps (EC2/ECS)
- Kinesis Data Analytics
- Kinesis → Firehose → S3
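For the Lambda consumer in the list above, Kinesis delivers each payload base64-encoded inside the event. A minimal sketch of a consumer handler (assuming the producers send JSON payloads):

```python
import base64
import json


def handler(event, context):
    """AWS Lambda Kinesis consumer: decode and process each record.

    Kinesis puts the base64-encoded payload in record["kinesis"]["data"].
    """
    items = []
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        items.append(json.loads(payload))  # assumes JSON payloads
    return {"batchSize": len(items), "items": items}
```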
Use cases:
- High-volume clickstream ingestion
- IoT telemetry
- Financial tick data
- Live metrics/analytics pipelines
- Machine-learning data ingestion
3. Kinesis Data Firehose (Fully Managed Ingestion → Storage/Analytics)
NB:
Firehose is a simpler option when twtech doesn’t want custom stream processing.
API Gateway → Kinesis Firehose → destinations:
- S3
- Redshift
- OpenSearch
- Data Lake storage
- Custom endpoints (via Lambda transformation)
Flow:
- API Gateway receives an event
- Writes JSON payload into Firehose
- Firehose buffers (size/time-based)
- Delivers compressed/collated files to S3 (or other destination)
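Because Firehose concatenates buffered records into a single S3 object, it helps to make each record self-delimiting. A sketch of building `firehose:PutRecord` parameters with a trailing newline (the delivery-stream name `twtech-firehose` is an assumption):

```python
import json


def build_firehose_record(payload, stream_name="twtech-firehose"):
    """Build firehose:PutRecord parameters for a JSON payload.

    Firehose concatenates records into one S3 object per buffer flush,
    so a trailing newline keeps the delivered file readable as
    newline-delimited JSON. Pass the result to
    boto3.client("firehose").put_record(**params) in a real producer.
    """
    return {
        "DeliveryStreamName": stream_name,  # assumed delivery stream name
        "Record": {"Data": (json.dumps(payload) + "\n").encode("utf-8")},
    }
```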
Optional: Lambda transformations
Firehose can invoke a Lambda function for:
- PII redaction
- JSON → Parquet conversion
- Data normalization
- Data flattening
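A Firehose transformation Lambda receives base64-encoded records and must return each one with a `recordId`, a `result`, and the transformed `data`. A PII-redaction sketch (the `email` field to strip is an assumption for illustration):

```python
import base64
import json


def handler(event, context):
    """Firehose data-transformation Lambda: redact a PII field per record.

    Firehose expects {"recordId", "result", "data"} back for every input
    record; "Ok" tells Firehose to deliver the transformed payload.
    """
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))
        payload.pop("email", None)  # PII redaction (assumed field name)
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(
                (json.dumps(payload) + "\n").encode("utf-8")
            ).decode("utf-8"),
        })
    return {"records": output}
```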
4. Amazon S3 (Long-Term Durable Storage)
Kinesis Firehose typically stores data into S3 in:
- Snappy-compressed Parquet
- GZIP JSON
- ORC
Ideal for:
- Data lakes
- Audit logs
- API analytics
- Cost-efficient retention
End-to-End Architecture
Flow 1: API Gateway → Kinesis Data Streams → Consumers → Optional S3
Flow 2: API Gateway → Kinesis Firehose → S3
API Gateway → Kinesis Integration Details
API Gateway Request Template Sample (PutRecord)

```json
{
  "StreamName": "twtech-stream",
  "Data": "$util.base64Encode($input.body)",
  "PartitionKey": "$context.requestId"
}
```

IAM role:
API Gateway must assume a role with:
- kinesis:PutRecord
- kinesis:PutRecords
- firehose:PutRecord
- firehose:PutRecordBatch

Choosing Between Data Streams vs Firehose
| Feature | Kinesis Data Streams | Kinesis Firehose |
| --- | --- | --- |
| Latency | ~70 ms | 1–5 min buffer |
| Real-time processing | Yes | No |
| Custom compute | Lambda, KCL | Optional Lambda |
| Storage destinations | Flexible | Mostly S3/Redshift/OpenSearch |
| Throughput scaling | Manual shard scaling | Fully managed |
| Ideal for | Real-time apps | Data ingestion and storage |
Using both Data Streams & Firehose
- A common pattern chains the two: API Gateway → Kinesis Data Streams for real-time consumers, with Firehose attached to the same stream to archive records into S3.
Security Considerations
- API Gateway IAM auth + SigV4 request signing
- KMS encryption for Streams & Firehose
- S3 bucket encryption + Block Public Access
- Private API Gateway with VPC endpoint optional
- WAF integration to filter malicious calls
Best Practices
- Use partition keys that evenly distribute data
- Use Firehose for S3-only pipelines (simpler, cheaper)
- Use Streams for real-time consumer apps
- Enable DLQs (dead-letter queues) in Firehose
- For heavy workloads → enable API Gateway throttling
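The first best practice can be made concrete: Kinesis routes a record by taking the MD5 hash of its partition key and mapping it onto the shards' 128-bit hash-key ranges. A sketch of that routing (assuming the default even shard split), showing why a high-cardinality key spreads load while a constant key creates a hot shard:

```python
import hashlib
from collections import Counter


def shard_for_key(partition_key, num_shards):
    """Approximate Kinesis shard routing: MD5 of the partition key,
    mapped onto evenly divided 128-bit hash-key ranges."""
    hash_value = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    range_size = 2 ** 128 // num_shards
    return min(hash_value // range_size, num_shards - 1)


# A high-cardinality key (e.g. a request ID) spreads records across all
# shards; a constant key would send every record to a single hot shard.
counts = Counter(shard_for_key(f"req-{i}", 4) for i in range(10_000))
```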