Scope:
- Intro,
- Prerequisites,
- Architecture,
- Method 1: One-time Export Task (Console),
- Method 2: Continuous Export (Kinesis Firehose),
- Considerations,
- Overview,
- Way to Export Logs to S3 (Manual Export Task),
- Way to Export Logs to S3 (Subscription Filters - Real-Time Alternative),
- IAM Permissions (policy) Required On the CloudWatch Logs Side,
- Sample bucket policy to allow CloudWatch Logs write On the S3 Bucket Side,
- Best Practices,
- Common Pitfalls,
- Architecture Comparison.
Intro:
- twtech can export CloudWatch Logs to an Amazon S3 bucket using a one-time export task via the console or CLI, or continuously using a subscription filter with Kinesis Firehose.
- The destination S3 bucket must be in the same AWS Region as the CloudWatch log data.
- The IAM user or role performing the export must have sufficient permissions for both CloudWatch Logs and S3.
- The S3 bucket policy must grant CloudWatch Logs permission to write to the bucket.
Method 1: One-time Export Task (Console)
- This method is suitable for historical data and can be easily initiated through the AWS Management Console.
- Sign in to the AWS Management Console and open the CloudWatch console.
- In the navigation pane, choose Log groups.
- Select the name of the log group twtech wants to export.
- Choose Actions, and then select Export data to Amazon S3.
- On the Export data to Amazon S3 screen, define the From and To time range for the data.
- Under Choose S3 bucket, select the destination S3 bucket name and optionally specify a bucket prefix.
- Choose Export to start the task.
- To view the task's status, choose Actions and then View all exports to Amazon S3.
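The same one-time export can also be started programmatically. Below is a minimal boto3 sketch of the equivalent CreateExportTask call; the log group, bucket, prefix, and time range are placeholder values.

# Minimal sketch: start a one-time export task with boto3.
# Log group, bucket, prefix, and time range below are placeholder values.
import boto3
from datetime import datetime, timezone

logs = boto3.client("logs", region_name="us-east-2")

# "from" and "to" are epoch timestamps in milliseconds
start_ms = int(datetime(2025, 1, 1, tzinfo=timezone.utc).timestamp() * 1000)
end_ms = int(datetime(2025, 1, 2, tzinfo=timezone.utc).timestamp() * 1000)

response = logs.create_export_task(
    taskName="app1-export-2025-01-01",
    logGroupName="/aws/lambda/app1",            # log group to export (placeholder)
    fromTime=start_ms,                          # the API's "from"; boto3 names it fromTime
    to=end_ms,
    destination="twtech-log-archive-bucket",    # S3 bucket in the same Region
    destinationPrefix="exports/app1",           # optional S3 key prefix
)
print("Export task ID:", response["taskId"])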
Method 2: Continuous Export (Kinesis Firehose)
- For an automated, ongoing solution that exports logs as they arrive, use a subscription filter with Kinesis Firehose. This method involves the following steps:
- Create a Kinesis Firehose delivery stream with the S3 bucket as its destination.
- Configure a subscription filter on twtech CloudWatch Log Group to send an ongoing stream of log events to the Firehose delivery stream.
- This approach avoids the one active task quota limitation of the one-time export method and allows for more complex processing of the log data before it lands in S3.
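As a sketch of the first step, the snippet below creates a Firehose delivery stream that buffers log data into S3 using boto3; the stream name, bucket ARN, prefix, and IAM role ARN are placeholders, and the role must already allow Firehose to write to the bucket.

# Sketch of step 1 (assumed names): create a Firehose delivery stream that
# buffers incoming log data and writes it to the S3 bucket.
import boto3

firehose = boto3.client("firehose", region_name="us-east-2")

firehose.create_delivery_stream(
    DeliveryStreamName="twtech-logs-to-s3",
    DeliveryStreamType="DirectPut",
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::accountID:role/twtech-firehose-s3-role",  # placeholder role
        "BucketARN": "arn:aws:s3:::twtech-log-archive-bucket",
        "Prefix": "streaming/app1/",
        "BufferingHints": {"SizeInMBs": 5, "IntervalInSeconds": 300},
        "CompressionFormat": "GZIP",
    },
)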
Considerations:
- Permissions: Ensure the required IAM policies are in place for the export to succeed.
- Encryption: CloudWatch Logs supports exporting to S3 buckets encrypted with SSE-S3 or SSE-KMS keys.
- Automation: For automated or recurring one-time exports, twtech can use the AWS CLI's create-export-task command, often within a Lambda function triggered by an EventBridge schedule.
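A minimal sketch of such a scheduled export is shown below, written as a Lambda handler (boto3 rather than the CLI) that exports the previous full UTC day and is assumed to be invoked by a daily EventBridge rule; the log group, bucket, and prefix are placeholder values.

# Sketch of a recurring export: a Lambda handler intended to be invoked by an
# EventBridge schedule (e.g. once per day). Log group, bucket, and prefix are
# placeholder values.
import boto3
from datetime import datetime, timedelta, timezone

logs = boto3.client("logs")

def handler(event, context):
    # Export the previous full UTC day (well within the 24-hour window per task).
    end = datetime.now(timezone.utc).replace(hour=0, minute=0, second=0, microsecond=0)
    start = end - timedelta(days=1)

    task = logs.create_export_task(
        taskName=f"daily-export-{start:%Y-%m-%d}",
        logGroupName="/aws/lambda/app1",
        fromTime=int(start.timestamp() * 1000),
        to=int(end.timestamp() * 1000),
        destination="twtech-log-archive-bucket",
        destinationPrefix=f"exports/app1/{start:%Y/%m/%d}",
    )
    return {"taskId": task["taskId"]}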
1. Overview
- CloudWatch Logs lets twtech collect logs from EC2, Lambda, ECS, EKS, on-prem agents, etc.
- Sometimes twtech needs long-term retention or analytics beyond CloudWatch’s native features.
- Exporting logs to Amazon S3 enables integration with:
- Amazon Athena (SQL queries on logs)
- Amazon OpenSearch Service (search & dashboards)
- Amazon EMR / Glue (ETL processing)
- SIEM solutions or custom pipelines
NB:
- CloudWatch Logs → S3 export is not real-time.
- It’s a batch job.
- A batch job is a method of running high-volume, repetitive data processes without manual intervention.
- These jobs are scheduled to run, often during off-peak hours, to handle large datasets, such as processing records or system updates.
2. Ways to Export Logs to S3
(A) Manual Export Task
- Uses the CreateExportTask API.
- twtech needs to specify:
- logGroupName → which log group
- from & to → time range (epoch timestamps, ms)
- destination → S3 bucket
- destinationPrefix → S3 folder path
- Limitations:
- Max 1 concurrent export task per account per region
- Max 100 export tasks per account per region per day
- Time range can’t exceed 24 hours per export task
- Export is asynchronous (can take minutes–hours depending on size; see the polling sketch below)
- Output format:
- Logs are gzipped JSON text files
- Stored in S3 partitioned by export task ID and log stream name
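Because only one export task can be active at a time, it is common to wait for the current task to finish before submitting the next one. A minimal boto3 polling sketch follows; the task ID is assumed to come from a prior CreateExportTask call.

# Sketch: wait for an export task to finish before starting the next one,
# since only one task can be active per account/Region.
import time
import boto3

logs = boto3.client("logs", region_name="us-east-2")

def wait_for_export(task_id, poll_seconds=30):
    while True:
        task = logs.describe_export_tasks(taskId=task_id)["exportTasks"][0]
        status = task["status"]["code"]
        if status in ("COMPLETED", "CANCELLED", "FAILED"):
            return status
        time.sleep(poll_seconds)  # still PENDING or RUNNING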
(B) Subscription Filters (Real-Time Alternative)
- For near real-time streaming to S3, twtech uses Kinesis Data Firehose as a subscription target.
- Flow:
- CloudWatch Logs → Subscription Filter → Kinesis Data Firehose → S3
- Pros:
- Continuous delivery (seconds to minutes latency)
- Compression, buffering, partitioning support
- Cons:
- Additional cost for Firehose
- Some setup complexity compared to export tasks
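A minimal boto3 sketch of attaching the subscription filter is shown below; the log group, delivery stream ARN, and role ARN are placeholders, and the role must allow CloudWatch Logs to put records into the Firehose stream.

# Sketch (assumed names): attach a subscription filter that streams new log
# events from the log group into the Firehose delivery stream. The role ARN
# is a placeholder and must allow CloudWatch Logs to call
# firehose:PutRecord / firehose:PutRecordBatch on the stream.
import boto3

logs = boto3.client("logs", region_name="us-east-2")

logs.put_subscription_filter(
    logGroupName="/aws/lambda/app1",
    filterName="to-firehose",
    filterPattern="",  # empty pattern forwards every log event
    destinationArn="arn:aws:firehose:us-east-2:accountID:deliverystream/twtech-logs-to-s3",
    roleArn="arn:aws:iam::accountID:role/twtech-cwlogs-to-firehose-role",
)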
3. IAM Permissions Required On the CloudWatch Logs Side:
{
  "Effect": "Allow",
  "Action": [
    "logs:CreateExportTask",
    "logs:DescribeExportTasks",
    "logs:CancelExportTask"
  ],
  "Resource": "*"
}
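As a sketch, the statement above can be wrapped in a policy document and attached to the role (or user) that starts export tasks; the role name and policy name below are placeholders.

# Sketch: wrap the statement above in a policy document and attach it as an
# inline role policy. Role name and policy name are placeholder values.
import json
import boto3

iam = boto3.client("iam")

export_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "logs:CreateExportTask",
            "logs:DescribeExportTasks",
            "logs:CancelExportTask",
        ],
        "Resource": "*",
    }],
}

iam.put_role_policy(
    RoleName="twtech-log-export-role",
    PolicyName="cloudwatch-logs-export",
    PolicyDocument=json.dumps(export_policy),
)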
Sample bucket policy to allow CloudWatch Logs write On the S3 Bucket Side (the export also requires s3:GetBucketAcl on the bucket so CloudWatch Logs can verify access before writing):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "logs.amazonaws.com"
      },
      "Action": "s3:GetBucketAcl",
      "Resource": "arn:aws:s3:::twtech-log-archive-bucket",
      "Condition": {
        "StringEquals": {
          "aws:SourceAccount": "accountID"
        },
        "ArnLike": {
          "aws:SourceArn": "arn:aws:logs:us-east-2:accountID:*"
        }
      }
    },
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "logs.amazonaws.com"
      },
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::twtech-log-archive-bucket/*",
      "Condition": {
        "StringEquals": {
          "aws:SourceAccount": "accountID"
        },
        "ArnLike": {
          "aws:SourceArn": "arn:aws:logs:us-east-2:accountID:*"
        }
      }
    }
  ]
}
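As a sketch, the bucket policy above can be applied with boto3; the local file name holding the policy document is a placeholder.

# Sketch: apply the bucket policy shown above. The file name holding the
# policy document is a placeholder.
import boto3

s3 = boto3.client("s3")

with open("cloudwatch-logs-export-bucket-policy.json") as f:
    policy_json = f.read()

s3.put_bucket_policy(
    Bucket="twtech-log-archive-bucket",
    Policy=policy_json,
)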
4. Best Practices
Choose export vs. subscription carefully
- Use Export Task for historical or bulk one-off exports.
- Use Firehose Subscription for continuous streaming.
Partition your S3 data wisely
- Use destinationPrefix for business-friendly partitions (e.g., /service/app1/).
- Leverage year/month/day/hour partitioning for Athena queries.
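A small sketch of building such a partitioned prefix follows; the service name and date are placeholder values.

# Sketch: build a Hive-style partitioned destinationPrefix so Athena can prune
# partitions by date. Service name and date are placeholder values.
from datetime import datetime, timezone

day = datetime(2025, 1, 1, tzinfo=timezone.utc)
prefix = f"service/app1/year={day:%Y}/month={day:%m}/day={day:%d}"
# -> "service/app1/year=2025/month=01/day=01"
# pass this as destinationPrefix when calling create_export_task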
Enable lifecycle policies on S3
- Transition logs to cheaper storage (Glacier, Deep Archive) for long-term retention.
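A minimal boto3 sketch of such a lifecycle rule follows; the bucket name, prefix, and day thresholds are placeholder values.

# Sketch: lifecycle rule that moves exported logs to cheaper storage classes.
# Bucket, prefix, and day thresholds are placeholder values.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="twtech-log-archive-bucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-exported-logs",
            "Filter": {"Prefix": "exports/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 90, "StorageClass": "GLACIER"},
                {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
            ],
        }]
    },
)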
Use Glue/Athena for queries
- Create Glue tables on top of exported JSON logs.
- Run ad hoc SQL with Athena instead of re-ingesting logs into CloudWatch.
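A minimal boto3 sketch of an ad hoc Athena query is shown below; the database, table, column name, and query-result location are placeholders, and a Glue table (for example one created by a Glue crawler) is assumed to already exist over the exported objects.

# Sketch: run an ad hoc Athena query against a Glue table built on top of the
# exported objects. Database, table, column name, and result location are
# placeholder values.
import boto3

athena = boto3.client("athena", region_name="us-east-2")

result = athena.start_query_execution(
    QueryString="SELECT * FROM exported_app1_logs WHERE message LIKE '%ERROR%' LIMIT 100",
    QueryExecutionContext={"Database": "twtech_logs"},
    ResultConfiguration={"OutputLocation": "s3://twtech-athena-results/"},
)
print("Query execution ID:", result["QueryExecutionId"])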
Monitor costs
- Export task API calls are free, but twtech pays for:
- S3 storage
- S3 requests
- Athena queries / Firehose costs (if streaming)
5. Common Pitfalls
- Not real-time: Export tasks can lag hours for large log groups.
- Daily limit: 100 export tasks per region/account.
- Concurrent limit: Only one active export task at a time per region/account.
- Missing S3 bucket policy: Logs silently fail if CloudWatch Logs can’t PutObject.
- Querying raw gzipped JSON: twtech needs Glue crawlers or Athena schema definition.
6. Architecture Comparison

| Method                     | Latency         | Cost                   | Use Case                         |
|----------------------------|-----------------|------------------------|----------------------------------|
| Export Task (Batch)        | Minutes–Hours   | Low (S3 only)          | Historical exports, archive      |
| Firehose Subscription (RT) | Seconds–Minutes | Higher (Firehose + S3) | Continuous ingestion, analytics  |
| Lambda + S3                | Near RT         | Higher (Lambda + S3)   | Custom transformation before S3  |