Amazon S3 Batch Operations – Overview & Performance
Amazon S3 Batch Operations lets twtech perform actions on millions or billions of S3
objects with a single request. It is ideal for large-scale automation,
reducing the complexity and cost of running repetitive tasks across your data
at scale.
The concept: S3 Batch Operations
S3 Batch Operations is a managed feature that lets you perform operations such as:
Operation Type | Description
PUT Object Tagging | Add or change tags on objects.
COPY | Copy objects within or between buckets.
Invoke AWS Lambda | Run custom code on each object.
Restore | Restore Glacier/Deep Archive objects.
PUT Object ACL | Change access control lists (ACLs).
PUT Object Legal Hold | Apply a legal hold to objects.
Initiate Multipart Upload | Prepare large files for upload.
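Each row above maps to an --operation payload on an aws s3control create-job call (a full command appears further down). As a rough sketch, with illustrative values for the restore window, retrieval tier, and destination bucket, two of these payloads might look like this:
# bash
# Restore objects archived in Glacier / Deep Archive (7 days and BULK tier are placeholder choices).
--operation '{"S3InitiateRestoreObject":{"ExpirationInDays":7,"GlacierJobTier":"BULK"}}'

# Copy every object in the manifest to a (hypothetical) destination bucket.
--operation '{"S3PutObjectCopy":{"TargetResource":"arn:aws:s3:::twtech-copy-target"}}'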
How S3 Batch Operations Works
- Manifest File
  - A CSV or JSON file in S3 listing the target objects (bucket, key, version).
  - Can be generated manually or using S3 Inventory (see the sample manifest after this list).
- Batch Job
  - twtech configures the job (action, IAM role, completion report, etc.).
  - S3 reads the manifest and executes the job.
- Monitoring & Reporting
  - twtech can track progress via the S3 console or AWS CLI.
  - Completion reports can include success/failure logs.
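The manifest itself is simple. A minimal CSV manifest, assuming a hypothetical bucket named twtech-data-bucket, lists one object per line as bucket,key (with an optional third versionId column for versioned buckets); object keys with special characters should be URL-encoded:
# manifest.csv
twtech-data-bucket,reports/2024/january.parquet
twtech-data-bucket,reports/2024/february.parquet
twtech-data-bucket,logs/app/2024-01-01.log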
Key Features
Feature | Benefit
Scalable | Handles billions of objects.
Eventual Execution | Jobs may take time depending on size but are reliable.
Retries & Logging | Failed operations are retried and logged.
Custom Processing | Use AWS Lambda for filtering, format changes, tagging logic, etc.
No Code Required | GUI-driven or CLI/SDK managed.
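For the Custom Processing row, the --operation payload simply points at a Lambda function that S3 invokes once per object in the manifest. The function ARN below is hypothetical; drop this payload into the create-job command shown in the next section in place of the tagging operation:
# bash
# Invoke a (hypothetical) Lambda function for each object listed in the manifest.
--operation '{"LambdaInvoke":{"FunctionArn":"arn:aws:lambda:us-east-2:123456xxxxxxx:function:twtech-object-processor"}}'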
Example: Create a Batch Job to Add Tags
# bash
# Creates a batch job that tags every object listed in manifest.csv.
aws s3control create-job \
  --account-id 12345xxxxxxx \
  --operation '{"S3PutObjectTagging":{"TagSet":[{"Key":"Processed","Value":"True"}]}}' \
  --report '{"Bucket":"arn:aws:s3:::twtech-report-bucket","Format":"Report_CSV_20180820","Enabled":true,"ReportScope":"AllTasks"}' \
  --manifest '{"Spec":{"Format":"S3BatchOperations_CSV_20180820","Fields":["Bucket","Key"]},"Location":{"ObjectArn":"arn:aws:s3:::twtech-batch-job/manifest.csv","ETag":"<ETag>"}}' \
  --priority 1 \
  --role-arn arn:aws:iam::123456xxxxxxx:role/S3BatchOpsRole \
  --region us-east-2
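create-job returns a JobId. A rough follow-up, using placeholder IDs, is to check progress with describe-job and, if the job was created with confirmation required, release it from the Suspended state with update-job-status:
# bash
# Check job status, progress counters, and failure reasons (IDs are placeholders).
aws s3control describe-job \
  --account-id 12345xxxxxxx \
  --job-id <job-id> \
  --region us-east-2

# If the job is awaiting confirmation, move it to Ready so it starts running.
aws s3control update-job-status \
  --account-id 12345xxxxxxx \
  --job-id <job-id> \
  --requested-job-status Ready \
  --region us-east-2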
Performance Considerations
Metric | Details
Scalability | Petabyte-scale, billions of objects.
Concurrency | Automatically managed by AWS.
Throughput | Varies by operation (e.g., Lambda concurrency, S3 request limits).
Latency | Jobs may take minutes to hours based on size.
Billing | Charged per object processed, plus any requests triggered (e.g., Lambda invocations, PUTs).
twtech Best Practices
- Use S3 Inventory to generate large object manifests.
- Limit Lambda payload sizes and use streaming for large objects.
- Test jobs on small samples before full-scale runs.
- Always use completion reports to audit job results.
- Assign appropriate IAM roles and permissions to batch jobs (a sample trust policy follows this list).
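On that last point, the role passed as --role-arn must be assumable by the S3 Batch Operations service and must allow the actions the job performs (for the tagging example: s3:PutObjectTagging on the target objects, plus read access to the manifest and write access to the report bucket). A minimal sketch of the trust policy looks like this:
# trust-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "batchoperations.s3.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}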