Tuesday, August 26, 2025

Amazon S3 | Deep Dive.


Scope

  • Architecture
  • Storage classes
  • Security
  • Best practices
  • Integrations
  • Advanced features

1. Overview of Amazon S3

Amazon S3 is an object storage service that provides highly durable, scalable, and secure storage for any type of data — from backups and logs to big data analytics and media files.

  • Key features:
    • Durability: 99.999999999% (11 9’s) over a given year.
    • Availability: 99.99% SLA for standard storage.
    • Scalability: Virtually unlimited storage.
    • Object-based: Data is stored as objects, each with metadata and a unique key.
    • Global namespace: Each object is stored in a bucket; bucket names are globally unique across all AWS accounts.

2. S3 Architecture

S3’s architecture is flat (no directories in the traditional sense) but uses a key-based hierarchy for organizational purposes.

Key components:

  • Buckets: Containers for objects. Unique name across AWS.
  • Objects: The data itself (file) + metadata + key.
  • Keys: Unique identifier for each object in a bucket.
  • Regions: Buckets are region-specific.
  • Endpoints: S3 supports REST API, SDKs, and CLI access.

S3 Storage Flow:

  1. Client uploads an object via PUT request.
  2. S3 stores the object redundantly across multiple Availability Zones (AZs).
  3. S3 automatically maintains data integrity using checksums.
  4. Objects can trigger events (like Lambda functions) via S3 Event Notifications.
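Step 3's integrity check can be reproduced client-side: for a single-part upload that does not use SSE-KMS, the ETag S3 returns is the hex MD5 digest of the object body. A minimal sketch:

```python
import hashlib

def etag_for_single_part(body: bytes) -> str:
    """Hex MD5 of the body; matches the ETag S3 reports for
    single-part uploads without SSE-KMS encryption."""
    return hashlib.md5(body).hexdigest()

# Compare this local digest against the ETag returned by PUT Object / HEAD Object.
print(etag_for_single_part(b"hello world"))
```

For multipart uploads the ETag is computed differently (an MD5 of the parts' MD5s plus a part count), so this comparison only applies to single-part objects.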

3. S3 Storage Classes

AWS provides multiple storage classes based on access patterns, durability, and cost.

| Storage Class | Use Case | Availability | Cost |
|---|---|---|---|
| S3 Standard | Frequent access | 99.99% | High |
| S3 Intelligent-Tiering | Automatic tiering based on access patterns | 99.9% | Medium |
| S3 Standard-IA (Infrequent Access) | Less frequent access, but rapid retrieval needed | 99.9% | Lower |
| S3 One Zone-IA | Infrequent access, re-creatable data (single AZ; data is lost if that AZ fails) | 99.5% | Lower |
| S3 Glacier | Archival, retrieval in minutes/hours | N/A | Very Low |
| S3 Glacier Deep Archive | Long-term archival, retrieval in hours | N/A | Lowest |
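As an illustrative way to make these trade-offs concrete, a tier selector might look like the sketch below; the day thresholds are assumptions for the example, not AWS guidance, and the return values are the storage-class constants the S3 API uses:

```python
def suggest_storage_class(days_since_last_access: int, single_az_ok: bool = False) -> str:
    """Illustrative tier selector; the day thresholds are assumptions, not AWS guidance."""
    if days_since_last_access < 30:
        return "STANDARD"
    if days_since_last_access < 90:
        # One Zone-IA trades AZ redundancy for cost; only for re-creatable data.
        return "ONEZONE_IA" if single_az_ok else "STANDARD_IA"
    if days_since_last_access < 365:
        return "GLACIER"
    return "DEEP_ARCHIVE"

print(suggest_storage_class(10), suggest_storage_class(45), suggest_storage_class(400))
```

In practice, Intelligent-Tiering does this kind of decision automatically per object, which is why it suits unknown access patterns.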

4. S3 Security & Access Control

Security is multi-layered:

Access Management

  • IAM Policies: Control user and role permissions.
  • Bucket Policies: Define access rules at the bucket level.
  • ACLs (Access Control Lists): Legacy, less recommended for fine-grained control.

Encryption

  • Server-Side Encryption (SSE):
    • SSE-S3: Managed by AWS.
    • SSE-KMS: Managed by KMS (allows key rotation).
    • SSE-C: Customer-provided keys.
  • Client-Side Encryption: Encrypt before uploading.

Networking

  • VPC endpoints (Gateway/Interface): Secure private connectivity.
  • Block Public Access: Prevent accidental public exposure.
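These layers are expressed as JSON policy documents. A common bucket policy that enforces encryption in transit denies any request not made over TLS; the bucket name below is a placeholder, but the statement itself is the standard pattern:

```python
import json

# Bucket name is hypothetical; the statement is the standard pattern for
# denying any S3 request that does not use TLS (aws:SecureTransport = false).
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyInsecureTransport",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [
            "arn:aws:s3:::twtech-s3bucket",
            "arn:aws:s3:::twtech-s3bucket/*",
        ],
        "Condition": {"Bool": {"aws:SecureTransport": "false"}},
    }],
}
print(json.dumps(policy, indent=2))
```

Note the two Resource ARNs: one covers bucket-level actions, the other object-level actions; listing only one leaves a gap.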

5. S3 Features

  • Versioning: Keep multiple versions of an object.
  • Lifecycle Policies: Move objects between storage classes or expire them.
  • Replication: Cross-Region Replication (CRR) or Same-Region Replication (SRR).
  • Event Notifications: Trigger Lambda, SQS, or SNS on object events.
  • Pre-signed URLs: Temporary access to objects.
  • Object Lock & WORM: Compliance storage to prevent deletion (useful for regulatory requirements).
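The lifecycle bullet above maps to a JSON document in the shape accepted by the PutBucketLifecycleConfiguration API; the `logs/` prefix and day counts here are illustrative:

```python
import json

# Illustrative rule: tier "logs/" objects to Standard-IA at 30 days,
# Glacier at 90, and delete them at 365. Shape follows the
# PutBucketLifecycleConfiguration API.
lifecycle = {
    "Rules": [{
        "ID": "tiering-and-expiry",
        "Status": "Enabled",
        "Filter": {"Prefix": "logs/"},
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},
            {"Days": 90, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": 365},
    }]
}
print(json.dumps(lifecycle, indent=2))
```

The same document can be applied with `aws s3api put-bucket-lifecycle-configuration --lifecycle-configuration file://lifecycle.json`.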

6. Integrations

S3 integrates seamlessly with many AWS services:

  • Compute: Lambda, EC2, ECS
  • Analytics: Athena, EMR, Redshift Spectrum
  • Data Processing: Glue, Kinesis Data Firehose
  • Serverless API: API Gateway + Lambda + S3 for static content
  • Workflow Orchestration: Step Functions, EventBridge

7. Best Practices

  1. Enable versioning for critical data.
  2. Use lifecycle rules to save costs.
  3. Encrypt data at rest and in transit.
  4. Monitor access with CloudTrail & S3 Access Logs.
  5. Avoid public access unless absolutely required.
  6. Use intelligent-tiering for unpredictable access patterns.
  7. Design key names for high request rates; S3 now scales automatically per prefix, but spreading keys across multiple prefixes still raises aggregate throughput.
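Practice 7 can be sketched as a key-naming helper that derives a stable hash prefix for each key; the fan-out of 16 is an arbitrary choice for the example:

```python
import hashlib

def prefixed_key(original_key: str, fanout: int = 16) -> str:
    """Derive a stable hash prefix so keys spread across `fanout` prefixes."""
    digest = hashlib.md5(original_key.encode("utf-8")).hexdigest()
    prefix = int(digest[:8], 16) % fanout
    return f"{prefix:02x}/{original_key}"

print(prefixed_key("2025/08/26/app.log"))
```

Because the prefix is derived from the key itself, readers can recompute it without a lookup table, and writes distribute evenly across the 16 prefixes.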

8. Advanced Concepts

  • Multipart Upload: For large files, upload in parts to improve speed and reliability.
  • S3 Batch Operations: Perform bulk actions on millions of objects.
  • Requester Pays: The requester pays for access costs, not the bucket owner.
  • S3 Select & Glacier Select: Query objects using SQL without full download.
  • Event-Driven Architecture: Lambda triggers on PUT, POST, DELETE.
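The multipart bullet has two hard limits worth encoding: every part except the last must be at least 5 MiB, and an upload may have at most 10,000 parts. A sketch of planning part sizes under those limits:

```python
MIN_PART = 5 * 1024**2    # S3 minimum part size (all parts except the last)
MAX_PARTS = 10_000        # S3 maximum number of parts per upload

def plan_parts(object_size: int, part_size: int = 100 * 1024**2):
    """Pick a part size and count that respect S3's multipart limits."""
    part_size = max(part_size, MIN_PART)
    # Double the part size until the object fits within 10,000 parts.
    while (object_size + part_size - 1) // part_size > MAX_PARTS:
        part_size *= 2
    count = max(1, (object_size + part_size - 1) // part_size)
    return part_size, count

print(plan_parts(1024**3))  # a 1 GiB object in 100 MiB parts
```

Higher-level helpers such as boto3's `upload_file` do this planning internally; the sketch shows why very large objects force larger part sizes.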

9. Use Cases

  • Static website hosting
  • Backup & disaster recovery
  • Data lake for analytics
  • Media storage (images/videos)
  • Log storage & archival
  • Big data processing

The following walkthrough covers creating an Amazon S3 bucket step by step and integrating it with other AWS services like Lambda, API Gateway, EventBridge, and Step Functions. I’ll also explain practical use cases for each integration.

1. Create an S3 Bucket

Using AWS Console

  1. Go to S3 → Create bucket.
  2. Enter:
    • Bucket name: Must be globally unique.
    • Region: Choose the region closest to twtech users.
  3. Configure options:
    • Versioning: Enable if twtech wants to track object versions.
    • Encryption: Enable SSE-S3 or SSE-KMS.
    • Block Public Access: Usually keep all blocked unless hosting a public website.
  4. Click Create bucket.

Using AWS CLI

# bash

aws s3api create-bucket \
    --bucket twtech-s3bucket \
    --region us-east-2 \
    --create-bucket-configuration LocationConstraint=us-east-2

Note: omit --create-bucket-configuration when creating a bucket in us-east-1; the API rejects a LocationConstraint for that region.

2. Integrate S3 with Lambda

Use case: 

  • Automatically process files when uploaded (e.g., image resizing, CSV processing).

Steps:

  1. Create Lambda Function:
    • Runtime: Python/Node.js/Java/Go.
    • Role: Attach a policy allowing s3:GetObject and s3:PutObject.
  2. Create S3 Event Notification:
    • Go to your bucket → Properties → Event notifications → Create event notification.
    • Event type: All object create events (or specific: PUT/POST).
    • Destination: Lambda function.
  3. Lambda Example (Python):

import boto3

def lambda_handler(event, context):
    s3 = boto3.client('s3')
    # The bucket name lives under ['bucket']['name'] in the event record
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']
    print(f"New file uploaded: {key} in bucket {bucket}")
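A handler like this can be exercised locally with a synthetic event in the documented S3 notification shape (the bucket and key values below are made up). One real gotcha: object keys arrive URL-encoded, so `unquote_plus` is the usual decode step:

```python
from urllib.parse import unquote_plus

# Synthetic event mirroring the S3 notification structure the handler reads.
sample_event = {
    "Records": [{
        "s3": {
            "bucket": {"name": "twtech-s3bucket"},
            "object": {"key": "uploads/monthly+report.csv"},
        }
    }]
}

record = sample_event["Records"][0]["s3"]
bucket = record["bucket"]["name"]
key = unquote_plus(record["object"]["key"])  # "+" decodes to a space
print(f"New file uploaded: {key} in bucket {bucket}")
```

Skipping the decode step causes `GetObject` calls to fail for keys containing spaces or special characters.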

3. Integrate S3 with API Gateway

Use case:

  • Expose files or allow file uploads via API endpoints.

Steps:

  1. Create API Gateway REST API.
  2. Add a POST method → Integration type: Lambda.
  3. Lambda function receives the base64-encoded file and stores it in S3:

import boto3
import base64

def lambda_handler(event, context):
    s3 = boto3.client('s3')
    file_content = base64.b64decode(event['body'])
    s3.put_object(Bucket='twtech-s3bucket', Key='uploaded_file.txt', Body=file_content)
    return {"statusCode": 200, "body": "twtech File uploaded successfully"}
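On the calling side, the file body must be base64-encoded before it is sent through API Gateway; a minimal sketch of the round trip (the payload bytes are made up for the example):

```python
import base64

# Encode the file bytes the way the Lambda above expects to receive them.
payload = base64.b64encode(b"twtech sample upload").decode("ascii")
request_body = {"body": payload}

# The handler's b64decode step recovers the original bytes exactly.
assert base64.b64decode(request_body["body"]) == b"twtech sample upload"
```

For binary payloads, API Gateway must also have the relevant media types configured as binary, otherwise it re-encodes the body.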

4. Integrate S3 with EventBridge

Use case: Trigger workflows or notifications for S3 events.

Steps:

  1. Go to EventBridge → Create rule.
  2. Event pattern:

{
  "source": ["aws.s3"],
  "detail-type": ["Object Created"]
}

  3. Target: Lambda, Step Functions, SNS, or SQS.
  4. Enable "Send notifications to Amazon EventBridge" in the bucket's Properties; from then on, every time a file is uploaded to S3, EventBridge triggers the downstream services.
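The matching the rule performs can be approximated in a few lines. This is a simplified sketch of EventBridge's exact-value matching on top-level fields, not the full pattern language (which also supports prefixes, wildcards, and nested `detail` matching):

```python
pattern = {"source": ["aws.s3"], "detail-type": ["Object Created"]}

def matches(event: dict, pattern: dict) -> bool:
    """Simplified EventBridge matching: each pattern field must list the event's value."""
    return all(event.get(field) in allowed for field, allowed in pattern.items())

s3_event = {"source": "aws.s3", "detail-type": "Object Created",
            "detail": {"bucket": {"name": "twtech-s3bucket"}}}
print(matches(s3_event, pattern))  # True
```

An EC2 event, for example, fails the `source` check and is never routed to the rule's targets.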

5. Integrate S3 with Step Functions

Use case: 

  • Orchestrate multi-step workflows like ETL pipelines.

Steps:

  1. Create a Step Functions state machine.
  2. Define states:
    • Start → Lambda (process S3 object)
    • Lambda → another Lambda or service
    • End
  3. Trigger the state machine from:
    • S3 Event Notification → Lambda → Step Function
    • Or an EventBridge rule directly.

Step Functions example (JSON):

{
  "StartAt": "ProcessS3Object",
  "States": {
    "ProcessS3Object": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-2:accountID:function:ProcessS3",
      "End": true
    }
  }
}

6. S3 Integration Architecture Sample


A typical workflow:

User uploads file → S3 bucket triggers Lambda → Step Functions workflow → stores results back in S3 / sends notification via SNS

Optional

  • EventBridge can listen to S3 events to trigger workflows without Lambda.

twtech-insights:

  • Here’s twtech’s comprehensive breakdown of Amazon S3 (Simple Storage Service) tiers, including:
    • key features,
    • security,
    • encryption,
    • batch operations,
    • performance,
    • automation,
    • use cases.

1. S3 Storage Tiers / Classes

| Tier / Class | Key Features | Cost Focus | Typical Use Case |
|---|---|---|---|
| S3 Standard | High durability (99.999999999%), low latency, high throughput, 99.99% availability | Frequent access | Websites, content distribution, apps, big data analytics |
| S3 Intelligent-Tiering | Automatic cost optimization by moving objects between access tiers based on usage | Frequent & infrequent access | Unknown or changing access patterns, dynamic workloads |
| S3 Standard-IA (Infrequent Access) | Lower storage cost than Standard, higher retrieval cost, same durability | Infrequent access | Backup & disaster recovery, long-term storage of infrequently accessed data |
| S3 One Zone-IA | Stores data in a single AZ, lower cost, same retrieval cost as Standard-IA | Infrequent access, non-critical | Secondary backups, easily reproducible data |
| S3 Glacier | Very low cost, retrieval from minutes to hours | Archival storage | Long-term backups, compliance, media archives |
| S3 Glacier Deep Archive | Lowest cost tier, retrieval in 12–48 hours | Long-term archival | Regulatory archives, rarely accessed data, cold storage |
| S3 Outposts | On-premises object storage | Local data residency | Low-latency, hybrid cloud workloads |

2. Key Features

  • Durability: 11 nines (99.999999999%) across all classes
  • Scalability: Automatically scales with data volume
  • Versioning: Keep multiple versions of objects
  • Lifecycle Policies: Automate moving objects between tiers
  • Replication: Cross-region (CRR) or same-region replication (SRR)
  • Event Notifications: Trigger Lambda, SQS, or SNS on events
  • Access Management: IAM policies, bucket policies, ACLs

3. Security

  • Access Control: IAM, bucket policies, ACLs
  • Encryption:
    • At rest: SSE-S3, SSE-KMS, SSE-C
    • In transit: TLS/HTTPS
  • Logging & Monitoring: CloudTrail, S3 server access logs, Amazon Macie for sensitive data
  • Block Public Access: Prevent accidental public exposure

4. Encryption

| Type | Description |
|---|---|
| SSE-S3 | AWS manages encryption keys, AES-256 |
| SSE-KMS | Customer-managed keys in KMS, audit trail |
| SSE-C | Customer provides keys, AWS does not store them |
| Client-Side Encryption | Data encrypted before uploading to S3 |
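The three server-side modes differ only in the parameters sent with the upload request. The dictionaries below use the real parameter names boto3's `put_object`/`upload_file` accept; the KMS alias is a placeholder, and note that boto3 can also handle the base64 step for SSE-C keys itself:

```python
import base64
import hashlib
import os

# ExtraArgs-style parameter sets for each server-side encryption mode.
sse_s3 = {"ServerSideEncryption": "AES256"}

sse_kms = {
    "ServerSideEncryption": "aws:kms",
    "SSEKMSKeyId": "alias/twtech-key",   # placeholder key alias
}

customer_key = os.urandom(32)            # 256-bit key the customer supplies
sse_c = {
    "SSECustomerAlgorithm": "AES256",
    "SSECustomerKey": base64.b64encode(customer_key).decode("ascii"),
    "SSECustomerKeyMD5": base64.b64encode(hashlib.md5(customer_key).digest()).decode("ascii"),
}
print(sse_s3["ServerSideEncryption"], sse_kms["ServerSideEncryption"])
```

With SSE-C, the same key must be supplied again on every GET; AWS verifies it against the stored MD5 but never retains the key.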

5. Batch Operations

  • Perform operations across millions/billions of objects
  • Tasks include:
    • Copying objects
    • Tagging objects
    • Initiating Glacier restores
    • Running AWS Lambda functions on objects

6. Performance

  • High throughput: Can handle thousands of requests per second
  • Optimizations:
    • Prefixes no longer required for high performance (S3 auto-scales)
    • Multipart upload for large files (>100 MB recommended)
    • S3 Transfer Acceleration for faster global uploads

7. Automation

  • Lifecycle Policies: Move objects automatically between tiers
  • Replication Rules: Automatic replication for DR or compliance
  • Event-Driven Automation: S3 triggers Lambda for processing files
  • S3 Object Lock: Compliance automation (WORM storage)

8. Use Cases by Tier

| Tier | Use Cases |
|---|---|
| Standard | Streaming media, dynamic content, analytics |
| Intelligent-Tiering | Changing access patterns, cost optimization without manual intervention |
| Standard-IA / One Zone-IA | Backup, disaster recovery, secondary data copies |
| Glacier / Glacier Deep Archive | Regulatory/compliance archives, infrequent backups, media archiving |
| Outposts | Low-latency local workloads, hybrid cloud storage |

🔑 Key Features

Versioning (keep multiple object versions) 

Lifecycle Policies (auto-move between tiers) 

Replication (CRR / SRR) 

Event Notifications (Lambda, SNS, SQS triggers) 

High durability & scalability (11 9's) 

🛡 Security & Encryption

🔒 Security: IAM, Bucket Policies, ACLs, Block Public Access, CloudTrail 

🗝 Encryption: SSE-S3, SSE-KMS, SSE-C, Client-Side Encryption 

🔐 In-transit: TLS/HTTPS 

⚙️ Batch Operations

📋 Copy objects 

🏷 Add/modify tags 

⏪ Restore Glacier objects 

🖥 Run Lambda functions on large object sets 

🚀 Performance & Automation

⚡ Multipart Uploads for large files 

🌍 Transfer Acceleration for global uploads 

🔄 Lifecycle Policies for auto-tiering 

📑 Replication for DR & compliance 

🔔 Event-driven Lambda automation 

⛓ Object Lock for compliance 

🎯 Use Cases

💻 Standard: Dynamic content, apps, analytics 

🤖 Intelligent-Tiering: Changing access patterns, cost optimization 

💾 Standard-IA / One Zone-IA: Backup, disaster recovery 

📦 Glacier / Deep Archive: Regulatory & media archives 

🏢 Outposts: Low-latency hybrid or on-prem workloads 



