Thursday, December 4, 2025

AWS Storage Strategies | Deep Dive.



Scope:

  • Intro,
  • Core AWS Storage Categories,
  • Functional domains-1: Object Storage,
  • Functional domains-2: Instance Store,
  • Functional domains-3: Block Storage,
  • Functional domains-4: File Storage,
  • Functional domains-5: Archival Storage,
  • Data Management & Performance Tuning,
  • Workload-Specific Architecture Patterns,
  • Security & Governance,
  • Storage Architectures by Use Case,
  • Disaster Recovery Strategy,
  • Full AWS Storage Architecture Diagram.

1. Core AWS Storage Categories

  • AWS storage falls under 5 functional domains:

Functional domains-1:
 

Object Storage

Amazon S3 (Simple Storage Service)

    • Designed for:
      • Massive scale, 
      • Durability (11 9’s), 
      • Multi-AZ availability.
    • Not POSIX compliant.
    • Best for: 
      • Data lakes, 
      • ML datasets, 
      • backups, 
      • logs, 
      • HPC staging, 
      • container artifacts, 
      • static hosting.

Key S3 Features for Engineers

    • Storage Classes: Standard, Intelligent-Tiering, Standard-IA, One Zone-IA, Glacier Instant Retrieval, Glacier Flexible Retrieval, Glacier Deep Archive.
    • Lifecycle Policies: automate tiering & expiration.
    • Versioning: object-level version history.
    • Object Lock (WORM): security/compliance.
    • S3 Access Points: multi-tenant access patterns.
    • S3 Multi-Region Access Points: a single global endpoint that routes requests to the closest bucket, typically paired with Cross-Region Replication.
    • S3 Express One Zone: ultra-low-latency object access (single-digit milliseconds) within a single AZ.
    • Multipart Upload / Transfer Acceleration: for large files and long-distance transfers.
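As a rough illustration of how a lifecycle policy expresses tiering and expiration, the sketch below builds one rule as a plain Python dictionary. The prefix `logs/`, the day thresholds, and the helper name `build_lifecycle_rule` are assumptions made for this example, not values from the text above.

```python
import json


def build_lifecycle_rule(prefix, ia_days=30, glacier_days=90, expire_days=365):
    """Build one lifecycle rule: Standard-IA, then Glacier Flexible Retrieval, then expiry.

    All thresholds are illustrative defaults (hypothetical, tune per workload).
    """
    if not (ia_days < glacier_days < expire_days):
        raise ValueError("expected ia_days < glacier_days < expire_days")
    return {
        "ID": f"tier-and-expire-{prefix.strip('/') or 'all'}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_days, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": expire_days},
    }


# Hypothetical prefix; a real bucket would use its own key layout.
lifecycle_configuration = {"Rules": [build_lifecycle_rule("logs/")]}
print(json.dumps(lifecycle_configuration, indent=2))
```

A dictionary of this shape is what boto3's `put_bucket_lifecycle_configuration` takes as its `LifecycleConfiguration` argument; the call itself is left out so the sketch runs without AWS credentials.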

Functional domains-2: 

Instance Store

    • Instance store can scale to millions of Input/Output Operations Per Second (IOPS) at low latency, because the storage is tied to the EC2 instance.
    • An instance store in AWS provides temporary, block-level storage physically attached to the host computer of an Amazon EC2 instance.
    • This storage is ephemeral, meaning all data stored on it is lost if the instance stops, hibernates, terminates, or if the underlying drive fails.
Key Characteristics
Temporary Storage:
    •  The data does not persist beyond the life of the instance or if the instance is stopped (data does persist through a simple reboot).
High Performance: 
    • Because the storage is physically attached to the host, it offers very low latency and high I/O performance compared to network-attached Amazon EBS volumes.
Cost-Effective:
    •  The cost of the instance store is included in the price of the EC2 instance itself, so there are no separate storage charges.
Instance-Specific:
    • Not all EC2 instance types offer instance store volumes, and the available size, type (SSD or HDD), and quantity vary by instance type.
No Snapshots: 
    • You cannot take a snapshot of an instance store volume, and its data is not included in an Amazon Machine Image (AMI) created from the instance.
Common Use Cases
    • Instance stores are ideal for applications requiring a high-speed, temporary scratchpad or data that can be easily recreated. 
Buffering and Caching: 
    • Used for temporary data that requires fast access but does not need long-term retention, such as session caching or image processing pipelines.
Scratch Data: 
    • Ideal for computational workloads that generate temporary data for the duration of a task.
High-Performance Computing (HPC): 
    • Suitable for scenarios that need massive I/O operations per second (IOPS) and can tolerate potential data loss, such as big data processing or analytical workloads.
Replicated Workloads:
    • Can be used for data that is replicated across a fleet of instances, such as a load-balanced pool of web servers, where the failure of an individual instance does not compromise data availability.

Functional domains-3: 

 Block Storage

Amazon EBS (Elastic Block Store)

    •  High-performance, persistent block volumes attached to EC2.
    •  Scoped to a single AZ (replicated within that AZ); use snapshots to copy data to another AZ or Region.

Key Types:

    • gp3 – General purpose SSD, baseline 3,000 IOPS.
    • io2/io2 Block Express – Highest durability (99.999%) & IOPS (up to 256K).
    • st1/sc1 – HDD-based volumes (st1 for throughput-heavy workloads, sc1 for cold data).
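To make the gp3 baseline concrete, here is a minimal sketch that assembles the parameters for a gp3 volume request. The 3,000 IOPS baseline comes from the list above; the 125 MiB/s throughput baseline, the helper name `gp3_volume_request`, and the Availability Zone value are assumptions added for this example.

```python
GP3_BASELINE_IOPS = 3000            # baseline from the list above
GP3_BASELINE_THROUGHPUT_MIB_S = 125  # assumed gp3 throughput baseline


def gp3_volume_request(availability_zone, size_gib,
                       iops=GP3_BASELINE_IOPS,
                       throughput_mib_s=GP3_BASELINE_THROUGHPUT_MIB_S):
    """Return keyword arguments shaped for an EC2 create_volume call."""
    if size_gib < 1:
        raise ValueError("size_gib must be at least 1")
    if iops < GP3_BASELINE_IOPS:
        raise ValueError("gp3 volumes cannot go below the 3,000 IOPS baseline")
    if throughput_mib_s < GP3_BASELINE_THROUGHPUT_MIB_S:
        raise ValueError("gp3 volumes cannot go below the throughput baseline")
    return {
        "AvailabilityZone": availability_zone,
        "Size": size_gib,
        "VolumeType": "gp3",
        "Iops": iops,
        "Throughput": throughput_mib_s,
        "Encrypted": True,
    }


# Hypothetical AZ and size, chosen only for illustration.
request = gp3_volume_request("us-east-1a", 200, iops=6000, throughput_mib_s=250)
print(request)
```

Upper limits are deliberately not checked here because they change over time; the current EBS documentation is the place to confirm them.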

Use Cases:

    • Databases, EC2 boot disks, Elasticsearch, Kafka brokers, containers needing persistent block storage.

Functional domains-4:

 File Storage

Amazon EFS (Elastic File System)

    • NFSv4.0/4.1 compatible, multi-AZ, elastic.
    • Highly parallel: 10,000s of concurrent connections.

Modes:

    • Availability: Standard (multi-AZ) / One Zone (single AZ)
    • Performance Modes: General Purpose, Max I/O

Amazon FSx

Fully managed file systems:

    • FSx for Lustre: HPC, ML training, large data pipelines.
    • FSx for NetApp ONTAP: multi-protocol (NFS/SMB/iSCSI), SnapMirror, enterprise use.
    • FSx for OpenZFS: low latency, ZFS snapshots, compression.
    • FSx for Windows File Server: SMB for Windows workloads.

Functional domains-5:

 Archival Storage

Glacier Family

    • Glacier Instant Retrieval – millisecond access
    • Glacier Flexible Retrieval – minutes to hours, depending on the retrieval tier
    • Glacier Deep Archive – hours (typically within 12 for standard retrievals)

NB:

    • Used for compliance, cold backups, log retention, disaster recovery.
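Objects in Glacier Flexible Retrieval and Deep Archive have to be restored before they can be read, and the retrieval tier determines how long that takes. The sketch below builds a restore request and rejects the Expedited tier for Deep Archive, which does not support it. The helper name and the seven-day availability window are assumptions for illustration.

```python
# Retrieval tiers accepted per storage class (Instant Retrieval needs no restore).
ALLOWED_TIERS = {
    "GLACIER": {"Expedited", "Standard", "Bulk"},
    "DEEP_ARCHIVE": {"Standard", "Bulk"},
}


def build_restore_request(storage_class, days, tier="Standard"):
    """Return a RestoreRequest dictionary for an archived object."""
    if storage_class not in ALLOWED_TIERS:
        raise ValueError(f"{storage_class} objects do not need a restore request")
    if tier not in ALLOWED_TIERS[storage_class]:
        raise ValueError(f"tier {tier!r} is not available for {storage_class}")
    if days < 1:
        raise ValueError("days must be at least 1")
    return {"Days": days, "GlacierJobParameters": {"Tier": tier}}


# Keep the restored copy readable for 7 days (hypothetical window).
print(build_restore_request("DEEP_ARCHIVE", days=7, tier="Bulk"))
```

The result matches the `RestoreRequest` argument of boto3's `restore_object`; bucket and key names are omitted because none appear in the original text.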

1.6 Edge & Hybrid Storage

AWS Storage Gateway

    • File Gateway: NFS/SMB with S3 backend
    • Tape Gateway: virtual tapes stored in S3 and archived to S3 Glacier
    • Volume Gateway: iSCSI block volumes replicated from on-prem to AWS

AWS DataSync

    • High-speed transfer between:
      •    On-prem and AWS
      •    AWS and other clouds
      •    AWS services (S3, EFS, FSx, etc.)

Snow Family

    • Snowcone, Snowball Edge, Snowmobile: for offline transfer + edge compute at scale.

 2. Data Management & Performance Tuning

2.1 S3 Performance

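    • Request parallelization via prefixes:
      •   S3 scales automatically, and request-rate limits apply per prefix (at least 3,500 writes and 5,500 reads per second each), so spreading keys across prefixes raises aggregate throughput without manual partitioning.
    • Use multipart for >100MB objects.
    • S3 Select: pushdown filtering for lower transfer costs.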
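The multipart guidance can be turned into a small planning function. This sketch applies the 100 MB threshold from the list above together with the S3 multipart limits (5 MiB minimum part size, 5 GiB maximum part size, 10,000 parts per upload). The 64 MiB preferred part size and the function name are assumptions for this example.

```python
import math

MIB = 1024 ** 2
MULTIPART_THRESHOLD = 100 * MIB  # threshold from the list above
MIN_PART_SIZE = 5 * MIB
MAX_PART_SIZE = 5 * 1024 * MIB
MAX_PARTS = 10_000


def plan_upload(object_size, preferred_part_size=64 * MIB):
    """Decide between a single PUT and multipart, and size the parts."""
    if object_size <= MULTIPART_THRESHOLD:
        return {"multipart": False, "part_size": object_size, "parts": 1}
    # Grow the part size when the preferred one would exceed 10,000 parts.
    part_size = max(preferred_part_size, MIN_PART_SIZE,
                    math.ceil(object_size / MAX_PARTS))
    if part_size > MAX_PART_SIZE:
        raise ValueError("object is too large for a single multipart upload")
    return {
        "multipart": True,
        "part_size": part_size,
        "parts": math.ceil(object_size / part_size),
    }


print(plan_upload(50 * MIB))     # small object: single PUT
print(plan_upload(1024 * MIB))   # 1 GiB object: 64 MiB parts
print(plan_upload(1024 ** 4))    # 1 TiB object: part size grows automatically
```

Parts can then be uploaded in parallel, which is where most of the throughput gain comes from.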

2.2 EBS Performance

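    • New volumes deliver full performance immediately; only volumes restored from snapshots need initialization (reading every block, or Fast Snapshot Restore) before they reach max throughput.
    • RAID 0 striping across multiple volumes for IOPS-heavy workloads (e.g., MongoDB).
    • Use EBS-optimized instances for stable latency (most current-generation instance types are EBS-optimized by default).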
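RAID 0 adds up the performance of its member volumes, but the instance's own EBS limits still cap the total. The following sketch estimates the usable aggregate; the instance limits used in the example call are hypothetical placeholders, not figures for any real instance type.

```python
def raid0_estimate(volumes, iops_per_volume, throughput_per_volume_mib_s,
                   instance_iops_limit, instance_throughput_limit_mib_s):
    """Estimate striped performance, capped by the instance's EBS limits."""
    if volumes < 1:
        raise ValueError("need at least one volume")
    return {
        "iops": min(volumes * iops_per_volume, instance_iops_limit),
        "throughput_mib_s": min(volumes * throughput_per_volume_mib_s,
                                instance_throughput_limit_mib_s),
    }


# Four baseline gp3 volumes behind an instance with made-up limits.
estimate = raid0_estimate(4, 3000, 125,
                          instance_iops_limit=10_000,
                          instance_throughput_limit_mib_s=400)
print(estimate)
```

Here the stripe could deliver 12,000 IOPS and 500 MiB/s on paper, yet the assumed instance limits hold it to 10,000 IOPS and 400 MiB/s, so adding a fifth volume would not help. Note also that RAID 0 has no redundancy: losing one volume loses the array.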

2.3 EFS Performance

    • Burst credits matter for sustained workloads in Bursting Throughput mode: once they run out, throughput drops to the baseline rate.
    • Use Provisioned Throughput for analytics / ML workloads.
    • Use One Zone for low latency HPC pipelines.
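A simple credit-bucket model shows why burst credits matter for sustained workloads. The sketch below is a simplified model, not the exact EFS accounting: credits drain at the rate by which demand exceeds the baseline. All numbers in the example call are illustrative assumptions.

```python
def hours_until_credits_exhausted(credit_mib, baseline_mib_s, demand_mib_s):
    """Return hours of sustained demand before the credit balance hits zero.

    Returns None when demand does not exceed the baseline, since credits
    are not consumed in that case.
    """
    if demand_mib_s <= baseline_mib_s:
        return None
    drain_rate = demand_mib_s - baseline_mib_s  # MiB of credit spent per second
    return credit_mib / drain_rate / 3600


# Hypothetical file system: 2,160,000 MiB of credits, 50 MiB/s baseline,
# and a job that reads steadily at 100 MiB/s.
print(hours_until_credits_exhausted(2_160_000, 50, 100))
```

If the estimate is shorter than the job's runtime, that is a signal to consider Provisioned Throughput rather than relying on bursting.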

2.4 FSx Tuning

FSx for Lustre

    •  100s of GB/s throughput and sub-millisecond latency.
    •  Integrates natively with S3 (import/export).
    •  Excellent for ML training, genomics, seismic processing.

 3. Workload-Specific Architecture Patterns

3.1 Data Lake / Lakehouse

    • S3 as central data lake storage.
    • Glue Catalog / Lake Formation for schema + governance.
    •  Athena for SQL analysis.
    •  Iceberg + S3 for lakehouse tables.
    •  EFS/FSx for ETL staging.

Architecture decisions:

    •  Use S3 Intelligent-Tiering for unpredictable access.
    •  Use S3 Object Lock if regulatory compliance needed.

3.2 HPC (High-Performance Computing) Storage

Typical stack:

    •  FSx for Lustre: primary compute scratch.
    •  S3: durable dataset repository.
    •  ParallelCluster: integrates Lustre natively.
    •  EFA networking: for low-latency HPC nodes.

Pattern:

On-prem HPC → DataSync → S3 → FSx for Lustre → HPC Compute → Results → S3

3.3 Hybrid / Multi-Cloud Storage

Patterns:

Pattern A: On-Prem NAS → S3 via File Gateway

    • Offload backups
    • Cold archive to Glacier

Pattern B: Multi-Cloud Dataset Sharing

    •  DataSync between AWS and Azure Blob / Google Cloud Storage
    •  S3 Multi-Region Access Points for global apps

Pattern C: DR Across Clouds

    •  FSx/EC2 snapshot replication → S3 → export to another cloud
    •  VMware Cloud on AWS for lift-and-shift workloads

 4. Security & Governance

4.1 S3

    •  Encryption: SSE-S3, SSE-KMS, CSE-KMS.
    •  Block Public Access (enable at account level).
    •  VPC endpoints: no public internet traffic.
    •  Bucket policies scoped to IAM conditions like:
      •    aws:PrincipalOrgID
      •    aws:SourceVpc
      •    aws:SourceArn
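As an example of scoping a bucket policy with one of these condition keys, the sketch below generates a policy that denies every request from outside a given AWS Organization using `aws:PrincipalOrgID`. The bucket name, organization ID, and helper name are hypothetical.

```python
import json


def org_only_bucket_policy(bucket, org_id):
    """Deny all S3 actions on the bucket for principals outside the organization."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideOrg",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                "Condition": {
                    "StringNotEquals": {"aws:PrincipalOrgID": org_id}
                },
            }
        ],
    }


# Placeholder bucket and organization ID, for illustration only.
policy = org_only_bucket_policy("example-data-lake", "o-exampleorgid")
print(json.dumps(policy, indent=2))
```

A blanket deny like this can also block AWS services that write to the bucket on your behalf, so a real policy usually needs exceptions for those service principals.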

4.2 EFS/FSx

    • Enforce access via:
      •    Security Groups
      •    IAM & POSIX permissions
      •    KMS encryption

4.3 Backup

  • AWS Backup automates protection for:
    •    S3
    •    EFS
    •    FSx
    •    RDS
    •    DynamoDB
    •    EBS

 5. Storage Architectures by Use Case

5.1 DevOps / CI/CD

    • S3 for artifact and container image storage
    • EFS for shared build caches
    • FSx for ONTAP for enterprise CI/CD systems requiring snapshots & clones

5.2 Kubernetes / EKS

Recommended:

    • EBS for per-pod persistent volumes (CSI).
    • EFS for shared workloads across nodes.
    • FSx for Lustre for high-performance ML workloads.
    • FSx for ONTAP for multi-protocol storage.
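The recommendations above can be read as a small decision table. This sketch encodes them as a function; the order in which the checks run is an assumption made for the example, since the list itself does not rank the options.

```python
def pick_eks_storage(shared_across_nodes=False, high_performance_ml=False,
                     multi_protocol=False):
    """Map workload traits to the storage service suggested in the list above."""
    if multi_protocol:
        return "FSx for ONTAP"
    if high_performance_ml:
        return "FSx for Lustre"
    if shared_across_nodes:
        return "EFS"
    # Default: a volume dedicated to a single pod.
    return "EBS"


print(pick_eks_storage())                           # per-pod volume
print(pick_eks_storage(shared_across_nodes=True))   # shared file system
print(pick_eks_storage(high_performance_ml=True))   # ML training scratch
```

In practice the choice also depends on access modes: an EBS volume attaches to one node at a time, while EFS and FSx file systems can be mounted by many pods at once.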

5.3 Analytics / Big Data

    • S3: main data lake
    • Redshift: compute warehouse
    • Glue: ETL
    • EMR/Hadoop + FSx for Lustre: HPC-scale ETL

 6. Disaster Recovery Strategy

Levels:

    • Warm Standby: S3 Cross-Region Replication + EFS Regional (multi-AZ) file systems.
    • Pilot Light: replicate only critical S3 and FSx data.
    • Backup & Restore: S3 Glacier as cold storage.
    • Multi-Region Active-Active: S3 Multi-Region Access Points + DynamoDB Global Tables.
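Several of these levels depend on S3 cross-region replication. The sketch below builds a minimal replication configuration with a single rule; the role ARN, bucket name, and rule ID are placeholders invented for illustration.

```python
def build_replication_config(role_arn, destination_bucket, prefix=""):
    """Return a one-rule replication configuration (empty prefix = whole bucket)."""
    return {
        "Role": role_arn,
        "Rules": [
            {
                "ID": "dr-replication",
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": f"arn:aws:s3:::{destination_bucket}"},
            }
        ],
    }


# Placeholder account ID, role, and bucket.
config = build_replication_config(
    "arn:aws:iam::111122223333:role/example-replication-role",
    "example-dr-bucket",
    prefix="critical/",
)
print(config)
```

This is the shape of the `ReplicationConfiguration` argument to boto3's `put_bucket_replication`. Versioning has to be enabled on both the source and destination buckets before replication works; scoping the rule to a prefix such as `critical/` is one way to approximate the Pilot Light level.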



