Thursday, December 4, 2025

AWS Storage Strategies | Deep Dive.


A deep dive into AWS Storage.

Focus:

  •        Tailored for DevOps/DevSecOps/Cloud Engineering,
  •        Hands-on concepts,
  •        Real architectural patterns,
  •        Tuning guidance,
  •        Hybrid/HPC/multi-cloud considerations.

Breakdown:

  •        Core AWS Storage Categories,
  •        Data Management & Performance Tuning,
  •        Workload-Specific Architecture Patterns,
  •        Security & Governance,
  •        Storage Architectures by Use Case,
  •        Disaster Recovery Strategy,
  •        Full AWS Storage Architecture Diagram.

1. Core AWS Storage Categories

AWS storage falls under six functional domains:

1.1 Object Storage

Amazon S3 (Simple Storage Service)

  •         Designed for massive scale and 11 nines (99.999999999%) of durability; most storage classes store data redundantly across multiple AZs.
  •         Not POSIX compliant.
  •         Best for: Data lakes, ML datasets, backups, logs, HPC staging, container artifacts, static hosting.

Key S3 Features for Engineers

  •        Storage Classes: Standard, Intelligent-Tiering, Standard-IA, One Zone-IA, Glacier Instant Retrieval, Glacier Flexible Retrieval, Glacier Deep Archive.
  •         Lifecycle Policies automate tiering & expiration.
  •         Versioning – object-level snapshot history.
  •         Object Lock (WORM) – security/compliance.
  •         S3 Access Points – multi-tenant access patterns.
  •         S3 Multi-Region Access Points – a single global endpoint that routes requests to the nearest bucket (pair with S3 Replication to keep the buckets in sync).
  •         S3 Express One Zone – single-digit millisecond latency for objects stored in a single AZ.
  •         Multipart Upload for large files; Transfer Acceleration for long-distance uploads.
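Lifecycle policies are easier to reason about as data. The helper below is a minimal sketch, assuming boto3-style dicts. It builds the LifecycleConfiguration structure that the S3 PutBucketLifecycleConfiguration API expects. The rule ID, prefix, and day counts are hypothetical.

```python
"""Build an S3 lifecycle configuration that tiers objects down over time.

The dict shape follows S3's PutBucketLifecycleConfiguration API; the rule ID,
prefix, and day counts used below are illustrative assumptions.
"""

# Storage-class names as the S3 API spells them.
VALID_CLASSES = {
    "STANDARD_IA", "ONEZONE_IA", "INTELLIGENT_TIERING",
    "GLACIER_IR", "GLACIER", "DEEP_ARCHIVE",
}


def tiering_rule(rule_id, prefix, transitions, expire_after_days=None):
    """transitions: list of (days, storage_class) pairs in ascending day order."""
    days = [d for d, _ in transitions]
    # Days must be strictly increasing: sorted and without duplicates.
    if days != sorted(days) or len(set(days)) != len(days):
        raise ValueError("transition days must be strictly increasing")
    for _, cls in transitions:
        if cls not in VALID_CLASSES:
            raise ValueError(f"unknown storage class: {cls}")
    rule = {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": d, "StorageClass": c} for d, c in transitions],
    }
    if expire_after_days is not None:
        if days and expire_after_days <= days[-1]:
            raise ValueError("expiration must come after the last transition")
        rule["Expiration"] = {"Days": expire_after_days}
    return {"Rules": [rule]}


# Hypothetical log-retention policy: Standard-IA at 30 days, Glacier Flexible
# Retrieval ("GLACIER") at 90, Deep Archive at 365, delete after ~7 years.
config = tiering_rule(
    "logs-tiering",
    "logs/",
    [(30, "STANDARD_IA"), (90, "GLACIER"), (365, "DEEP_ARCHIVE")],
    expire_after_days=2555,
)
```

With boto3, the result would be passed as `s3.put_bucket_lifecycle_configuration(Bucket="my-bucket", LifecycleConfiguration=config)`. The bucket name is a placeholder. S3 enforces its own minimum-age rules on some transitions, which this sketch does not check. For example, objects must be 30 days old before they can move to Standard-IA.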

1.2 Instance Store

  • Instance store volumes can scale to millions of Input/Output Operations Per Second (IOPS) on storage-optimized instance types. They are tied to a single EC2 instance and offer very low latency.
  • An instance store in AWS provides temporary, block-level storage physically attached to the host computer of an Amazon EC2 instance.
  • This storage is ephemeral: all data on it is lost if the instance stops, hibernates, or terminates, or if the underlying drive fails.
Key Characteristics
Temporary Storage:
  •  Data does not persist if the instance is stopped or terminated, but it does survive a simple reboot.
High Performance: 
  • Because the storage is physically attached to the host, it offers very low latency and high I/O performance compared to network-attached Amazon EBS volumes.
Cost-Effective:
  •  The cost of the instance store is included in the price of the EC2 instance itself, so there are no separate storage charges.
Instance-Specific:
  • Not all EC2 instance types offer instance store volumes, and the available size, type (SSD or HDD), and quantity vary by instance type.
No Snapshots: 
  • twtech cannot take a snapshot of an instance store volume, and data is not included in an Amazon Machine Image (AMI) created from the instance.
Common Use Cases
Instance stores are ideal for applications requiring a high-speed, temporary scratchpad or data that can be easily recreated. 
Buffering and Caching: 
  • Used for temporary data that requires fast access but does not need long-term retention, such as session caching or image processing pipelines.
Scratch Data: 
  • Ideal for computational workloads that generate temporary data for the duration of a task.
High-Performance Computing (HPC): 
  • Suitable for scenarios that need massive I/O operations per second (IOPS) and can tolerate potential data loss, such as big data processing or analytical workloads.
Replicated Workloads:
  • Can be used for data that is replicated across a fleet of instances, such as a load-balanced pool of web servers, where the failure of an individual instance does not compromise data availability.

1.3 Block Storage

Amazon EBS (Elastic Block Store)

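  •         High-performance, persistent block volumes attached to EC2.
  •         Scoped to a single AZ (replicated within that AZ). Snapshots are stored regionally and can restore volumes into other AZs or be copied across Regions.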

Key Types:

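  •        gp3 – General purpose SSD; baseline 3,000 IOPS and 125 MiB/s, with IOPS and throughput provisioned independently of volume size.
  •         io2/io2 Block Express – Highest durability (99.999%) & IOPS (up to 256K).
  •         st1/sc1 – HDD-based: st1 for throughput-heavy sequential workloads, sc1 for cold, infrequently accessed data.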
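gp3 includes its baseline in the volume price, and only performance above the baseline is billed separately. The sketch below separates the included baseline from the billable extra. It deliberately leaves out prices, which vary by Region. The 10,000 IOPS / 500 MiB/s request is a hypothetical example.

```python
"""Split a gp3 request into the included baseline and the billable extra.

gp3 includes 3,000 IOPS and 125 MiB/s regardless of volume size; anything
provisioned above that is billed separately. Prices are intentionally omitted.
"""

GP3_BASELINE_IOPS = 3000
GP3_BASELINE_MIBPS = 125


def gp3_extras(iops, throughput_mibps):
    """Return (extra_iops, extra_mibps) billed beyond the included baseline."""
    if iops < GP3_BASELINE_IOPS or throughput_mibps < GP3_BASELINE_MIBPS:
        raise ValueError("gp3 cannot be provisioned below its baseline")
    return iops - GP3_BASELINE_IOPS, throughput_mibps - GP3_BASELINE_MIBPS


# A hypothetical database volume tuned to 10,000 IOPS and 500 MiB/s.
print(gp3_extras(10_000, 500))  # (7000, 375)
```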

Use Cases:

  •         Databases, EC2 boot disks, Elasticsearch, Kafka brokers, containers needing persistent block storage.

1.4 File Storage

Amazon EFS (Elastic File System)

  •         NFSv4.0/4.1 compatible, Multi-AZ (or One Zone), elastic capacity.
  •         Highly parallel: tens of thousands of concurrent client connections.

Modes:

  •         Storage: Standard (Multi-AZ) / One Zone
  •         Performance Modes: General Purpose, Max I/O
  •         Throughput Modes: Elastic, Provisioned, Bursting

Amazon FSx

Fully managed file systems:

  •         FSx for Lustre – HPC, ML training, large data pipelines.
  •         FSx for NetApp ONTAP – multi-protocol (NFS/SMB/iSCSI), SnapMirror, enterprise use.
  •         FSx for OpenZFS – low latency, ZFS snapshots, compression.
  •         FSx for Windows File Server – SMB for Windows workloads.

1.5 Archival Storage

Glacier Family

  •         Glacier Instant Retrieval – milliseconds
  •         Glacier Flexible Retrieval – minutes (Expedited) to hours (Standard/Bulk)
  •         Glacier Deep Archive – within 12 hours (Standard) or 48 hours (Bulk)

NB:

Used for compliance, cold backups, log retention, disaster recovery.
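The retrieval times of the Glacier family depend on the retrieval tier requested when an object is restored. The helper below is a minimal sketch that builds the `RestoreRequest` payload for an archived object and rejects tier/class combinations that S3 does not offer. For example, Expedited retrieval is not available for Deep Archive. The 7-day restore window is an arbitrary example.

```python
"""Build the RestoreRequest for an archived S3 object, checking the tier.

Expedited retrieval is offered for Glacier Flexible Retrieval but not for
Deep Archive; Glacier Instant Retrieval objects need no restore at all.
"""

ALLOWED_TIERS = {
    "GLACIER": {"Expedited", "Standard", "Bulk"},  # Glacier Flexible Retrieval
    "DEEP_ARCHIVE": {"Standard", "Bulk"},
}


def restore_request(storage_class, tier, days):
    """Return a RestoreRequest that keeps a temporary copy for `days` days."""
    if storage_class == "GLACIER_IR":
        raise ValueError("Glacier Instant Retrieval objects are readable directly")
    allowed = ALLOWED_TIERS.get(storage_class)
    if allowed is None:
        raise ValueError(f"{storage_class} is not an archive storage class")
    if tier not in allowed:
        raise ValueError(f"{tier} retrieval is not available for {storage_class}")
    return {"Days": days, "GlacierJobParameters": {"Tier": tier}}


req = restore_request("DEEP_ARCHIVE", "Bulk", 7)
```

With boto3, this would be sent as `s3.restore_object(Bucket=..., Key=..., RestoreRequest=req)`. The object becomes readable once the restore job finishes.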

1.6 Edge & Hybrid Storage

AWS Storage Gateway

  •         File Gateway – NFS/SMB with an S3 backend
  •         Tape Gateway – virtual tapes archived to S3 Glacier
  •         Volume Gateway – on-prem block volumes backed up to AWS as EBS snapshots

AWS DataSync

  •         High-speed transfer between:
    •    On-prem ↔ AWS
    •    AWS ↔ other clouds
    •    AWS services (S3 ↔ EFS, FSx, etc.)

Snow Family

  •         Snowball Edge – offline data transfer + edge compute at scale (Snowcone and Snowmobile have since been discontinued).

2. Data Management & Performance Tuning

2.1 S3 Performance

  •         Request parallelization via prefixes:
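    •    S3 partitions automatically by prefix. Each partitioned prefix supports at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second, so spreading keys across more prefixes raises aggregate throughput (randomized key names are no longer needed).
  •         Use multipart upload for objects larger than 100 MB.
  •         S3 Select – pushdown filtering for lower transfer costs (closed to new customers since 2024; Athena is the usual alternative).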
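Multipart uploads have hard limits: at most 10,000 parts, and every part except the last must be between 5 MiB and 5 GiB. The function below is a minimal sketch for picking a part size that respects those limits. The 8 MiB preferred size is an assumption, not an S3 rule.

```python
"""Pick a multipart part size that keeps an upload within S3's part limits."""

MiB = 1024 * 1024
MIN_PART = 5 * MiB
MAX_PART = 5 * 1024 * MiB  # 5 GiB
MAX_PARTS = 10_000


def ceil_div(a, b):
    """Integer ceiling division (avoids float rounding on large sizes)."""
    return -(-a // b)


def plan_multipart(size_bytes, preferred_part=8 * MiB):
    """Return (part_size_bytes, part_count), rounding part size up to whole MiB."""
    needed = ceil_div(size_bytes, MAX_PARTS)  # smallest part that fits in 10k parts
    part = max(preferred_part, MIN_PART, ceil_div(needed, MiB) * MiB)
    if part > MAX_PART:
        raise ValueError("object too large for 10,000 parts of at most 5 GiB")
    return part, ceil_div(size_bytes, part)


print(plan_multipart(100 * 1024**3))  # 100 GiB -> (11534336, 9310), i.e. 11 MiB parts
```

The resulting size could be passed to boto3 as `boto3.s3.transfer.TransferConfig(multipart_chunksize=...)`, which lets the SDK upload the parts in parallel.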

2.2 EBS Performance

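  •         Initialize (pre-warm) volumes restored from snapshots, or enable Fast Snapshot Restore, to avoid first-read latency; new empty volumes deliver full performance immediately.
  •         RAID 0 striping across multiple volumes for IOPS-heavy workloads (e.g., MongoDB). RAID 0 has no redundancy, so losing one volume loses the whole array.
  •         Use EBS-optimized instances (the default on most current-generation types) for dedicated EBS bandwidth and stable latency.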
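RAID 0 adds up the volumes' IOPS and throughput, but the instance's own EBS limits cap what the array can deliver. The estimate below is a simple sketch of that ceiling. The per-volume and per-instance figures are hypothetical placeholders, so the real limits for the instance type need to be looked up.

```python
"""Estimate what a RAID 0 stripe of EBS volumes can actually deliver."""
from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    iops: int
    mibps: int


def raid0_ceiling(volumes, instance_max_iops, instance_max_mibps):
    """Return (effective_iops, effective_mibps): the stripe sum, capped by the instance."""
    total_iops = sum(v.iops for v in volumes)
    total_mibps = sum(v.mibps for v in volumes)
    return min(total_iops, instance_max_iops), min(total_mibps, instance_max_mibps)


# Four hypothetical volumes at 16,000 IOPS / 1,000 MiB/s each, on an instance
# assumed to allow 40,000 IOPS and 1,250 MiB/s of EBS traffic.
stripe = [Volume(16_000, 1_000)] * 4
print(raid0_ceiling(stripe, 40_000, 1_250))  # (40000, 1250)
```

In this example the instance, not the volumes, is the bottleneck. Adding a fifth volume would raise cost without raising performance.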

2.3 EFS Performance

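  •         Burst credits matter in Bursting throughput mode: sustained workloads above the baseline drain them.
  •         Use Provisioned Throughput for steady analytics / ML workloads, or Elastic Throughput for spiky, unpredictable ones.
  •         Use One Zone for low-latency HPC pipelines.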
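Burst credits behave like a token bucket. Credits build up while usage stays below the baseline and are spent while usage is above it. The sketch below shows why sustained workloads eventually get throttled. It is a simplified model of the mechanism, not EFS's exact accounting, and all rates are hypothetical inputs.

```python
"""A simplified token-bucket model of burst credits (MiB/s and MiB)."""
import math


def seconds_until_throttled(credits_mib, baseline_mibps, demand_mibps):
    """How long a sustained demand can run before falling back to the baseline."""
    if demand_mibps <= baseline_mibps:
        return math.inf  # usage at or below baseline never drains the bucket
    # Net drain rate is the demand above the baseline.
    return credits_mib / (demand_mibps - baseline_mibps)


# Hypothetical: 50 MiB/s baseline, 100 MiB/s sustained reads, 180,000 MiB banked.
print(seconds_until_throttled(180_000, 50, 100) / 3600)  # 1.0 (hours)
```

Once the bucket is empty, throughput drops to the baseline. This is the failure mode that Provisioned or Elastic Throughput avoids.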

2.4 FSx Tuning

FSx for Lustre

  •         Hundreds of GB/s of throughput and sub-millisecond latency.
  •         Integrates natively with S3 (import/export).
  •         Excellent for ML training, genomics, seismic processing.
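On SSD-based FSx for Lustre deployments, throughput scales with capacity: storage (TiB) times the per-unit throughput tier chosen at creation. The sketch below sizes a file system for a target throughput under two assumptions that should be checked against current FSx documentation: the PERSISTENT_2 tiers are 125/250/500/1,000 MB/s per TiB, and capacity grows in 2.4 TiB increments.

```python
"""Size an FSx for Lustre SSD file system for a target aggregate throughput."""

PERSISTENT_2_MBPS_PER_TIB = (125, 250, 500, 1000)  # assumed tier set


def _check_tier(per_unit):
    if per_unit not in PERSISTENT_2_MBPS_PER_TIB:
        raise ValueError(f"unsupported per-unit throughput: {per_unit}")


def aggregate_mbps(storage_tib, per_unit):
    """Aggregate throughput (MB/s) = capacity (TiB) x per-unit tier."""
    _check_tier(per_unit)
    return storage_tib * per_unit


def capacity_for(target_mbps, per_unit, increment_tenths_tib=24):
    """Smallest capacity (TiB), in whole increments, that reaches target_mbps.

    Works in tenths of a TiB to avoid float rounding; the 2.4 TiB default
    increment is an assumption.
    """
    _check_tier(per_unit)
    # ceil(target / (per_unit * increment)), all in integer tenths of a TiB
    steps = -(-(target_mbps * 10) // (per_unit * increment_tenths_tib))
    return steps * increment_tenths_tib / 10


print(capacity_for(10_000, 500))  # 21.6 TiB for ~10 GB/s at 500 MB/s/TiB
```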

3. Workload-Specific Architecture Patterns

3.1 Data Lake / Lakehouse

  •         S3 as central data lake storage.
  •         Glue Catalog / Lake Formation for schema + governance.
  •         Athena for SQL analysis.
  •         Iceberg + S3 for lakehouse tables.
  •         EFS/FSx for ETL staging.

Architecture decisions:

  •         Use S3 Intelligent-Tiering for unpredictable access.
  •         Use S3 Object Lock if regulatory compliance needed.
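Athena and Glue query an S3 data lake efficiently when objects are laid out in Hive-style `key=value` partition folders. With that layout, filters on the partition columns skip whole prefixes once the partitions are registered in the catalog or partition projection is enabled. The key builder below is a minimal sketch. The bucket, table, and file names are hypothetical.

```python
"""Build Hive-style partitioned S3 keys for a data lake table."""
from datetime import date


def partition_prefix(table, day):
    """e.g. events/year=2025/month=12/day=04/"""
    return f"{table}/year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"


def object_uri(bucket, table, day, filename):
    return f"s3://{bucket}/{partition_prefix(table, day)}{filename}"


print(object_uri("example-lake", "events", date(2025, 12, 4), "part-0000.parquet"))
# s3://example-lake/events/year=2025/month=12/day=04/part-0000.parquet
```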

3.2 HPC (High-Performance Computing) Storage

Typical stack:

  •         FSx for Lustre – primary compute scratch.
  •         S3 – durable dataset repository.
  •         AWS ParallelCluster – integrates FSx for Lustre natively.
  •         EFA (Elastic Fabric Adapter) – low-latency networking between HPC nodes.

Pattern:

On-prem HPC → DataSync → S3 → FSx for Lustre → HPC Compute → Results → S3

3.3 Hybrid / Multi-Cloud Storage

Patterns:

Pattern A: On-Prem NAS → S3 via File Gateway

  •         Offload backups
  •         Cold archive to Glacier

Pattern B: Multi-Cloud Dataset Sharing

  •         DataSync between AWS and Azure Blob Storage / Google Cloud Storage
  •         S3 Multi-Region Access Points for global apps

Pattern C: DR Across Clouds

  •         FSx/EBS snapshot replication → S3 export to another cloud
  •         VMware Cloud on AWS for lift-and-shift workloads

4. Security & Governance

4.1 S3

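  •         Encryption: SSE-S3 (applied by default to new objects), SSE-KMS, DSSE-KMS, or client-side encryption (e.g., CSE-KMS).
  •         Block Public Access (enable at account level).
  •         VPC endpoints – keep traffic off the public internet.
  •         Bucket policies scoped with IAM condition keys such as: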
    •    aws:PrincipalOrgID
    •    aws:SourceVpc
    •    aws:SourceArn
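Condition keys like `aws:PrincipalOrgID` are usually applied as explicit Deny statements, so that no Allow elsewhere can override them. The builder below is a minimal sketch of such a policy. It denies principals outside an organization and also denies non-TLS requests via `aws:SecureTransport`. The bucket name and org ID are placeholders.

```python
"""Build a restrictive S3 bucket policy as a Python dict."""
import json


def org_only_policy(bucket, org_id):
    arn = f"arn:aws:s3:::{bucket}"
    resources = [arn, f"{arn}/*"]  # the bucket itself and every object in it
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Deny any principal that is not in the given AWS Organization.
                "Sid": "DenyOutsideOrg",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": resources,
                "Condition": {"StringNotEquals": {"aws:PrincipalOrgID": org_id}},
            },
            {
                # Deny plain-HTTP requests.
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": resources,
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
        ],
    }


policy = org_only_policy("example-bucket", "o-exampleorgid")
print(json.dumps(policy, indent=2))
```

With boto3, this would be applied via `s3.put_bucket_policy(Bucket=..., Policy=json.dumps(policy))`. A blanket Deny on `aws:PrincipalOrgID` also blocks AWS service principals that write into the bucket, such as log delivery. Those services need a carve-out, often scoped with `aws:SourceArn`.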

4.2 EFS/FSx

  •         Enforce access via:
    •    Security Groups
    •    IAM & POSIX permissions
    •    KMS encryption

4.3 Backup

  •         AWS Backup automates protection for:
    •    S3
    •    EFS
    •    FSx
    •    RDS
    •    DynamoDB
    •    EBS
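An AWS Backup plan is a schedule plus a lifecycle, and AWS Backup enforces one non-obvious rule on that lifecycle. Backups moved to cold storage must stay there at least 90 days, so the retention period has to be at least 90 days longer than the cold-storage transition. The builder below is a minimal sketch of the BackupPlan structure with that check. The plan name, vault, and schedule are hypothetical.

```python
"""Build an AWS Backup plan dict with a cold-storage lifecycle."""


def backup_plan(name, vault, cron, cold_after_days, delete_after_days):
    # Cold-storage backups must be retained there for a minimum of 90 days.
    if delete_after_days < cold_after_days + 90:
        raise ValueError("retention must be at least 90 days past the cold transition")
    return {
        "BackupPlanName": name,
        "Rules": [
            {
                "RuleName": f"{name}-daily",
                "TargetBackupVaultName": vault,
                "ScheduleExpression": cron,  # daily at 05:00 UTC in the example below
                "Lifecycle": {
                    "MoveToColdStorageAfterDays": cold_after_days,
                    "DeleteAfterDays": delete_after_days,
                },
            }
        ],
    }


plan = backup_plan("storage-tier", "primary-vault", "cron(0 5 ? * * *)", 30, 365)
```

With boto3, this would be submitted via `backup.create_backup_plan(BackupPlan=plan)`. The cold-storage transition only applies to resource types that support it.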

5. Storage Architectures by Use Case

5.1 DevOps / CI/CD

  •         S3 for build artifacts (container images typically go to Amazon ECR)
  •         EFS for shared build caches
  •         FSx for ONTAP for enterprise CI/CD systems requiring snapshots & clones
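Shared build caches on S3 or EFS work best when the cache key changes only when dependencies change. The function below is a minimal sketch of the common content-addressed pattern: it hashes the lockfiles and uses the digest in the key. The `build-cache/<project>/<hash>.tar.zst` layout is a hypothetical convention, not an AWS requirement.

```python
"""Derive a content-addressed build-cache key from dependency lockfiles."""
import hashlib


def cache_key(project, lockfile_contents):
    """lockfile_contents: iterable of bytes, one per lockfile, in a stable order."""
    digest = hashlib.sha256()
    for blob in lockfile_contents:
        # Length-prefix each file so b"a"+b"bc" and b"ab"+b"c" hash differently.
        digest.update(len(blob).to_bytes(8, "big"))
        digest.update(blob)
    return f"build-cache/{project}/{digest.hexdigest()[:16]}.tar.zst"


print(cache_key("api", [b"requests==2.32.3\n"]))
```

A CI job would look up this key first, restore the cache on a hit, and upload a fresh archive on a miss.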

5.2 Kubernetes / EKS

Recommended:

  •         EBS for per-pod persistent volumes (CSI).
  •         EFS for shared workloads across nodes.
  •         FSx for Lustre for high-performance ML workloads.
  •         FSx for ONTAP for multi-protocol storage.
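On EKS, the EBS and EFS options are exposed to pods through StorageClasses backed by the AWS CSI drivers. The sketch below expresses two of them as Python dicts that can be dumped to JSON, which `kubectl apply -f` accepts. The provisioner names and parameters follow the aws-ebs-csi-driver and aws-efs-csi-driver docs. The class names and file system ID are placeholders.

```python
"""EKS StorageClasses for the EBS and EFS CSI drivers, as Python dicts."""
import json


def ebs_gp3_class(name="ebs-gp3"):
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "ebs.csi.aws.com",
        "parameters": {"type": "gp3"},
        # EBS volumes live in one AZ: delay creation until the pod is
        # scheduled so the volume lands in the same AZ as its node.
        "volumeBindingMode": "WaitForFirstConsumer",
        "allowVolumeExpansion": True,
    }


def efs_class(file_system_id, name="efs-shared"):
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "efs.csi.aws.com",
        # Dynamic provisioning creates one EFS access point per volume.
        "parameters": {
            "provisioningMode": "efs-ap",
            "fileSystemId": file_system_id,
            "directoryPerms": "700",
        },
    }


print(json.dumps(ebs_gp3_class(), indent=2))
```

PersistentVolumeClaims on the EBS class would request `ReadWriteOnce` (one node at a time). Claims on the EFS class can use `ReadWriteMany` for workloads shared across nodes.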

5.3 Analytics / Big Data

  •         S3 – main data lake
  •         Redshift – data warehouse compute
  •         Glue – ETL
  •         EMR/Hadoop + FSx for Lustre – HPC-scale ETL

6. Disaster Recovery Strategy

Levels:

  •         Backup & Restore – backups kept in S3 Glacier as cold storage.
  •         Pilot Light – replicate only critical data (S3, FSx) with minimal standby infrastructure.
  •         Warm Standby – S3 Cross-Region Replication + EFS replication to a scaled-down copy in another Region.
  •         Multi-Region Active-Active – S3 Multi-Region Access Points + DynamoDB Global Tables.
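The DR levels trade cost against recovery time. The selector below is a minimal sketch that picks the cheapest level meeting a recovery time objective (RTO). The minute values are illustrative assumptions, loosely mapped from the "hours / tens of minutes / minutes / near zero" tiers in AWS's DR guidance, and RPO is not modeled.

```python
"""Pick the cheapest DR level whose typical RTO meets a target."""

# Ordered cheapest first; typical RTO in minutes (illustrative values only).
STRATEGIES = [
    ("Backup & Restore", 240),
    ("Pilot Light", 30),
    ("Warm Standby", 5),
    ("Multi-Region Active-Active", 0),
]


def cheapest_strategy(target_rto_minutes):
    for name, typical_rto in STRATEGIES:
        if typical_rto <= target_rto_minutes:
            return name
    raise ValueError("RTO target must be non-negative")


print(cheapest_strategy(60))  # Pilot Light
```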
