Thursday, December 4, 2025

AWS Storage Strategies | Deep Dive.

A deep dive into AWS storage.

Focus:

Tailored for DevOps / DevSecOps / Cloud Engineering,
Hands-on concepts,
Real architectural patterns,
Tuning guidance,
Hybrid/HPC/multi-cloud considerations

Breakdown:

Core AWS Storage Categories,
Data Management & Performance Tuning,
Workload-Specific Architecture Patterns,
Security & Governance,
Storage Architectures by Use Case,
Disaster Recovery Strategy,
Full AWS Storage Architecture Diagram.

1. Core AWS Storage Categories

AWS storage falls under six functional domains:

1.1 Object Storage

Amazon S3 (Simple Storage Service)

Designed for massive scale, durability (11 9’s), multi-AZ availability.
Not POSIX compliant.
Best for: Data lakes, ML datasets, backups, logs, HPC staging, container artifacts, static hosting.

Key S3 Features for Engineers

Storage Classes: Standard, Intelligent-Tiering, Standard-IA, One Zone-IA, Glacier Instant Retrieval, Glacier Flexible Retrieval, Glacier Deep Archive.
Lifecycle Policies → automate tiering & expiration.
Versioning → object-level snapshot history.
Object Lock (WORM) → security/compliance.
S3 Access Points → multi-tenant access patterns.
S3 Multi-Region Access Points → one global endpoint that routes requests to the lowest-latency bucket; the buckets behind it are kept in sync with S3 Replication rules.
S3 Express One Zone → ultra-low-latency object access (~ms-scale).
Multipart Upload / Transfer Acceleration for large files.
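
As an illustration of the lifecycle-policy feature above, here is a minimal Python sketch that builds a rule document in the shape accepted by the S3 lifecycle API (for example, the LifecycleConfiguration argument of boto3's put_bucket_lifecycle_configuration). The rule ID, the "logs/" prefix, and the day counts are hypothetical values chosen for the example, not recommendations from this post.

```python
import json


def build_lifecycle_rule(rule_id, prefix, transitions, expire_after_days):
    """Build one S3 lifecycle rule.

    transitions: list of (days, storage_class) tuples, e.g. (30, "STANDARD_IA").
    """
    days = [d for d, _ in transitions]
    # Transitions must move objects to colder tiers at strictly later ages.
    if days != sorted(days) or len(set(days)) != len(days):
        raise ValueError("transition days must be strictly increasing")
    if transitions and expire_after_days <= days[-1]:
        raise ValueError("expiration must come after the last transition")
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": d, "StorageClass": sc} for d, sc in transitions],
        "Expiration": {"Days": expire_after_days},
    }


# Hypothetical policy: tier logs to Standard-IA, then Glacier Flexible Retrieval,
# then delete them after a year.
lifecycle_configuration = {
    "Rules": [
        build_lifecycle_rule(
            "tier-and-expire-logs",
            "logs/",
            [(30, "STANDARD_IA"), (90, "GLACIER")],
            365,
        )
    ]
}

if __name__ == "__main__":
    print(json.dumps(lifecycle_configuration, indent=2))
```

The dictionary is plain data, so it can be reviewed or unit-tested before it is ever sent to AWS.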

1.2 Instance Store

Scales to millions of Input/Output Operations Per Second (IOPS), is tied to the EC2 instance, and offers very low latency.
An instance store in AWS provides temporary, block-level storage physically attached to the host computer of an Amazon EC2 instance.
This storage is ephemeral, meaning all data stored on it is lost if the instance stops, hibernates, terminates, or if the underlying drive fails.
Key Characteristics

Temporary Storage: Data does not persist once the instance is stopped, hibernated, or terminated (it does survive a simple reboot).
High Performance: Because the storage is physically attached to the host, it offers much lower latency and higher I/O performance than network-attached Amazon EBS volumes.
Cost-Effective: The cost of the instance store is included in the price of the EC2 instance itself, so there are no separate storage charges.
Instance-Specific: Not all EC2 instance types offer instance store volumes, and the available size, type (SSD or HDD), and quantity vary by instance type.
No Snapshots: You cannot take a snapshot of an instance store volume, and its data is not included in an Amazon Machine Image (AMI) created from the instance.

Common Use Cases

Instance stores are ideal for applications that need a high-speed, temporary scratchpad or hold data that can be easily recreated.
Buffering and Caching: Temporary data that requires fast access but no long-term retention, such as session caches or image processing pipelines.
Scratch Data: Computational workloads that generate temporary data for the duration of a task.
High-Performance Computing (HPC): Scenarios that need massive IOPS and can tolerate potential data loss, such as big data processing or analytical workloads.
Replicated Workloads: Data that is replicated across a fleet of instances, such as a load-balanced pool of web servers, where the failure of one instance does not compromise data availability.

1.3 Block Storage

Amazon EBS (Elastic Block Store)

High-performance, persistent block volumes attached to EC2.
Scoped to a single AZ (and replicated within it); use snapshots to copy data to another AZ or Region.

Key Types:

gp3 – General purpose SSD, baseline 3,000 IOPS.
io2/io2 Block Express – Highest durability (99.999%) & IOPS (up to 256K).
st1/sc1 – HDD-based volumes (st1 for throughput-heavy workloads, sc1 for cold data).

Use Cases:

Databases, EC2 boot disks, Elasticsearch, Kafka brokers, containers needing persistent block storage.

1.4 File Storage

Amazon EFS (Elastic File System)

NFSv4.0/4.1 compatible, multi-AZ, elastic.
Highly parallel — 10,000s of concurrent connections.

Modes:

Standard / One Zone
Performance Modes: General Purpose, Max I/O

Amazon FSx

Fully managed file systems:

FSx for Lustre → HPC, ML training, large data pipelines.
FSx for NetApp ONTAP → multi-protocol (NFS/SMB/iSCSI), SnapMirror, enterprise use.
FSx for OpenZFS → low latency, ZFS snapshots, compression.
FSx for Windows → SMB for Windows workloads.

1.5 Archival Storage

Glacier Family

Glacier Instant Retrieval – ms access
Glacier Flexible Retrieval – minutes to hours, depending on retrieval tier
Glacier Deep Archive – hours (typically within 12)

NB:

Used for compliance, cold backups, log retention, disaster recovery.
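
The retrieval-time trade-off above can be sketched as a small selection helper. This is an illustrative sketch only: the wait figures are rough approximations of each class's fastest retrieval option (expedited retrievals cost extra and are subject to capacity), and the function name is hypothetical. The storage-class strings are the names the S3 API uses.

```python
# (S3 API storage class, approximate fastest retrieval wait in hours),
# listed coldest and cheapest first. The waits are rough assumptions.
ARCHIVE_CLASSES = [
    ("DEEP_ARCHIVE", 12.0),   # Glacier Deep Archive, standard retrieval
    ("GLACIER", 5 / 60),      # Glacier Flexible Retrieval, expedited tier
    ("GLACIER_IR", 0.0),      # Glacier Instant Retrieval, millisecond access
]


def pick_archive_class(max_wait_hours):
    """Return the coldest class whose fastest retrieval fits the allowed wait."""
    if max_wait_hours < 0:
        raise ValueError("max_wait_hours must be non-negative")
    for storage_class, wait_hours in ARCHIVE_CLASSES:
        if wait_hours <= max_wait_hours:
            return storage_class
    # Unreachable while GLACIER_IR has a wait of 0.0, kept as a safety net.
    raise ValueError("no archive class satisfies the requested wait")


if __name__ == "__main__":
    for hours in (24, 1, 0):
        print(hours, "->", pick_archive_class(hours))
```

A real decision would also weigh retrieval fees and minimum storage durations, which this sketch ignores.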

1.6 Edge & Hybrid Storage

AWS Storage Gateway

File Gateway → NFS/SMB with S3 backend
Tape Gateway → Virtual tapes → S3 Glacier
Volume Gateway → iSCSI block volumes for on-prem hosts, backed up to AWS as EBS snapshots

AWS DataSync

High-speed transfer between:

On-prem → AWS
AWS ↔ other cloud
AWS services (S3 ↔ EFS, FSx, etc.)

Snow Family

Snowcone, Snowball Edge, and Snowmobile for offline transfer and edge compute at scale.

2. Data Management & Performance Tuning

2.1 S3 Performance

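Request parallelization via prefixes:

S3 scales automatically to at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix, so spreading keys across prefixes raises aggregate request rates (no manual partition management needed).

Use multipart upload for objects larger than 100 MB.
S3 Select → pushdown filtering for lower transfer costs (no longer offered to new customers; Athena covers similar use cases).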
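
To make the multipart guidance concrete, here is a minimal sketch that plans part sizes for an upload. It relies on two S3 limits (every part except the last must be at least 5 MiB, and an upload may have at most 10,000 parts) and on the 100 MB threshold mentioned above. The function name and the 16 MiB preferred part size are assumptions made for the example.

```python
MIB = 1024 ** 2
MIN_PART_SIZE = 5 * MIB          # S3 minimum for every part except the last
MAX_PARTS = 10_000               # S3 maximum number of parts per upload
MULTIPART_THRESHOLD = 100 * MIB  # threshold suggested in the text above


def ceil_div(a, b):
    """Integer ceiling division without floating-point rounding."""
    return -(-a // b)


def plan_multipart(object_size, preferred_part_size=16 * MIB):
    """Return (part_size, part_count), or None when a single PUT is enough."""
    if object_size <= MULTIPART_THRESHOLD:
        return None
    # Smallest part size that keeps the upload within MAX_PARTS.
    min_for_part_limit = ceil_div(object_size, MAX_PARTS)
    part_size = max(preferred_part_size, MIN_PART_SIZE, min_for_part_limit)
    return part_size, ceil_div(object_size, part_size)


if __name__ == "__main__":
    print(plan_multipart(50 * MIB))     # small object: single PUT
    print(plan_multipart(1024 * MIB))   # 1 GiB object
    print(plan_multipart(1024 ** 4))    # 1 TiB object: part size grows
```

Higher-level SDK transfer helpers usually make this calculation for you; the sketch only shows why part size has to grow with object size.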

2.2 EBS Performance

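Initialize volumes restored from snapshots (or enable Fast Snapshot Restore) to avoid first-read latency; new empty volumes need no pre-warming.
RAID 0 striping across multiple volumes for IOPS-heavy workloads (e.g., MongoDB).
Use EBS-optimized instances (the default on current-generation types) for dedicated EBS bandwidth and stable latency.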
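
A quick way to reason about RAID 0 striping is that per-volume limits add up until the instance's own EBS limits cap them. The sketch below shows that arithmetic. The gp3 baseline of 3,000 IOPS comes from section 1.3 and 125 MiB/s is the documented gp3 baseline throughput, while the instance limits are hypothetical placeholders; look up the real figures for your instance type.

```python
def raid0_estimate(volume_iops, volume_mibps, volume_count,
                   instance_iops_limit, instance_mibps_limit):
    """Estimate aggregate (IOPS, MiB/s) of a RAID 0 stripe set.

    Striping sums per-volume performance, but the result can never exceed
    what the instance itself can push to EBS.
    """
    if volume_count < 1:
        raise ValueError("volume_count must be at least 1")
    iops = min(volume_iops * volume_count, instance_iops_limit)
    mibps = min(volume_mibps * volume_count, instance_mibps_limit)
    return iops, mibps


if __name__ == "__main__":
    # Four baseline gp3 volumes on an instance with HYPOTHETICAL limits
    # of 10,000 IOPS and 600 MiB/s.
    print(raid0_estimate(3000, 125, 4, 10_000, 600))
```

In this example the stripe set is IOPS-capped by the instance, so adding a fifth volume would only raise throughput, not IOPS.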

2.3 EFS Performance

Burst Credits matter for sustained workloads.
Use Provisioned Throughput for analytics / ML workloads.
Use One Zone for low latency HPC pipelines.
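
The burst-credit point can be checked with simple arithmetic. The sketch below assumes the Bursting Throughput baseline of 50 KiB/s per GiB of Standard storage (the figure in the EFS documentation as I understand it; verify it for your Region, and note it is exposed as a parameter). The function names are hypothetical.

```python
KIB_PER_MIB = 1024


def efs_baseline_mibps(standard_storage_gib, kibps_per_gib=50):
    """Baseline throughput in MiB/s earned by data stored in Standard storage.

    kibps_per_gib defaults to an ASSUMED 50 KiB/s per GiB; adjust if the
    documented rate differs.
    """
    return standard_storage_gib * kibps_per_gib / KIB_PER_MIB


def suggest_throughput_mode(standard_storage_gib, sustained_mibps):
    """Suggest 'bursting' only when the baseline covers the sustained need."""
    baseline = efs_baseline_mibps(standard_storage_gib)
    return "bursting" if sustained_mibps <= baseline else "provisioned"


if __name__ == "__main__":
    print(efs_baseline_mibps(1024))           # 1 TiB stored
    print(suggest_throughput_mode(100, 20))   # small file system, steady load
    print(suggest_throughput_mode(4096, 100)) # 4 TiB stored
```

A small file system with a steady workload drains its credits quickly, which is the case where Provisioned Throughput pays off.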

2.4 FSx Tuning

FSx for Lustre

Hundreds of GB/s of throughput and sub-millisecond latency.
Integrates natively with S3 (import/export).
Excellent for ML training, genomics, seismic processing.

3. Workload-Specific Architecture Patterns

3.1 Data Lake / Lakehouse

S3 as central data lake storage.
Glue Catalog / Lake Formation for schema + governance.
Athena for SQL analysis.
Iceberg + S3 for lakehouse tables.
EFS/FSx for ETL staging.

Architecture decisions:

Use S3 Intelligent-Tiering for unpredictable access.
Use S3 Object Lock if regulatory compliance needed.

3.2 HPC (High-Performance Computing) Storage

Typical stack:

FSx for Lustre → primary compute scratch.
S3 → durable dataset repository.
ParallelCluster integrates Lustre natively.
EFA networking for low-latency HPC nodes.

Pattern:

On-prem HPC → DataSync → S3 → FSx for Lustre → HPC Compute → Results → S3

3.3 Hybrid / Multi-Cloud Storage

Patterns:

Pattern A: On-Prem NAS → S3 via File Gateway

Offload backups
Cold archive to Glacier

Pattern B: Multi-Cloud Dataset Sharing

DataSync between AWS ↔ Azure Blob Storage / Google Cloud Storage
S3 Multi-Region Access Points for global apps

Pattern C: DR Across Clouds

FSx/EC2 snapshot replication → S3 → export to another cloud
VMware Cloud on AWS for lift-and-shift workloads

4. Security & Governance

4.1 S3

Encryption: SSE-S3, SSE-KMS, CSE-KMS.
Block Public Access (enable at account level).
VPC endpoints → no public internet traffic.
Bucket policies scoped to IAM conditions like:

aws:PrincipalOrgID
aws:SourceVpc
aws:SourceArn
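
As a sketch of how two of these condition keys might be combined, the Python below assembles a bucket policy that denies requests from principals outside an AWS Organization and requests that do not arrive from a given VPC. The bucket name, organization ID, and VPC ID are placeholders, and the function name is hypothetical. A deny this broad can also block console access and AWS service principals, so treat it as a starting point to test on a non-production bucket.

```python
import json


def build_bucket_policy(bucket_name, org_id, vpc_id):
    """Build a deny-by-default bucket policy scoped to one org and one VPC."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    resources = [bucket_arn, f"{bucket_arn}/*"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Deny any principal that does not belong to the organization.
                "Sid": "DenyPrincipalsOutsideOrg",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": resources,
                "Condition": {"StringNotEquals": {"aws:PrincipalOrgID": org_id}},
            },
            {
                # Deny requests that do not come through the expected VPC
                # (aws:SourceVpc is only set for traffic via a VPC endpoint).
                "Sid": "DenyRequestsOutsideVpc",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": resources,
                "Condition": {"StringNotEquals": {"aws:SourceVpc": vpc_id}},
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder identifiers, not real resources.
    policy = build_bucket_policy(
        "example-bucket", "o-exampleorgid", "vpc-0123456789abcdef0"
    )
    print(json.dumps(policy, indent=2))
```

Two separate statements are used on purpose: keys placed in one condition block are ANDed, which would deny only requests that fail both checks.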

4.2 EFS/FSx

Enforce access via:

Security Groups
IAM & POSIX permissions
KMS encryption

4.3 Backup

AWS Backup automates protection for:

S3
EFS
FSx
RDS
DynamoDB
EBS
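
A backup plan for these services could look roughly like the sketch below, which builds a document in the shape of the BackupPlan argument to the AWS Backup CreateBackupPlan API. The plan name, vault name, schedule, and retention values are hypothetical. The check reflects the AWS Backup rule that backups moved to cold storage must stay there at least 90 days; note that cold-storage transitions are not supported for every resource type.

```python
import json


def build_backup_plan(plan_name, vault_name, cold_after_days, delete_after_days):
    """Build a one-rule AWS Backup plan document (daily schedule)."""
    # AWS Backup requires deletion to be at least 90 days after the move
    # to cold storage.
    if delete_after_days < cold_after_days + 90:
        raise ValueError("delete_after_days must be >= cold_after_days + 90")
    return {
        "BackupPlanName": plan_name,
        "Rules": [
            {
                "RuleName": "daily",
                "TargetBackupVaultName": vault_name,
                # Every day at 05:00 UTC (hypothetical schedule).
                "ScheduleExpression": "cron(0 5 ? * * *)",
                "Lifecycle": {
                    "MoveToColdStorageAfterDays": cold_after_days,
                    "DeleteAfterDays": delete_after_days,
                },
            }
        ],
    }


if __name__ == "__main__":
    plan = build_backup_plan("daily-storage-backups", "hypothetical-vault", 30, 120)
    print(json.dumps(plan, indent=2))
```

Resources are attached to a plan separately (by tag or ARN) through a backup selection, which this sketch leaves out.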

5. Storage Architectures by Use Case

5.1 DevOps / CI/CD

S3 for artifact and container image storage
EFS for shared build caches
FSx for ONTAP for enterprise CI/CD systems requiring snapshots & clones

5.2 Kubernetes / EKS

Recommended:

EBS for per-pod persistent volumes (CSI).
EFS for shared workloads across nodes.
FSx for Lustre for high-performance ML workloads.
FSx for ONTAP for multi-protocol storage.
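
The recommendations above map onto CSI drivers roughly as sketched below. The provisioner names are the ones used by the AWS EBS, EFS, and FSx for Lustre CSI drivers; the selection rule, function names, and the "gp3-default" class name are simplifying assumptions for illustration. FSx for ONTAP is left out because it is typically provisioned through NetApp's own CSI driver.

```python
import json

# CSI provisioner and typical access mode per backend.
CSI_BACKENDS = {
    "ebs": {"provisioner": "ebs.csi.aws.com", "access_mode": "ReadWriteOnce"},
    "efs": {"provisioner": "efs.csi.aws.com", "access_mode": "ReadWriteMany"},
    "fsx-lustre": {"provisioner": "fsx.csi.aws.com", "access_mode": "ReadWriteMany"},
}


def pick_backend(shared_across_nodes, high_throughput):
    """Very rough selection rule mirroring the recommendations in the text."""
    if not shared_across_nodes:
        return "ebs"
    return "fsx-lustre" if high_throughput else "efs"


def ebs_gp3_storage_class(name):
    """Build a StorageClass manifest for gp3 volumes (kubectl accepts JSON)."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": CSI_BACKENDS["ebs"]["provisioner"],
        "parameters": {"type": "gp3"},
        # Delay binding so the volume is created in the pod's AZ.
        "volumeBindingMode": "WaitForFirstConsumer",
    }


if __name__ == "__main__":
    print(pick_backend(shared_across_nodes=True, high_throughput=False))
    print(json.dumps(ebs_gp3_storage_class("gp3-default"), indent=2))
```

Delayed binding matters for EBS because a volume lives in one AZ and must be created where the pod is scheduled.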

5.3 Analytics / Big Data

S3 → main data lake
Redshift → compute warehouse
Glue → ETL
EMR/Hadoop + FSx for Lustre → HPC-scale ETL

6. Disaster Recovery Strategy

Levels:

Warm Standby → S3 Cross-Region Replication + EFS replication (for example, a One Zone source to a Regional, Multi-AZ destination).
Pilot Light → critical S3 + FSx replication only.
Backup & Restore → S3 Glacier as cold storage.
Multi-Region Active-Active → S3 Multi-Region Access Points + DynamoDB Global Tables.
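
Several of these levels depend on S3 Cross-Region Replication. As a rough sketch, the Python below builds a replication document in the shape of the ReplicationConfiguration argument to the S3 PutBucketReplication API. The role ARN, bucket name, and rule ID are placeholders, and the function name is hypothetical. Versioning has to be enabled on both the source and destination buckets before such a rule can be applied.

```python
import json


def build_replication_config(role_arn, destination_bucket,
                             prefix="", storage_class="STANDARD"):
    """Build a single-rule S3 replication configuration."""
    return {
        "Role": role_arn,  # IAM role S3 assumes to copy objects
        "Rules": [
            {
                "ID": "replicate-to-dr-region",
                "Priority": 1,
                "Status": "Enabled",
                # An empty prefix matches every object in the bucket.
                "Filter": {"Prefix": prefix},
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": f"arn:aws:s3:::{destination_bucket}",
                    "StorageClass": storage_class,
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder account ID, role, and bucket names.
    config = build_replication_config(
        "arn:aws:iam::111122223333:role/hypothetical-replication-role",
        "example-dr-bucket",
        storage_class="STANDARD_IA",
    )
    print(json.dumps(config, indent=2))
```

Choosing a colder destination storage class, as in the usage example, is one way to keep the standby copy cheaper than the primary.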