Sunday, November 30, 2025

Transferring Large Amounts of Data into AWS | Overview.


Scope:

  • Intro,
  • Online Network Transfer,
  • AWS DataSync Pros & Use Case,
  • AWS Transfer Family Pros & Use Case,
  • Amazon S3 Transfer Acceleration Pros & Use Case,
  • AWS Command Line Interface (CLI) / SDKs Pros & Use Case,
  • AWS Direct Connect Pros & Use Case,
  • Offline Physical Transfer Devices (When Network Is Too Slow),
  • AWS Snowball Edge & Best Use Cases,
  • AWS Snowcone & Best Use Cases,
  • Hybrid Cloud Replication Services,
  • VM, Backup, and Large-Scale Migration Tools,
  • Advanced Performance Optimization Techniques (End-to-End Flow into AWS),
  • Choosing the Right Method (Data Volume, Network Available, Best Option),
  • Security,
  • Cost Considerations.

Intro:

    • Transferring large amounts of data into AWS can be done through online and offline methods.
    • Transferring large amounts of data into AWS depends primarily on the:
      • Data volume, 
      • Available network bandwidth, 
      • Time constraints (how quickly the transfer is needed).
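
As a rough illustration of how these three factors interact, the sketch below estimates how long an online transfer would take. The function name `transfer_days` and the default 80% link utilization are assumptions made for this example, not AWS figures.

```python
def transfer_days(data_tb: float, link_gbps: float, utilization: float = 0.8) -> float:
    """Estimate the days needed to move data_tb terabytes over a link_gbps link.

    Uses decimal units (1 TB = 10**12 bytes, 1 Gbps = 10**9 bits/s).
    utilization is the assumed fraction of line rate actually achieved.
    """
    if data_tb < 0 or link_gbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("inputs must be positive and utilization in (0, 1]")
    bits = data_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * utilization)
    return seconds / 86_400


if __name__ == "__main__":
    # Compare a few volume/bandwidth combinations side by side.
    for tb, gbps in [(10, 1), (100, 1), (100, 10), (1000, 10)]:
        print(f"{tb:>5} TB over {gbps:>2} Gbps: {transfer_days(tb, gbps):6.1f} days")
```

Under these assumptions, 100 TB over a 1 Gbps link takes roughly 11.6 days, which is the kind of result that makes an offline method worth considering.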
Online Transfer Methods

NB:

    • These methods use twtech's existing network connection to move data to AWS.
 AWS DataSync, Pros & Use Case:

    • A managed data transfer service that automates and accelerates data movement between on-premises storage (NFS, SMB shares, Hadoop, etc.) and AWS storage services (Amazon S3, EFS, FSx).
   Pros:
    • Can be up to 10x faster than open-source tools.
    • Handles many tasks automatically, including data integrity validation and encryption.
   Best for:
    • One-time migrations, 
    • Recurring data processing, 
    • Automated replication when twtech has available network bandwidth.
 AWS Transfer Family Pros & Use Case:
    • Provides fully managed support for transferring files into and out of Amazon S3 using standard file transfer protocols (SFTP, FTPS, and FTP).
   Best for:
    • Seamlessly migrating existing file transfer workflows that rely on these protocols without changing client-side configurations.
 Amazon S3 Transfer Acceleration Pros & Use Case:
    • This feature optimizes public internet transfers to Amazon S3 by routing them through Amazon CloudFront's global edge locations, which minimizes the effect of latency and maximizes available bandwidth.
   Best for:
    • Accelerating uploads when using the public internet and dealing with high latency over long distances.
 AWS Command Line Interface (CLI) / SDKs Pros & Use Case:
    • For manual or scripted transfers, the AWS CLI provides the s3 cp and s3 sync commands, which can be tuned to use multipart uploads for large files.
Best for:
    • Users who require scripting capabilities and can manage network optimization manually.
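
A minimal sketch of what such a scripted transfer might look like. It only builds and prints the AWS CLI commands rather than running them. The helper name `build_s3_sync_commands`, the path `/data/export`, the bucket `example-bucket`, and the values `64MB` and `20` are illustrative assumptions; the `default.s3.*` settings are the AWS CLI's S3 configuration keys.

```python
import shlex


def build_s3_sync_commands(source_dir, bucket, prefix="", chunk_size="64MB", concurrency=20):
    """Return AWS CLI argv lists that tune multipart settings, then run a sync."""
    destination = f"s3://{bucket}/{prefix}".rstrip("/")
    return [
        # Files larger than the threshold are uploaded in parts of chunk_size.
        ["aws", "configure", "set", "default.s3.multipart_threshold", chunk_size],
        ["aws", "configure", "set", "default.s3.multipart_chunksize", chunk_size],
        # Number of requests the CLI may have in flight at the same time.
        ["aws", "configure", "set", "default.s3.max_concurrent_requests", str(concurrency)],
        ["aws", "s3", "sync", source_dir, destination],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for argv in build_s3_sync_commands("/data/export", "example-bucket", "ingest"):
        print(shlex.join(argv))
```

Printing the commands first keeps the example safe to run anywhere; passing each list to `subprocess.run` would execute them on a machine with the AWS CLI configured.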
AWS Direct Connect Pros & Use Case:
    • Establishes a dedicated private network connection from on-premises to AWS.
    • This dedicated private network connection provides a more consistent and higher-throughput experience than an internet connection.
Best for:
    • High-throughput, reliable, and secure data transfer for ongoing hybrid cloud operations.
Offline Transfer Methods (AWS Snow Family) 
    • When twtech has extremely large datasets (terabytes to petabytes), limited or no network bandwidth, or data that is not needed immediately, AWS provides physical storage devices for offline transfer.
 AWS Snowball Edge & Best Use Cases:
    • A rugged, secure device with significant storage (up to 210 TB NVMe SSD) and optional compute capabilities.
        Process:
    • twtech orders a device via the AWS Console.
    • twtech copies its data onto the device on-premises.
    • twtech ships the device back to AWS.
    • AWS loads the data into twtech's S3 bucket.
        Best for:
    • Bulk data migrations (terabytes to petabytes).
    • Cases where shipping the data is faster or more cost-effective than transferring it online.
 AWS Snowcone & Best Use Cases:
    • The smallest member of the Snow Family, offering up to 8 TB of usable storage per device.
        Best for:
    • Smaller data migration requirements or edge computing in remote locations.
 AWS Snowmobile & Best Use Cases:
    • A literal semi-trailer for exabyte-scale data migration.
        Best for:
    • Extremely large, multi-petabyte to exabyte-scale data center migrations.

These methods fall into five primary categories:

    1. Online Network Transfer (Direct Connect, VPN, Internet)
    2. Optimized Transfer Services (AWS DataSync, S3 Transfer Acceleration)
    3. Physical Offline Devices (Snowcone, Snowball, Snowmobile)
    4. Hybrid Software Replication (Storage Gateway, FSx, RDS/Aurora tools)
    5. Application-Level Migration (Databases, VMs, Files, Streams)

1. Online Network Transfer

1.1 Direct Connect (DX)

Best for: 

    • Predictable, high-bandwidth, petabyte-scale continuous transfer.

Capacities:

    •  Dedicated DX: 1 Gbps, 10 Gbps, 100 Gbps
    •  Hosted DX: 50 Mbps – 10 Gbps

Use cases:

    • Datacenter → AWS data ingestion
    • Real-time replication
    • Long-term hybrid architectures

Throughput:

    • With TCP tuning (window size, parallel streams), twtech can achieve ~70–90% of line rate on optimized circuits.

Deep considerations:

    • Use Jumbo Frames (MTU 9001)
    • Tune TCP window size > 16 MB
    • Use parallel transfers for S3
    • Combine with AWS DataSync for protocol optimization

1.2 Site-to-Site VPN

  • Used when DX doesn't exist or for temporary migrations.
    • Up to 1.25 Gbps per VPN tunnel (often less in practice, roughly 300–400 Mbps).
    • Equal-Cost Multi-Path (ECMP) routing can spread traffic across multiple tunnels.

NB:

    • Not ideal for PB-scale, but workable for incremental syncs.
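
To get a feel for how many tunnels ECMP would have to spread traffic across, here is a small sketch. The function name `vpn_tunnels_needed` is hypothetical, the 0.35 Gbps default is simply the midpoint of the real-world range quoted above, and the estimate assumes the traffic consists of many flows that hash evenly across tunnels.

```python
import math


def vpn_tunnels_needed(target_gbps: float, per_tunnel_gbps: float = 0.35) -> int:
    """Return how many VPN tunnels are needed to reach target_gbps in aggregate."""
    if target_gbps <= 0 or per_tunnel_gbps <= 0:
        raise ValueError("throughput values must be positive")
    return math.ceil(target_gbps / per_tunnel_gbps)


if __name__ == "__main__":
    for target in (0.5, 1.0, 2.0):
        print(f"{target} Gbps target: {vpn_tunnels_needed(target)} tunnels")
```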

1.3 Public Internet

If twtech's environment has:

    • 1–100 Gbps internet
    • CDN offload
    • WAN acceleration
  • then the public internet becomes viable, but performance is variable.

2. Optimized Transfer Services

 2.1 AWS DataSync - Most popular for terabyte (TB) to petabyte (PB) transfers

Purpose-built for:

    • File systems → S3, EFS, FSx
    • Agents use parallelism + compression + delta transfers
    • Up to 10 Gbps per task
    • Deploy multiple agents in parallel for higher aggregate throughput

DataSync advantages:

    • Checksums + integrity validation
    • Built-in retry, encryption
    • Purpose-built transfer protocol, avoiding the overhead of copying over NFS, SCP, or rsync
    • Up to 10× faster than open-source tools such as rsync

Architecture:

On-Prem File System → DataSync Agent → AWS PrivateLink → S3 / EFS / FSx / EC2 / Lambda

Best for:

    • PB-scale file migrations
    • Large dataset ingestion
    • Media, HPC, research, logs, backups

 2.2 S3 Transfer Acceleration (S3-TA)

  • Uses CloudFront’s global edge network to accelerate long-distance uploads.

Performance:

    • 50–500% faster for cross-continent transfers
    • Little or no benefit when the client is already close to the bucket's AWS Region

Best for:

    • Uploading from globally distributed sources
    • Media ingestion
    • Web apps with global users
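
Clients opt in to Transfer Acceleration by sending requests to the bucket's accelerate endpoint once the feature is enabled on the bucket. The sketch below builds that endpoint URL. The helper name `accelerate_endpoint` and the bucket `example-bucket` are illustrative, and the regular expression is a simplified name check (3–63 lowercase letters, digits, or hyphens, no periods), not a full implementation of the S3 naming rules.

```python
import re

# Simplified check: Transfer Acceleration needs a DNS-compliant name without periods.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$")


def accelerate_endpoint(bucket: str) -> str:
    """Return the Transfer Acceleration endpoint URL for a bucket."""
    if not _BUCKET_RE.match(bucket):
        raise ValueError(f"bucket name not usable with Transfer Acceleration: {bucket!r}")
    return f"https://{bucket}.s3-accelerate.amazonaws.com"


if __name__ == "__main__":
    print(accelerate_endpoint("example-bucket"))
```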

 2.3 S3 Multi-Part Upload

For large objects (recommended above roughly 100 MB, required above 5 GB), use multipart upload with:

    • Multi-threading
    • Parallelism
    • Chunk sizes (64–256 MB)
    • Resume support
  • Throughput can reach multi-Gbps with enough threads.
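
One practical detail when picking a chunk size: S3 allows at most 10,000 parts per multipart upload, and every part except the last must be at least 5 MiB. The sketch below picks a part size that respects both limits. The function names and the 64 MiB default are assumptions for illustration, and the 5 GiB per-part upper limit is not checked.

```python
MIB = 1024 * 1024
MAX_PARTS = 10_000        # S3 limit on parts per multipart upload
MIN_PART_SIZE = 5 * MIB   # S3 minimum size for every part except the last


def _ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def choose_part_size(object_bytes: int, preferred_mib: int = 64) -> int:
    """Return a part size in bytes that keeps the upload within MAX_PARTS."""
    preferred = max(preferred_mib * MIB, MIN_PART_SIZE)
    # Smallest part size that fits the object into MAX_PARTS, rounded up to a whole MiB.
    needed = _ceil_div(_ceil_div(object_bytes, MAX_PARTS), MIB) * MIB
    return max(preferred, needed)


def part_count(object_bytes: int, part_size: int) -> int:
    """Return how many parts the object is split into."""
    return max(1, _ceil_div(object_bytes, part_size))


if __name__ == "__main__":
    size = 1024 ** 4  # a 1 TiB object
    part = choose_part_size(size)
    print(f"part size: {part // MIB} MiB, parts: {part_count(size, part)}")
```

For a 1 TiB object the preferred 64 MiB parts would exceed the part limit, so the sketch settles on 105 MiB parts (9,987 parts in total).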

3. Offline Physical Transfer Devices (When Network Is Too Slow)

3.1 Snowcone

    • 8 TB usable storage
    • Small edge device
    • USB-C powered
    • Used for remote, rugged, bandwidth-limited sites

 3.2 Snowball Edge (Standard for PB transfers)

Two variants:

    • Snowball Edge Storage Optimized (~80 TB usable)
    • Snowball Edge Compute Optimized (~40 TB usable + compute)

Security:

    • AES-256 encryption
    • TPM
    • Tamper evident
    • Encrypted end-to-end

Typical ingestion workflow:

    1. AWS ships device
    2. twtech loads data locally (40–80 TB per device)
    3. twtech ships the device back to AWS
    4. AWS ingests to S3
    5. Verification completed

Scale:

    • 10 devices = 0.8 PB
    • 100 devices = 8 PB
    • Multi-day turnaround depending on shipping
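
The scale figures above follow from simple division, sketched here. The function name `snowball_devices_needed` is hypothetical, and the 80 TB default is the usable capacity quoted above for the Storage Optimized variant.

```python
import math


def snowball_devices_needed(data_tb: float, usable_tb_per_device: float = 80) -> int:
    """Return how many devices are needed to hold data_tb terabytes."""
    if usable_tb_per_device <= 0:
        raise ValueError("usable_tb_per_device must be positive")
    if data_tb <= 0:
        return 0
    return math.ceil(data_tb / usable_tb_per_device)


if __name__ == "__main__":
    for tb in (500, 800, 8000):
        print(f"{tb} TB: {snowball_devices_needed(tb)} devices")
```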

🚚 3.3 AWS Snowmobile (100 PB per truck)

    • Industrial-scale data migration solution.

Best for:

    • 10 PB – 1 EB (massive archives, media libraries, seismic data)

Transfer rate:

    • ~1 Tbps aggregated internal write capacity

4. Hybrid Cloud Replication Services

4.1 AWS Storage Gateway

Used for:

    • File gateway → S3-backed SMB/NFS
    • Volume gateway → Backup/DR
    • Tape gateway → Replace physical tape libraries
  • Not designed for one-time bulk loads, but good for gradual ingestion.

4.2 FSx Services

    • FSx for NetApp ONTAP
    • FSx for Windows
    • FSx for Lustre
  • Each provides native replication tools (SnapMirror, robocopy, HSM workflows).

4.3 Database Migration Services

Tools:

    • AWS DMS
    • Babelfish
    • Oracle RMAN to S3
    • PostgreSQL pg_dump / pg_basebackup
    • MySQL logical/physical dumps
    • DynamoDB import from S3

5. VM, Backup, and Large-Scale Migration Tools

5.1 VM Migration

    • AWS MGN (Application Migration Service)
    • AWS Server Migration Service (legacy)
    • VMware HCX → AWS
  • HCX can transfer hundreds of VMs via WAN-optimized links.

5.2 Backup Tools

    • Veeam (Kasten-K10) → AWS (S3, Glacier, VTL)
    • CommVault
    • Rubrik
    • NetBackup
  • These can hydrate backups directly into AWS.

6. Advanced Performance Architecture (End-to-End Flow into AWS):

7. Choosing the Right Method (Data Volume, Network Available, Best Option)

Data Volume             Network Available   Best Option
----------------------  ------------------  -------------------------
< 5 TB                  Good Internet       Direct upload / DataSync
5–50 TB                 1–10 Gbps           DataSync / DX
50–500 TB               < 5 Gbps            Snowball Edge
500 TB – 10 PB          DX < 10 Gbps        Snowball (multiple)
> 10 PB                 DX insufficient     Snowmobile
Continuous Replication  10+ Gbps DX         Direct Connect + DataSync
Global Upload           Distributed users   S3 Transfer Acceleration
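
The volume-based rows of the selection guide above can be sketched as a small lookup function. The name `choose_method` is hypothetical. Wherever the guide has no row for a volume/bandwidth combination, the fallback is an assumption made for this example and is marked as such in the comments.

```python
def choose_method(data_tb: float, link_gbps: float) -> str:
    """Map a data volume (TB) and usable bandwidth (Gbps) to a row of the guide."""
    if data_tb < 5:
        return "Direct upload / DataSync"
    if data_tb <= 50:
        # Assumption: the guide has no row below 1 Gbps, so fall back to a device.
        return "DataSync / DX" if link_gbps >= 1 else "Snowball Edge"
    if data_tb <= 500:
        # Assumption: at 5 Gbps or more, stay online rather than ship a device.
        return "Snowball Edge" if link_gbps < 5 else "DataSync / DX"
    if data_tb <= 10_000:  # 10 PB
        # Assumption: a 10+ Gbps Direct Connect link makes online transfer viable.
        return "Snowball (multiple)" if link_gbps < 10 else "Direct Connect + DataSync"
    return "Snowmobile"


if __name__ == "__main__":
    for tb, gbps in [(2, 1), (20, 5), (200, 1), (2000, 5), (20000, 100)]:
        print(f"{tb:>6} TB at {gbps:>3} Gbps -> {choose_method(tb, gbps)}")
```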

8. Performance Optimization Techniques

TCP Tuning

    • TCP Window Size: 16–256 MB
    • Increase buffer sizes
    • Enable BDP-based tuning
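
BDP (bandwidth-delay product) is the amount of data that must be in flight to keep a link full: bandwidth multiplied by round-trip time. The sketch below computes it, along with the throughput ceiling a single TCP stream hits for a given window. The function names and the 50 ms round-trip time are assumptions for illustration.

```python
def bdp_bytes(link_gbps: float, rtt_ms: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    if link_gbps <= 0 or rtt_ms <= 0:
        raise ValueError("link_gbps and rtt_ms must be positive")
    bits_per_second = link_gbps * 1e9
    return bits_per_second * rtt_ms / 1000 / 8


def single_stream_gbps(window_bytes: float, rtt_ms: float) -> float:
    """Upper bound on one TCP stream's throughput for a given window and RTT."""
    if window_bytes <= 0 or rtt_ms <= 0:
        raise ValueError("window_bytes and rtt_ms must be positive")
    return window_bytes * 8 / (rtt_ms / 1000) / 1e9


if __name__ == "__main__":
    print(f"BDP for 10 Gbps at 50 ms: {bdp_bytes(10, 50) / 1e6:.1f} MB")
    print(f"16 MiB window at 50 ms: {single_stream_gbps(16 * 1024 * 1024, 50):.2f} Gbps")
```

A 10 Gbps link with a 50 ms round trip needs about 62.5 MB in flight, while a single stream with a 16 MiB window tops out near 2.7 Gbps. That gap is why larger windows and parallel streams are usually combined.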

Parallelism

    • 10–100 parallel upload streams for S3
    • Multi-threaded DataSync
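
A minimal sketch of the parallel-stream pattern: the object is split into byte ranges and a thread pool works through them concurrently. `fake_upload_part` is a hypothetical stand-in that only reports the number of bytes it was given; a real transfer would send each range to S3 at that point.

```python
from concurrent.futures import ThreadPoolExecutor


def part_ranges(total_bytes: int, part_size: int):
    """Yield (part_number, offset, length) tuples covering total_bytes."""
    part_number = 1
    for offset in range(0, total_bytes, part_size):
        yield part_number, offset, min(part_size, total_bytes - offset)
        part_number += 1


def fake_upload_part(part):
    """Hypothetical stand-in for a real part upload; returns the bytes 'sent'."""
    _part_number, _offset, length = part
    return length


def upload_in_parallel(total_bytes: int, part_size: int, streams: int = 16) -> int:
    """Process all parts with a pool of worker threads; return total bytes sent."""
    with ThreadPoolExecutor(max_workers=streams) as pool:
        return sum(pool.map(fake_upload_part, part_ranges(total_bytes, part_size)))


if __name__ == "__main__":
    sent = upload_in_parallel(1_000_000_000, 64 * 1024 * 1024)
    print(f"bytes processed: {sent}")
```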

Compression

    • Use at source if CPU available
    • DataSync compresses automatically

Chunking

    • S3: 64–256 MB multipart chunks

9. Security

    • In-flight encryption (TLS)
    • At-rest encryption with KMS keys
    • Snowball/Snowmobile (AES-256 XTS encryption)
    • IAM and S3 bucket policies for access control
    • VPC endpoints / PrivateLink
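
As one example of combining bucket policies with in-flight encryption, the sketch below builds a bucket policy that denies any request not made over TLS, using the `aws:SecureTransport` condition key. The helper name `tls_only_bucket_policy` and the bucket `example-bucket` are illustrative; the policy is only printed, not applied.

```python
import json


def tls_only_bucket_policy(bucket: str) -> dict:
    """Build a bucket policy that denies any request not made over TLS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(tls_only_bucket_policy("example-bucket"), indent=2))
```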

10. Cost Considerations

Method                 Cost Type
---------------------  -------------------------
DataSync               Per-GB ($0.0125/GB)
Snowball               Per device + shipping
Snowmobile             Contracted event-based
Direct Connect         Port-hour + data transfer
Transfer Acceleration  Premium per-GB
S3 Storage             Standard S3 pricing
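
As a worked example of the per-GB model, the sketch below estimates the DataSync transfer charge from the rate quoted above. The function name `datasync_cost_usd` is hypothetical. The estimate covers only the DataSync per-GB fee and leaves out S3 storage, requests, and any network charges.

```python
DATASYNC_USD_PER_GB = 0.0125  # per-GB rate quoted in the cost table above


def datasync_cost_usd(data_tb: float, rate_per_gb: float = DATASYNC_USD_PER_GB) -> float:
    """Estimate the DataSync per-GB charge for data_tb terabytes (1 TB = 1000 GB)."""
    if data_tb < 0 or rate_per_gb < 0:
        raise ValueError("data_tb and rate_per_gb must not be negative")
    return round(data_tb * 1000 * rate_per_gb, 2)


if __name__ == "__main__":
    for tb in (10, 100, 1000):
        print(f"{tb:>5} TB via DataSync: ${datasync_cost_usd(tb):,.2f}")
```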

 




