Sunday, November 30, 2025

Transferring Large Amounts of Data into AWS | Overview.

Scope:

  • Intro,
  • Online Network Transfer,
  • AWS DataSync Pros & Use Case,
  • AWS Transfer Family Pros & Use Case,
  • Amazon S3 Transfer Acceleration Pros & Use Case,
  • AWS Command Line Interface (CLI) / SDKs Pros & Use Case,
  • AWS Direct Connect Pros & Use Case,
  • Offline Physical Transfer Devices (When Network Is Too Slow),
  • AWS Snowball Edge & Best Use Cases,
  • AWS Snowcone & Best Use Cases,
  • Hybrid Cloud Replication Services,
  • VM, Backup, and Large-Scale Migration Tools,
  • Advanced Performance Optimization Techniques (End-to-End Flow into AWS),
  • Choosing the Right Method (Data Volume, Network Available, Best Option),
  • Security,
  • Cost Considerations.

Intro:

    • Transferring large amounts of data into AWS can be done through online and offline methods.
    • Transferring large amounts of data into AWS depends primarily on the:
      • Data volume, 
      • Available network bandwidth, 
      • Time constraints (how quickly the transfer is needed).
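
As a rough illustration of how these three factors interact, the sketch below estimates how long a transfer takes for a given data volume and link speed. The function name and the 80% utilization default are assumptions chosen for illustration, not AWS figures.

```python
def transfer_days(data_tb: float, link_gbps: float, utilization: float = 0.8) -> float:
    """Estimate the days needed to move data_tb terabytes over a link_gbps link.

    Uses decimal units (1 TB = 10**12 bytes, 1 Gbps = 10**9 bits per second).
    utilization is the assumed fraction of line rate actually achieved.
    """
    if link_gbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("link_gbps must be > 0 and utilization must be in (0, 1]")
    total_bits = data_tb * 1e12 * 8
    seconds = total_bits / (link_gbps * 1e9 * utilization)
    return seconds / 86_400  # seconds per day


if __name__ == "__main__":
    for tb, gbps in [(10, 1), (100, 1), (100, 10), (1000, 10)]:
        print(f"{tb:>5} TB over {gbps:>2} Gbps: {transfer_days(tb, gbps):6.1f} days")
```

Under these assumptions, 100 TB over a 1 Gbps link takes well over a week, which is the kind of result that pushes a migration toward a faster link or an offline device.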
Online Transfer Methods

NB:

    • These methods use twtech's existing network connection to move data to AWS.
 AWS DataSync, Pros & Use Case:

    • A managed data transfer service designed for automating and accelerating data movement between on-premises storage (NFS, SMB shares, HDFS, etc.) and AWS storage services (Amazon S3, EFS, FSx).
   Pros:
    • Can be up to 10x faster than open-source tools.
    • Handles many tasks automatically, including data integrity validation and encryption.
   Best for:
    • One-time migrations, 
    • Recurring data processing, 
    • Automated replication when twtech has available network bandwidth.
 AWS Transfer Family Pros & Use Case:
    • Provides fully managed support for transferring files into and out of Amazon S3 using standard file transfer protocols (SFTP, FTPS, and FTP).
   Best for:
    • Seamlessly migrating existing file transfer workflows that rely on these protocols without changing client-side configurations.
 Amazon S3 Transfer Acceleration Pros & Use Case:
    • This feature optimizes public internet transfers to Amazon S3 by leveraging Amazon CloudFront's global edge locations to minimize the effect of latency and maximize available bandwidth.
   Best for:
    • Accelerating uploads when using the public internet and dealing with high latency over long distances.
 AWS Command Line Interface (CLI) / SDKs Pros & Use Case:
    • For manual or scripted transfers, the AWS CLI provides the s3 cp and s3 sync commands, which can be tuned to use multipart uploads for large files.
Best for:
    • Users who require scripting capabilities and can manage network optimization manually.
AWS Direct Connect Pros & Use Case:
    • Establishes a dedicated private network connection from on-premises to AWS.
    • This dedicated private network connection provides a more consistent and higher-throughput experience than an internet connection.
Best for:
    • High-throughput, reliable, secure data transfer for ongoing hybrid cloud operations.
Offline Transfer Methods (AWS Snow Family) 
    • When twtech has extremely large datasets (terabytes to petabytes), limited or no network bandwidth, or data that is not needed immediately, AWS provides physical storage devices.
 AWS Snowball Edge & Best Use Cases:
    • A rugged, secure device with significant storage (up to 210 TB NVMe SSD) and optional compute capabilities.
        Process:
    • twtech orders a device via the AWS Console,
    • copies its data onto the device on-premises,
    • and ships the device back to AWS,
    • where the data is loaded into twtech's S3 bucket.
        Best for:
    • Bulk data migrations (terabytes to petabytes).
    • Cases where shipping the data is faster and more cost-effective than transferring it online.
 AWS Snowcone & Best Use Cases:
    • The smallest member of the Snow Family, offering up to 8 TB of usable storage per device.
        Best for:
    • Smaller data migration requirements or edge computing in remote locations.
 AWS Snowmobile & Best Use Cases:
    • A literal semi-trailer for exabyte-scale data migration.
        Best for:
    • Extremely large, multi-petabyte to exabyte-scale data center migrations.

There are five primary categories of transfer method:

    1. Online Network Transfer (Direct Connect, VPN, Internet)
    2. Optimized Transfer Services (AWS DataSync, S3 Transfer Acceleration)
    3. Physical Offline Devices (Snowcone, Snowball, Snowmobile)
    4. Hybrid Software Replication (Storage Gateway, FSx, RDS/Aurora tools)
    5.  Application-Level Migration (Databases, VMs, Files, Streams)

1. Online Network Transfer

1.1 Direct Connect (DX)

Best for: 

    • Predictable, 
    • High-bandwidth, 
    • Petabyte-scale continuous transfer.

Capacities:

    •  Dedicated DX: 1 Gbps, 10 Gbps, 100 Gbps
    •  Hosted DX: 50 Mbps – 10 Gbps

Use cases:

    • Datacenter → AWS data ingestion
    • Real-time replication
    • Long-term hybrid architectures

Throughput:

    • With TCP tuning (window size, parallel streams), twtech can achieve ~70–90% of line rate on optimized circuits.

Deep considerations:

    • Use Jumbo Frames (MTU 9001)
    • Tune TCP window size > 16 MB
    • Use parallel transfers for S3
    • Combine with AWS DataSync for protocol optimization

1.2 Site-to-Site VPN

  • Used when DX is not available, or for temporary migrations.
    • Up to 1.25 Gbps per VPN tunnel (often less in practice, ~300–400 Mbps)
    • Can use Equal-Cost Multi-Path (ECMP) routing, with a Transit Gateway, to spread traffic across multiple tunnels

NB:

    • Not ideal for PB-scale, but workable for incremental syncs.

1.3 Public Internet

If twtech's environment has:

    • 1–100 Gbps internet
    • CDN offload
    • WAN acceleration
  • then the public internet becomes a viable path, but performance is variable.

2. Optimized Transfer Services

 2.1 AWS DataSync - Most popular for Terabytes (TB) – Petabytes (PB) transfers

Purpose-built for:

    • File systems → S3, EFS, FSx
    • Agents use parallelism + compression + delta transfers
    • Handles up to about 10 Gbps per agent
    • Deploy multiple agents in parallel for higher aggregate throughput

DataSync advantages:

    • Checksums + integrity validation
    • Built-in retry, encryption
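    • Purpose-built transfer protocol, avoiding the overhead of copying with general-purpose tools such as SCP or rsync
    • Up to 10× faster than open-source tools such as rsync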

Architecture:

On-Prem File System → DataSync Agent → AWS PrivateLink → S3 / EFS / FSx / EC2 / Lambda

Best for:

    • PB-scale file migrations
    • Large dataset ingestion
    • Media, HPC, research, logs, backups

 2.2 S3 Transfer Acceleration (S3-TA)

  • Uses CloudFront’s global edge network to accelerate long-distance uploads.

Performance:

    • 50–500% faster for cross-continent transfers
    • If the source is already close to the target AWS Region, twtech will see little or no benefit.

Best for:

    • Uploading from globally distributed sources
    • Media ingestion
    • Web apps with global users

 2.3 S3 Multi-Part Upload

For large objects (recommended above roughly 100 MB, and required above the 5 GB single-PUT limit), use:

    • Multi-threading
    • Parallelism
    • Chunk sizes (64–256 MB)
    • Resume support
  • Throughput can reach multi-Gbps with enough threads.
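
To make the chunk-size guidance concrete, here is a minimal sketch that picks a part size and part count for an object. It relies on two S3 limits (at most 10,000 parts per upload, and a 5 MiB minimum for every part except the last); the function name and the 64 MiB default are illustrative assumptions.

```python
MIB = 1024 ** 2
MAX_PARTS = 10_000        # S3 limit on the number of parts per multipart upload
MIN_PART_SIZE = 5 * MIB   # S3 minimum size for every part except the last


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def plan_parts(object_size: int, preferred_part_size: int = 64 * MIB) -> tuple[int, int]:
    """Return (part_size, part_count) in bytes/parts for an object of object_size bytes.

    Starts from preferred_part_size and only grows it when the object
    would otherwise need more than MAX_PARTS parts.
    """
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    part_size = max(preferred_part_size, MIN_PART_SIZE)
    # Smallest part size that keeps the upload within the part-count limit.
    part_size = max(part_size, ceil_div(object_size, MAX_PARTS))
    return part_size, ceil_div(object_size, part_size)


if __name__ == "__main__":
    for size in (1024 ** 3, 100 * 1024 ** 3, 1024 ** 4):
        part_size, count = plan_parts(size)
        print(f"{size / 1024 ** 3:8.0f} GiB -> {count:5d} parts of {part_size / MIB:7.1f} MiB")
```

For very large objects the part size is pushed above the preferred value, so the 64–256 MB range is a starting point rather than a fixed rule.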

3. Offline Physical Transfer Devices (When Network Is Too Slow)

3.1 Snowcone

    • 8 TB usable storage
    • Small edge device
    • USB-C powered
    • Used for remote, rugged, bandwidth-limited sites.

 3.2 Snowball Edge (Standard for PB transfers)

Two variants:

    • Snowball Edge Storage Optimized (~80 TB usable)
    • Snowball Edge Compute Optimized (~40 TB usable + compute)

Security:

    • AES-256 encryption
    • TPM
    • Tamper evident
    • Encrypted end-to-end

Typical ingestion workflow:

    1. AWS ships device
    2. twtech loads data locally (40–80 TB per device)
    3. twtech ships the device back to AWS
    4. AWS ingests to S3
    5. Verification completed

Scale:

    • 10 devices = 0.8 PB
    • 100 devices = 8 PB
    • Multi-day turnaround depending on shipping
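
The scale figures above follow from simple division. A small sketch, assuming the ~80 TB usable capacity quoted for the Storage Optimized variant (the function name is hypothetical):

```python
import math


def snowball_devices(data_tb: float, usable_tb_per_device: float = 80.0) -> int:
    """Number of devices needed to hold data_tb terabytes.

    usable_tb_per_device defaults to the ~80 TB figure quoted above;
    pass a different value for other device variants.
    """
    if data_tb < 0 or usable_tb_per_device <= 0:
        raise ValueError("data_tb must be >= 0 and usable_tb_per_device must be > 0")
    return math.ceil(data_tb / usable_tb_per_device)


if __name__ == "__main__":
    for tb in (500, 800, 8000):
        print(f"{tb} TB -> {snowball_devices(tb)} device(s)")
```

Partial devices round up, so 500 TB needs seven devices rather than 6.25.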

3.3 AWS Snowmobile (100 PB per truck)

    • Industrial-scale data migration solution.

Best for:

    • 10 PB – 1 EB (massive archives, media libraries, seismic data)

Transfer rate:

    • ~1 Tbps aggregated internal write capacity

4. Hybrid Cloud Replication Services

4.1 AWS Storage Gateway

Used for:

    • File gateway → S3-backed SMB/NFS
    • Volume gateway → Backup/DR
    • Tape gateway → Replace physical tape libraries
  • Not designed for one-time bulk loads, but good for gradual ingestion.

4.2 FSx Services

    • FSx for NetApp ONTAP
    • FSx for Windows
    • FSx for Lustre
  • Each provides native replication tools (SnapMirror, robocopy, HSM workflows).

4.3 Database Migration Services

Tools:

    • AWS DMS
    • Babelfish
    • Oracle RMAN to S3
    • PostgreSQL pg_dump / pg_basebackup
    • MySQL logical/physical dumps
    • DynamoDB import from S3

5. VM, Backup, and Large-Scale Migration Tools

5.1 VM Migration

    • AWS MGN (Application Migration Service)
    • AWS Server Migration Service (legacy)
    • VMware HCX → AWS
  • HCX can transfer hundreds of VMs via WAN-optimized links.

5.2 Backup Tools

    • Veeam (Kasten K10) → AWS (S3, Glacier, VTL)
    • CommVault
    • Rubrik
    • NetBackup
  • These can hydrate backups directly into AWS.

6. Advanced Performance Architecture (End-to-End Flow into AWS):

7. Choosing the Right Method (Data Volume, Network Available Best Option)

Data Volume            | Network Available | Best Option
< 5 TB                 | Good Internet     | Direct upload / DataSync
5–50 TB                | 1–10 Gbps         | DataSync / DX
50–500 TB              | <5 Gbps           | Snowball Edge
500 TB – 10 PB         | DX < 10 Gbps      | Snowball (multiple)
>10 PB                 | DX insufficient   | Snowmobile
Continuous Replication | 10+ Gbps DX       | Direct Connect + DataSync
Global Upload          | Distributed users | S3 Transfer Acceleration
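
The selection logic above can be sketched as a small lookup function. This is a simplification for illustration: the function name and flags are hypothetical, and it keys only on data volume and the two special cases, leaving the network column as something to check by hand.

```python
def recommend_method(data_tb: float, continuous: bool = False,
                     global_sources: bool = False) -> str:
    """Return the 'Best Option' entry matching the given situation.

    data_tb is the total volume in terabytes (10 PB = 10,000 TB).
    The network-availability column is NOT evaluated here.
    """
    if data_tb < 0:
        raise ValueError("data_tb must be >= 0")
    if continuous:
        return "Direct Connect + DataSync"
    if global_sources:
        return "S3 Transfer Acceleration"
    if data_tb < 5:
        return "Direct upload / DataSync"
    if data_tb < 50:
        return "DataSync / DX"
    if data_tb < 500:
        return "Snowball Edge"
    if data_tb <= 10_000:
        return "Snowball (multiple)"
    return "Snowmobile"


if __name__ == "__main__":
    for tb in (2, 20, 200, 2_000, 20_000):
        print(f"{tb:>6} TB -> {recommend_method(tb)}")
```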

8. Performance Optimization Techniques

TCP Tuning

    • TCP Window Size: 16–256 MB
    • Increase buffer sizes
    • Enable BDP-based tuning
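
BDP (bandwidth-delay product) is the amount of data that must be in flight to keep a link full, and it gives a lower bound for the TCP window. A minimal sketch of the calculation, with a hypothetical function name:

```python
def bdp_bytes(bandwidth_gbps: float, rtt_ms: float) -> float:
    """Bandwidth-delay product in bytes.

    bandwidth_gbps: link speed in gigabits per second (10**9 bits/s).
    rtt_ms: round-trip time in milliseconds.
    """
    if bandwidth_gbps <= 0 or rtt_ms <= 0:
        raise ValueError("bandwidth_gbps and rtt_ms must be positive")
    bits_in_flight = bandwidth_gbps * 1e9 * rtt_ms / 1000
    return bits_in_flight / 8


if __name__ == "__main__":
    for gbps, rtt in [(1, 20), (10, 50), (10, 100)]:
        print(f"{gbps:>3} Gbps at {rtt:>3} ms RTT: {bdp_bytes(gbps, rtt) / 1e6:7.1f} MB window")
```

A 10 Gbps path with 50 ms RTT needs roughly 62.5 MB in flight, which is why the window sizes suggested above are so much larger than common operating system defaults.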

Parallelism

    • 10–100 parallel upload streams for S3
    • Multi-threaded DataSync
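
The idea behind parallel streams can be sketched with a thread pool that works through byte ranges of one large object. Here upload_part is a hypothetical stand-in that only reports how many bytes it would send; a real implementation would call an S3 client at that point.

```python
from concurrent.futures import ThreadPoolExecutor


def byte_ranges(total_size: int, part_size: int) -> list[tuple[int, int]]:
    """Split [0, total_size) into half-open (start, end) ranges of part_size bytes."""
    if total_size < 0 or part_size <= 0:
        raise ValueError("total_size must be >= 0 and part_size must be > 0")
    return [(start, min(start + part_size, total_size))
            for start in range(0, total_size, part_size)]


def upload_part(byte_range: tuple[int, int]) -> int:
    """Hypothetical placeholder for a real part upload; returns bytes 'sent'."""
    start, end = byte_range
    return end - start


def parallel_upload(total_size: int, part_size: int, streams: int = 10) -> int:
    """Process all parts using up to `streams` concurrent workers."""
    with ThreadPoolExecutor(max_workers=streams) as pool:
        return sum(pool.map(upload_part, byte_ranges(total_size, part_size)))


if __name__ == "__main__":
    sent = parallel_upload(total_size=1_000, part_size=64, streams=10)
    print(f"bytes sent: {sent}")
```

Threads suit this pattern because real uploads spend most of their time waiting on the network rather than on the CPU.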

Compression

    • Use at source if CPU available
    • DataSync compresses automatically

Chunking

    • S3: 64–256 MB multipart chunks

9. Security

    • In-flight encryption (TLS)
    • At-rest encryption with KMS keys
    • Snowball/Snowmobile (AES-256 XTS encryption)
    • IAM and S3 bucket policies for access control
    • VPC endpoints / PrivateLink

10. Cost Considerations

Method                | Cost Type
DataSync              | Per-GB ($0.0125/GB)
Snowball              | Per device + shipping
Snowmobile            | Contracted event-based
Direct Connect        | Port-hour + data transfer
Transfer Acceleration | Premium per-GB
S3 Storage            | Standard S3 pricing
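
As a worked example of the per-GB model, the sketch below applies the DataSync rate quoted above to a given volume. The function name is hypothetical, the rate is taken from this post rather than a live price list, and other charges (storage, requests, network) are left out.

```python
DATASYNC_USD_PER_GB = 0.0125  # per-GB rate quoted above; check current pricing


def datasync_cost_usd(data_tb: float, usd_per_gb: float = DATASYNC_USD_PER_GB) -> float:
    """Estimated DataSync transfer charge for data_tb terabytes (1 TB = 1,000 GB)."""
    if data_tb < 0 or usd_per_gb < 0:
        raise ValueError("data_tb and usd_per_gb must be >= 0")
    return data_tb * 1000 * usd_per_gb


if __name__ == "__main__":
    for tb in (10, 100, 1000):
        print(f"{tb:>5} TB -> about ${datasync_cost_usd(tb):,.0f}")
```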

 





Saturday, November 29, 2025

VMware Cloud on AWS | Overview.

Scope:

  • Intro,       
  • Underlying Architecture,
  • SDDC Architecture Overview,
  • Networking,
  • Storage,
  • Operations,
  • Consumption Model,
  • Use Cases,
  • Best Practices & Design Considerations,
  • Architecture Diagram.

Intro:

    • VMware Cloud on AWS (VMC) is a jointly engineered service by VMware and AWS that runs the following natively on bare-metal EC2 hosts inside AWS data centers:
      • vSphere,
      • vSAN,
      • NSX,
      • HCX.
    • VMC provides a fully managed Software-Defined Data Center (SDDC) with tight integration into native AWS services.

1. Underlying Architecture

1.1 Bare Metal Hosts

  • VMC uses dedicated, single-tenant bare-metal EC2 i3en and i4i instances, each including:
    • Dual Intel CPUs (e.g., i3en.metal: 96 vCPUs)
    • Hundreds of GiB RAM (e.g., 768 GiB)
    • NVMe SSD directly attached (for vSAN)
    • 25 Gbps ENA network connectivity

Each host runs:

    • vSphere ESXi
    • vSAN for distributed storage
    • NSX-T for networking & security

2. SDDC Architecture Overview

2.1 Management Domain

Contains:

    •  vCenter Server
    •  NSX-T Manager / Controllers
    •  HCX Manager (optional)
    •  ESXi management components
  • This domain is fully managed by VMware; twtech cannot SSH into these management VMs.

2.2 Compute Domain (Workload Clusters)

  • Everything twtech deploys (VMs, clusters, etc.) lives here.

Capabilities:

    • Scale clusters up to 16 hosts (standard)
    •  Aggregate multiple clusters per SDDC
    •  Use vSAN datastore across the cluster

VMware handles:

    •  ESXi patching/updates
    •  vCenter upgrades
    •  Hardware lifecycle

twtech manages:

    • VM workloads
    • vSphere configurations (resource pools, tags, DRS rules)
    • NSX-T networking for workload segments

3. Networking

3.1 Connectivity Options

    • AWS Direct Connect → High throughput, low latency
    • IPsec VPN → Quick setup, moderate latency
    • Transit Gateway (TGW) → Multi-VPC spoke architecture
    • SDDC-to-SDDC → NSX-T VPN

3.2 NSX-T Components

VMC uses NSX-T for:

    • Logical routing
    • Security groups / firewall policies
    • Distributed firewall (microsegmentation)
    • NAT & edge services

Router Tiering:

    • Tier-0 Gateway → North–south traffic (AWS, on-prem, Internet)
    • Tier-1 Gateway → Workload segments

3.3 vMotion to/from On-Prem

With HCX, you get:

    • Bulk migration
    • vMotion without downtime
    • Replication-assisted vMotion
    • L2 stretch across clouds
  • HCX tolerates WAN latency, but vMotion works best when round-trip time stays below 150 ms.

4. Storage

4.1 vSAN Storage

Each bare-metal host contributes:

    • NVMe SSD cache tier
    • NVMe SSD capacity tier

Configured as:

    • All-flash vSAN datastore
    • RAID-1 / RAID-5 / RAID-6 policies
    • Adaptive Resync & I/O balancing

Scaling:

    • Add host → vSAN capacity grows automatically
    • Remove host → storage is rebalanced
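
How much of that capacity is usable depends on the storage policy. The sketch below applies the commonly cited vSAN overhead factors (2x for RAID-1 with one failure tolerated, about 1.33x for RAID-5, 1.5x for RAID-6). The function name and the per-host raw capacity are hypothetical inputs, and slack space, deduplication and compression are ignored.

```python
# Raw capacity consumed per unit of usable capacity, by storage policy.
# These are the commonly cited vSAN factors, used here as an assumption.
POLICY_OVERHEAD = {
    "RAID-1": 2.0,      # mirroring, one failure tolerated
    "RAID-5": 4 / 3,    # erasure coding, one failure tolerated
    "RAID-6": 1.5,      # erasure coding, two failures tolerated
}


def usable_tb(hosts: int, raw_tb_per_host: float, policy: str) -> float:
    """Approximate usable vSAN capacity in TB for a cluster under one policy."""
    if hosts <= 0 or raw_tb_per_host <= 0:
        raise ValueError("hosts and raw_tb_per_host must be positive")
    if policy not in POLICY_OVERHEAD:
        raise ValueError(f"unknown policy: {policy}")
    return hosts * raw_tb_per_host / POLICY_OVERHEAD[policy]


if __name__ == "__main__":
    for policy in POLICY_OVERHEAD:
        # 6 hosts with 10 TB raw each is an illustrative cluster, not a real SKU.
        print(f"{policy}: {usable_tb(6, 10.0, policy):5.1f} TB usable of 60 TB raw")
```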

4.2 Supplemental Storage Options

1. Amazon FSx for NetApp ONTAP

    • NFS datastore for VMC
    • High capacity, lower cost
    • Snapshots, cloning, DR

2. Amazon EBS (limited use cases)

  • Used behind the scenes in some features, not for VM datastores.

5. Operations

5.1 Lifecycle Management (LCM)

VMware performs:

    • ESXi patching
    • vCenter upgrades
    • NSX upgrades
    • Firmware, BIOS, and hardware lifecycle.
NB:
  • The Service Level Agreement (SLA) typically covers 99.9% uptime for the SDDC.
  • VMware uses rolling upgrades, so upgrades should not cause customer downtime.
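
To put the 99.9% figure in perspective, the sketch below converts an uptime percentage into the downtime it allows over a period. The function name and the 30-day default are illustrative assumptions, and the actual SLA terms define how availability is measured.

```python
def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Minutes of downtime permitted by sla_percent uptime over `days` days."""
    if not 0 <= sla_percent <= 100 or days <= 0:
        raise ValueError("sla_percent must be in [0, 100] and days must be > 0")
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


if __name__ == "__main__":
    for sla in (99.0, 99.9, 99.99):
        print(f"{sla}% uptime -> {allowed_downtime_minutes(sla):6.1f} min per 30 days")
```

At 99.9%, the budget is a little over 43 minutes in a 30-day month.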

5.2 Security

    • NSX-T distributed firewall
    • Identity Firewall (Active Directory / identity integration)
    • Microsegmentation
    • End-to-end encryption
    • AWS-managed security layers (VPC isolation, IAM integration)

5.3 Monitoring & Troubleshooting

Integrations:

    • vRealize Operations (vROps)
    • CloudWatch logs for SDDC events
    • SDDC Manager console (web UI)
    • VRLI Cloud (Log Insight Cloud)

6. Consumption Model

6.1 Host Types

Common host SKUs:

    • i4i.metal → Latest-gen; best performance/price
    • i3en.metal → High storage density

6.2 Pricing

    • On-Demand hourly
    •  1-year or 3-year Reserved
    •  Flexible subscription
    •  Elastic DRS auto-scaling

Savings:

    • 30–60% with reserved hosts
    • EDRS → Auto-scales with load spikes

7. Use Cases

7.1 Data Center Extension

    • Lift-and-shift within days
    • Retain tools/processes
    • Connect to AWS-native services (RDS, S3, Lambda)

7.2 Disaster Recovery

    • VMware Cloud DR (VCDR)
    • Low RPO (minutes)
    • Rapid scale-out during failover

7.3 Application Modernization

Through AWS integrations:

    • S3 directly via ENI
    •  RDS for databases
    •  Lambda for operations
    •  API Gateway, DynamoDB, etc.
  • VMs continue running on vSphere but interface with AWS services.

8. Best Practices & Design Considerations

Networking

    • Always use the high-bandwidth ENI connection for AWS service access
    • Set up TGW for multi-VPC connectivity
    • Plan CIDRs to avoid overlap
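
CIDR planning is easy to check before deployment. A minimal sketch using Python's standard ipaddress module; the network names and address ranges are made-up examples.

```python
import ipaddress
from itertools import combinations


def overlapping_cidrs(cidrs: dict[str, str]) -> list[tuple[str, str]]:
    """Return every pair of named CIDR blocks that overlap."""
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    return [(a, b) for a, b in combinations(networks, 2)
            if networks[a].overlaps(networks[b])]


if __name__ == "__main__":
    # Hypothetical address plan for illustration only.
    plan = {
        "on_prem": "10.0.0.0/16",
        "sddc_mgmt": "10.0.128.0/20",   # falls inside on_prem, so it conflicts
        "aws_vpc": "172.31.0.0/16",
    }
    for a, b in overlapping_cidrs(plan):
        print(f"overlap: {a} ({plan[a]}) and {b} ({plan[b]})")
```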

Storage

    • Use storage policies intelligently
    • Consider FSx for NetApp ONTAP (FSxN) for capacity-heavy workloads

Governance

    • Tag resources (SDDC, clusters)
    •  Use AWS IAM with least-privilege principles
    •  Control EDRS limits to avoid surprise scaling costs

Security

    • Leverage Distributed Firewall for app segmentation
    • Use NSX Intelligence for traffic flows

9. Architecture Diagram 








Amazon EventBridge | Overview.

Amazon EventBridge - Overview. Scope: Intro, Core Concepts, Key Benefits, Link to official documentation, What EventBridge  Really  Is (Deep...