Sunday, November 16, 2025

AWS Transit Gateway (TGW) & Site-to-Site VPN ECMP | Overview.


 AWS Transit Gateway (TGW) & Site-to-Site VPN Equal-Cost Multi-Path (ECMP).

Scope:

  • The concept of ECMP in AWS Transit Gateway (used to increase connection bandwidth),
  • TGW + VPN Architecture,
  • How TGW Performs Routing with ECMP,
  • On-Premises Router Requirements for ECMP,
  • TGW ECMP with Direct Connect Gateway (DXGW),
  • How TGW Fails Over with ECMP,
  • Throughput Expectations,
  • Limitations & Caveats,
  • Recommended Design Patterns,
  • Sample End-to-End Packet Flow,
  • Summary.

1. The concept of ECMP in AWS Transit Gateway (used to increase connection bandwidth)

  • Equal-Cost Multi-Path (ECMP) allows TGW to use multiple VPN tunnels simultaneously when they have:
    •  Equal route cost, 
    •  The same destination prefix.

NB:

    • With ECMP enabled, TGW load-balances traffic (flow-based hashing) across multiple active paths to the on-premises network.

Why It Matters

    • Higher aggregate throughput
    • Better resiliency
    • Active-Active multi-tunnel utilization
    • Consistent per-flow path selection
    • Potentially lower latency (different paths can be used)

 2. TGW + VPN Architecture

Standard AWS Site-to-Site VPN:

Each VPN connection = 2 tunnels

    • Usually one active, one standby (non-ECMP)
    • Failover only, NOT load balancing

When ECMP Is Enabled:

    • Both tunnels are active-active
    • Multiple VPN connections can also be active-active
    • Up to 200 ECMP paths per TGW

AWS allows:

1 VPN = 2 tunnels → ECMP active-active
N VPNs = 2N tunnels → all can be ECMP


3. How TGW Performs Routing with ECMP

TGW Routing Logic:

TGW uses flow hashing (5-tuple hashing) across all equal-cost paths:

    • Source IP
    • Destination IP
    • Source Port
    • Destination Port
    • Protocol

NB:

  • This ensures per-flow consistency, preventing packet reordering.
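As a rough illustration of per-flow hashing, the sketch below maps a 5-tuple to one of several equal-cost paths. The hash function (SHA-256), the `FlowKey` type, the tunnel names and the IP addresses are all assumptions made for the example; the hash TGW actually uses internally is not described in this post.

```python
import hashlib
from typing import NamedTuple


class FlowKey(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int  # IANA protocol number, e.g. 6 = TCP, 17 = UDP


def pick_path(flow: FlowKey, paths: list[str]) -> str:
    """Map a flow to one of the equal-cost paths using a stable hash."""
    # SHA-256 is chosen only because it is stable across runs; it is an
    # assumption for illustration, not the hash TGW really uses.
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(paths)
    return paths[index]


# Hypothetical tunnel names and addresses.
tunnels = ["vpn1-t1", "vpn1-t2", "vpn2-t1", "vpn2-t2"]
flow = FlowKey("172.31.5.10", "10.0.8.20", 49152, 443, 6)

# The same 5-tuple always lands on the same tunnel, so packets of one
# flow are not spread across paths.
chosen = pick_path(flow, tunnels)
print(chosen)
```

Because the choice depends only on the 5-tuple and the list of paths, every packet of a flow follows the same tunnel for as long as the path list is unchanged.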

TGW Route Table Entry Example:

Prefix: 10.0.0.0/16
Targets:
  • VPN-Conn-1 (Tunnel 1, Tunnel 2)
  • VPN-Conn-2 (Tunnel 1, Tunnel 2)
ECMP: Enabled

NB:

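    • TGW will distribute flows over all 4 tunnels; the split is roughly even only when there are many distinct flows, since each flow is pinned to one tunnel.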
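To get a feel for how flows spread over the four tunnels in the route entry above, here is a small simulation. It assumes a SHA-256 based hash and flows that differ only in source port; both are illustrative choices, not documented TGW behavior.

```python
import hashlib
from collections import Counter


def tunnel_for(flow: tuple, tunnels: list[str]) -> str:
    """Pick a tunnel from a stable hash of the flow tuple (assumed hash)."""
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return tunnels[int.from_bytes(digest[:8], "big") % len(tunnels)]


def distribution(num_flows: int, tunnels: list[str]) -> Counter:
    """Count how many of num_flows synthetic flows land on each tunnel."""
    counts: Counter = Counter()
    for i in range(num_flows):
        # Vary only the source port, as many client connections would.
        flow = ("172.31.5.10", "10.0.8.20", 1024 + i, 443, 6)
        counts[tunnel_for(flow, tunnels)] += 1
    return counts


tunnels = ["vpn1-t1", "vpn1-t2", "vpn2-t1", "vpn2-t2"]
many = distribution(10_000, tunnels)
few = distribution(3, tunnels)
print(many)  # close to 2,500 per tunnel
print(few)   # at most 3 tunnels can carry traffic
```

With thousands of flows the counts come out close to even; with a handful of large flows, some tunnels can sit idle while others are saturated.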

 4. On-Premises Router Requirements for ECMP

    • To fully use TGW ECMP, the on-prem router must support:

1. BGP multipath

    • Allow multiple BGP routes with the same AS-Path length to be installed together
    • Must NOT suppress equal routes

2. Policy-Based Routing NOT required

    • TGW hashes AWS-to-on-prem flows itself; the on-prem router only needs its own ECMP for the return direction, not policy-based routing.

3. Symmetric Routing support

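    • Stateful firewalls expect return traffic on the same tunnel, but TGW and the on-prem router hash independently, so this is not guaranteed. The on-prem design must either tolerate asymmetric return paths or keep stateful inspection behind the tunnel endpoints.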

Recommended configurations:

    • Cisco: maximum-paths <n> under router bgp
    • Juniper: multipath multiple-as
    • Palo Alto: enable ECMP in virtual router
    • Fortinet: under config router bgp, set ebgp-multipath enable

 5. TGW ECMP with Direct Connect Gateway (DXGW)

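TGW can carry DX (via DXGW) and VPN attachments for the same prefixes simultaneously, but:

    • DX routes are learned via BGP and sit at a different priority in TGW route evaluation
    • By default, DX is preferred over VPN
    • TGW prefers DX over VPN by attachment type rather than by AS-Path, so AS-Path tuning chiefly affects path selection on the on-prem router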

Typical strategies:

    • AS-Path prepending on DX
    • Advertise same routes with equal attributes
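The effect of AS-Path prepending on multipath eligibility can be sketched with a deliberately simplified BGP comparison that looks only at AS-Path length. Real BGP best-path selection also weighs local preference, origin, MED and more, and the `Route` type, the path labels and the private ASN 64512 are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str               # hypothetical label for the path
    as_path: tuple[int, ...]


def multipath_set(routes: list[Route]) -> list[Route]:
    """Return the routes that tie on the shortest AS-Path length.

    Simplification: only AS-Path length is compared.
    """
    shortest = min(len(r.as_path) for r in routes)
    return [r for r in routes if len(r.as_path) == shortest]


def prepend(route: Route, asn: int, times: int) -> Route:
    """Return a copy of the route with asn prepended `times` times."""
    return Route(route.prefix, route.next_hop, (asn,) * times + route.as_path)


dx = Route("172.31.0.0/16", "dx", (64512,))
vpn = Route("172.31.0.0/16", "vpn", (64512,))

equal = multipath_set([dx, vpn])                         # both paths tie
after_prepend = multipath_set([prepend(dx, 64512, 2), vpn])  # only vpn remains
print([r.next_hop for r in equal])
print([r.next_hop for r in after_prepend])
```

Two advertisements with equal attributes tie and can be installed together; prepending on one of them lengthens its path and removes it from the multipath set.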

 6. How TGW Fails Over with ECMP

    • Tunnel Down → Immediate Hash Adjustment

TGW automatically:

    • Detects BGP down
    • Removes tunnel from the hash
    • Redirects flows to remaining tunnels (new flows only)

Stateful systems:

    • Existing flows on the failed tunnel will drop and must be re-established; this is expected with ECMP.
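The failover behavior described above can be modeled with a toy ECMP group. The pinned flow table, the `EcmpGroup` class and the tunnel names are modeling assumptions that mirror the text (flows on the failed tunnel drop, new flows hash over the survivors); they are not a description of TGW internals.

```python
import hashlib


def pick(flow: tuple, tunnels: list[str]) -> str:
    """Stable hash of the flow tuple onto the list of healthy tunnels."""
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return tunnels[int.from_bytes(digest[:8], "big") % len(tunnels)]


class EcmpGroup:
    """Toy model: a flow stays on its tunnel until that tunnel fails."""

    def __init__(self, tunnels: list[str]) -> None:
        self.tunnels = list(tunnels)
        self.flow_table: dict[tuple, str] = {}

    def forward(self, flow: tuple) -> str:
        # New flows are hashed over the currently healthy tunnels.
        if flow not in self.flow_table:
            self.flow_table[flow] = pick(flow, self.tunnels)
        return self.flow_table[flow]

    def tunnel_down(self, tunnel: str) -> list[tuple]:
        """Remove a tunnel from the hash set; return the flows it carried."""
        self.tunnels.remove(tunnel)
        dropped = [f for f, t in self.flow_table.items() if t == tunnel]
        for f in dropped:
            del self.flow_table[f]
        return dropped


group = EcmpGroup(["t1", "t2", "t3", "t4"])
flows = [("172.31.5.10", "10.0.8.20", 1024 + i, 443, 6) for i in range(100)]
before = {f: group.forward(f) for f in flows}

dropped = group.tunnel_down("t3")
# Dropped flows re-enter as new flows and are hashed over t1/t2/t4.
after = {f: group.forward(f) for f in flows}
print(len(dropped), "flows were on t3 and had to be re-established")
```

In this model, flows on the surviving tunnels are untouched, which is the reason for keeping a flow table rather than simply rehashing everything when the tunnel count changes.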

 7. Throughput Expectations

Without ECMP

    • One tunnel = up to ~1.25 Gbps
    • Active-Standby → only 1 tunnel used

With ECMP

All tunnels active
Sample:

    •  2 VPN connections × 2 tunnels each = 4 active tunnels
    •  Aggregate throughput up to ~5 Gbps theoretical

NB:

    • Actual throughput depends on router CPU, packet size, and encryption overhead.
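The arithmetic behind these figures is simple enough to write down. The helper below is a back-of-the-envelope sketch using the ~1.25 Gbps per-tunnel figure quoted above; the function name and constants are illustrative.

```python
TUNNELS_PER_VPN = 2
GBPS_PER_TUNNEL = 1.25  # per-tunnel figure quoted in this section


def aggregate_gbps(vpn_connections: int, ecmp: bool) -> float:
    """Theoretical ceiling only; real throughput depends on router CPU,
    packet size and encryption overhead."""
    if vpn_connections < 1:
        raise ValueError("need at least one VPN connection")
    # Without ECMP only one tunnel carries traffic at a time.
    active_tunnels = vpn_connections * TUNNELS_PER_VPN if ecmp else 1
    return active_tunnels * GBPS_PER_TUNNEL


print(aggregate_gbps(2, ecmp=True))   # 5.0
print(aggregate_gbps(2, ecmp=False))  # 1.25
```

Note that this is an aggregate across many flows: because hashing is per flow, a single flow is still limited to the throughput of one tunnel.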

 8. Limitations & Caveats

1. Must enable ECMP on the TGW

    • VpnEcmpSupport: enable (shown as "VPN ECMP support" in the console)

2. Per-Flow Load Balancing

    • TGW does NOT split a single flow across tunnels.

3. NAT Issues

    • On-prem NAT devices may break stateful return traffic in ECMP scenarios.

4. AWS Managed VPN limits

    • 1.25 Gbps per tunnel (a fixed quota, not an adjustable soft limit)
    • 4,000 BGP routes max

5. IPSec overhead

    • CPU-bound on both AWS and on-prem.
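For caveat 1, the ECMP setting is part of the transit gateway options. In the EC2 API the option is named `VpnEcmpSupport`, and a minimal boto3 sketch might look like the following. The helper names, the description string and the default region are assumptions; running `create_tgw` needs boto3 installed and AWS credentials, and it creates a billable resource.

```python
def tgw_options(enable_ecmp: bool = True) -> dict:
    """Build the Options structure for the EC2 CreateTransitGateway call."""
    return {"VpnEcmpSupport": "enable" if enable_ecmp else "disable"}


def create_tgw(description: str, region: str = "us-east-1") -> str:
    """Create a transit gateway with VPN ECMP enabled and return its ID.

    Hypothetical helper: requires boto3 and valid AWS credentials.
    """
    import boto3  # imported lazily so tgw_options() works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.create_transit_gateway(
        Description=description,
        Options=tgw_options(True),
    )
    return response["TransitGateway"]["TransitGatewayId"]


print(tgw_options())  # {'VpnEcmpSupport': 'enable'}
```

Keeping the options in a small pure function makes the setting easy to review and test without touching an AWS account.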

 9. Recommended Design Patterns

Pattern 1 – High-Throughput Multi-VPN ECMP

Used for large AWS ↔ On-Prem traffic:

    • Create 2–4 VPN connections to the same customer gateway device
    • Enable ECMP on TGW
    • Allow multipath in on-prem router

Result: 4–8 active paths

Pattern 2 – DX + VPN High Availability with ECMP

    • DXGW + VPN termination at TGW
    • Tune AS-Paths for route equality
    • Hybrid ECMP between DX and VPN (rare but possible)

Pattern 3 – Multi-Region ECMP

    • Deploy TGWs in multiple regions
    • Use Transit Gateway Peering
    • Use ECMP within a region; peering is NOT ECMP-enabled (no multipath)

 10. Sample End-to-End Packet Flow

1.     VPC → TGW: TGW matches the destination against the on-prem CIDR

2.     TGW has 4 equal-cost VPN tunnels

3.     Hashing picks Tunnel 3

4.     Traffic flows → IPSec → on-prem

5.     If Tunnel 3 fails:

      •    New flows are hashed to T1/T2/T4
      •    Existing Tunnel 3 flows drop
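Step 1 of the flow, matching the destination against the on-prem CIDR, can be sketched as a longest-prefix lookup. The route table contents and target names below are hypothetical (the 10.0.0.0/16 prefix is borrowed from the route table example in section 3), and the lookup is a generic illustration rather than TGW's actual implementation.

```python
import ipaddress
from typing import Optional

# Hypothetical route table: prefix -> list of equal-cost targets.
ROUTES = {
    "10.0.0.0/16": ["vpn1-t1", "vpn1-t2", "vpn2-t1", "vpn2-t2"],
    "172.31.0.0/16": ["vpc-attachment"],
}


def lookup(dst_ip: str, routes: dict[str, list[str]]) -> Optional[list[str]]:
    """Return the targets of the most specific prefix containing dst_ip."""
    addr = ipaddress.ip_address(dst_ip)
    best = None
    for prefix, targets in routes.items():
        net = ipaddress.ip_network(prefix)
        # Keep the matching route with the longest prefix length.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, targets)
    return best[1] if best else None


targets = lookup("10.0.8.20", ROUTES)
print(targets)                          # the four equal-cost tunnels
print(lookup("192.168.1.1", ROUTES))    # None: no matching route
```

The list returned for the on-prem prefix is the set of equal-cost paths that the hashing step (step 3) then chooses from.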

 Summary

AWS TGW + VPN ECMP unlocks (Benefits):

    • High throughput
    • Active-Active tunnels
    • Better resiliency & failover
    • Multipath flexibility with DX & VPN

But requires:

    • TGW ECMP enabled
    • On-prem BGP multipath
    • Symmetric routing
    • Proper AS-Path management when mixing VPN & DX




