An Overview of AWS Transit Gateway (TGW) Throughput with Equal-Cost Multi-Path (ECMP)
Scope:
- Architecture,
- Per-attachment limits,
- Scaling models,
- VPN/Direct Connect behavior,
- Flow hashing,
- Bottlenecks,
- Best-practice designs for high-throughput architectures.
Intro:
- AWS Transit Gateway (TGW) is a distributed, horizontally scaled router that spans an AWS Region.
- TGW throughput characteristics vary by attachment type (VPC, VPN, Direct Connect, Peering).
- Equal-Cost Multi-Path (ECMP) routing fundamentally changes how twtech designs for aggregate bandwidth.
Breakdown:
- TGW Architecture = Distributed Router,
- Baseline TGW Throughput Limits (Per Attachment),
- ECMP (Equal-Cost Multipath) on AWS TGW,
- How ECMP Traffic Distribution Works in TGW,
- Throughput Scaling With ECMP,
- TGW Is Flow-Limited: Understanding Per-Flow Limits,
- Common Bottlenecks in TGW ECMP Designs,
- Best-Practice Designs for High-Throughput TGW ECMP,
- Putting It All Together – Realistic Throughput Expectations.
1. TGW Architecture = Distributed Router
TGW is not a single device. Internally:
- TGW uses a distributed data plane of “route processing units.”
- Each attachment (VPC, VPN, DX, Peering) terminates on a set of distributed nodes.
- TGW can scale traffic horizontally as long as flows are distributed across nodes.
This matters because TGW throughput is attachment-level, not gateway-level.
2. Baseline TGW Throughput Limits (Per Attachment)
VPC Attachment
- Up to 50 Gbps burst per VPC attachment
- Distributed across multiple ENIs
- Single TCP flow limited by ENI processing path (~5 Gbps typical)
Transit Gateway Peering Attachment
- Up to 50 Gbps per peering attachment
- No ECMP on TGW peering.
Direct Connect Transit Virtual Interface (Transit VIF)
- DXGW → TGW traffic:
  - Single Transit VIF: up to 5–50 Gbps, bounded by the speed of the physical DX connection
- TGW supports ECMP with DX (multiple Transit VIFs)
Site-to-Site VPN
- Each S2S VPN tunnel:
  - 1.25 Gbps encrypted throughput ceiling
  - In practice ~1 Gbps max due to IPsec overhead
- TGW supports ECMP across multiple VPN tunnels.
3. ECMP (Equal-Cost Multipath) on AWS TGW
ECMP support by attachment type:
| Attachment Type | ECMP Supported? | Notes |
|---|---|---|
| VPN (IPSec) | ✅ Yes | Up to 8 tunnels active/ECMP |
| Direct Connect (via DXGW) | ✅ Yes | Multiple Transit VIFs |
| VPC Attachments | ❌ No | TGW load-balances flows internally, not across ECMP paths |
| TGW Peering | ❌ No | Single path per attachment |
For ECMP to be enabled:
- There must be multiple equal-cost routes in the TGW route table.
- Typically built using multiple Customer Gateway (CGW) IPs, multiple tunnels, or multiple Transit VIFs.
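The "multiple equal-cost routes" condition can be sketched in Python. This is a simplified model, not TGW's actual route-selection logic: `Route`, `ecmp_groups`, and the attachment IDs are hypothetical, and the sketch only compares BGP AS-path and MED, ignoring static routes and TGW's preference order between attachment types. Separately, on the AWS side, ECMP across VPN attachments also requires the transit gateway's `VpnEcmpSupport` option to be enabled.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str                # destination CIDR, e.g. "10.0.0.0/8"
    attachment_id: str         # hypothetical next-hop attachment ID
    as_path: tuple[int, ...]   # BGP AS-path learned over that attachment
    med: int = 0               # BGP MED (lower is preferred)


def ecmp_groups(routes: list[Route]) -> dict[str, list[str]]:
    """Map each prefix to the attachments that tie on the best BGP attributes.

    Simplified model: shorter AS-path wins, then lower MED, and routes count
    as equal-cost only when AS-path and MED are identical.
    """
    by_prefix: dict[str, list[Route]] = defaultdict(list)
    for route in routes:
        by_prefix[route.prefix].append(route)

    groups: dict[str, list[str]] = {}
    for prefix, candidates in by_prefix.items():
        # Best (AS-path, MED) signature; the AS-path value itself is only a
        # deterministic tie-breaker for this sketch.
        best = min(
            ((r.as_path, r.med) for r in candidates),
            key=lambda sig: (len(sig[0]), sig[1], sig[0]),
        )
        groups[prefix] = sorted(
            r.attachment_id for r in candidates if (r.as_path, r.med) == best
        )
    return groups


routes = [
    Route("10.0.0.0/8", "tgw-attach-vpn-a", (65001,)),
    Route("10.0.0.0/8", "tgw-attach-vpn-b", (65001,)),
    Route("172.16.0.0/12", "tgw-attach-vpn-a", (65001,)),
    Route("172.16.0.0/12", "tgw-attach-vpn-b", (65001, 65001)),  # prepended
]
print(ecmp_groups(routes))
```

Here 10.0.0.0/8 ends up with two equal-cost next hops, while AS-path prepending on the second 172.16.0.0/12 route leaves that prefix with a single path, which is exactly the "Only 1 path used" symptom.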
4. How ECMP Traffic Distribution Works in TGW
TGW uses 5-tuple flow hashing: (src-ip, dst-ip, src-port, dst-port, protocol).
Implications:
- Large single flows (e.g., single TCP stream) do not get split across paths → stick to a single VPN tunnel / VIF.
- Multiple smaller flows scale horizontally across multiple ECMP paths.
- UDP traffic spreads more naturally (due to more randomized ports).
- Multipath throughput is aggregate, not per-flow.
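The implications above follow from hashing a flow's 5-tuple onto a path index. A minimal illustration, assuming a generic hash: AWS does not publish TGW's hash function, so SHA-256 stands in here, and `FiveTuple` and `pick_path` are hypothetical names.

```python
import hashlib
from collections import Counter
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str


def pick_path(flow: FiveTuple, num_paths: int) -> int:
    """Choose one of num_paths equal-cost paths from the flow's 5-tuple.

    SHA-256 is a stand-in for TGW's unpublished hash; it just gives a
    stable, well-mixed value for this illustration.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_paths


# One large TCP stream: every packet carries the same 5-tuple -> one path.
big_stream = FiveTuple("10.0.1.10", "192.168.10.20", 50000, 443, "tcp")
print({pick_path(big_stream, 4) for _ in range(1000)})  # a single path index

# 1,000 parallel flows differing only in source port spread across paths.
flows = [
    FiveTuple("10.0.1.10", "192.168.10.20", 50000 + i, 443, "tcp")
    for i in range(1000)
]
print(Counter(pick_path(f, 4) for f in flows))
```

Because the hash is deterministic, a single stream can never use more than one path, no matter how many tunnels or VIFs exist.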
5. Throughput Scaling With ECMP
Below are practical scaling behaviors:
5.1 ECMP With VPN Tunnels
Example: aggregate throughput for 2, 4, and 8 tunnels using ECMP
| Tunnels | Total Max Aggregate Throughput |
|---|---|
| 2 tunnels | ~2 Gbps |
| 4 tunnels | ~4–5 Gbps |
| 8 tunnels | ~7–8 Gbps |
NB:
- Max IPSec per tunnel: ~1 Gbps in practice (1.25 Gbps ceiling)
- Actual throughput depends heavily on CGW device limits (physical or virtual).
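The table reduces to simple arithmetic: tunnels times per-tunnel throughput, capped by what the customer gateway can encrypt. A rough sketch, assuming flows spread evenly across tunnels; `vpn_ecmp_estimate_gbps` is a hypothetical helper and the CGW limit is an input you would have to measure for your own device.

```python
from typing import Optional


def vpn_ecmp_estimate_gbps(
    tunnels: int,
    per_tunnel_gbps: float = 1.0,
    cgw_limit_gbps: Optional[float] = None,
) -> float:
    """Rough upper bound for VPN tunnels in one ECMP set.

    per_tunnel_gbps defaults to the ~1 Gbps practical figure (1.25 Gbps is
    the per-tunnel ceiling). cgw_limit_gbps is the customer gateway's own
    IPsec capacity (assumed, device-specific). Assumes enough parallel
    flows to load every tunnel evenly.
    """
    if tunnels < 1:
        raise ValueError("need at least one tunnel")
    aggregate = tunnels * per_tunnel_gbps
    if cgw_limit_gbps is not None:
        aggregate = min(aggregate, cgw_limit_gbps)
    return aggregate


print(vpn_ecmp_estimate_gbps(4))                        # 4.0
print(vpn_ecmp_estimate_gbps(4, per_tunnel_gbps=1.25))  # 5.0
print(vpn_ecmp_estimate_gbps(8, cgw_limit_gbps=5.0))    # 5.0: CGW-bound
```

The last call shows why eight tunnels do not help when the on-prem device tops out first.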
5.2 ECMP With Direct Connect Transit VIFs
If twtech creates:
- 2 Transit VIFs, each on a 10-Gbps DX connection
- Same BGP metrics (AS-PATH / MED)
- Same prefix advertisements
TGW will do ECMP across both VIFs → ~20 Gbps aggregate.
This is the highest-throughput TGW design apart from VPC-to-VPC or inter-Region traffic carried on the AWS backbone.
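The three conditions above can be checked mechanically before counting on the aggregate. A sketch under stated assumptions: `TransitVif` and `dx_ecmp_aggregate_gbps` are hypothetical, and the model treats a VIF's usable bandwidth as the speed of the DX connection it rides on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransitVif:
    name: str                  # hypothetical label
    connection_gbps: float     # speed of the DX connection the VIF rides on
    prefixes: frozenset[str]   # prefixes advertised over BGP
    as_path: tuple[int, ...]   # AS-path on those advertisements
    med: int = 0


def dx_ecmp_aggregate_gbps(vifs: list[TransitVif]) -> float:
    """Aggregate bandwidth of Transit VIFs that qualify as one ECMP set.

    Checks the conditions listed above: same prefixes, same AS-path and MED
    on every VIF. Raises ValueError otherwise, since traffic for the
    affected prefixes would not spread across all VIFs.
    """
    if not vifs:
        raise ValueError("no Transit VIFs given")
    first = vifs[0]
    for vif in vifs[1:]:
        if vif.prefixes != first.prefixes:
            raise ValueError(f"{vif.name}: advertised prefixes differ")
        if (vif.as_path, vif.med) != (first.as_path, first.med):
            raise ValueError(f"{vif.name}: AS-path or MED differs")
    return sum(vif.connection_gbps for vif in vifs)


onprem = frozenset({"10.0.0.0/8"})
pair = [
    TransitVif("vif-1", 10, onprem, (65010,)),
    TransitVif("vif-2", 10, onprem, (65010,)),
]
print(dx_ecmp_aggregate_gbps(pair))  # 20
```

The same check covers larger bundles: four matching VIFs on 10-Gbps connections sum to 40.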
6. TGW Is Flow-Limited: Understanding Per-Flow Limits
The throughput twtech gets depends on the number of flows and the per-flow bandwidth capability.
Per-flow throughput depends on:
- VPC ENI path (5 Gbps typical per-flow)
- VPN tunnel encryption limits (1 Gbps)
- DX NIC speed
- Remote CGW or on-prem firewall performance
If twtech pushes a single TCP flow, it rarely exceeds:
- VPN: ~1 Gbps
- DX: 5–10 Gbps depending on NIC offload
- VPC-to-VPC: 5 Gbps typical single-flow
To fully utilize TGW, twtech needs parallel flows.
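Put as arithmetic: a flow is capped by its slowest stage, and parallel flows add up until the paths are full. A minimal sketch with hypothetical helper names; the 3 Gbps firewall figure and the 8 Gbps path capacity (8 tunnels at ~1 Gbps) are assumed inputs, not measurements.

```python
def per_flow_ceiling_gbps(*stage_limits_gbps: float) -> float:
    """A single flow runs no faster than the slowest stage it crosses
    (ENI path, VPN tunnel, DX link, CGW or on-prem firewall)."""
    return min(stage_limits_gbps)


def aggregate_ceiling_gbps(flows: int, per_flow_gbps: float,
                           path_capacity_gbps: float) -> float:
    """Parallel flows add up until the ECMP paths are full.

    Optimistic: assumes flows hash evenly across paths; with few flows,
    hashing often leaves some paths idle.
    """
    return min(flows * per_flow_gbps, path_capacity_gbps)


# VPN tunnel (~1 Gbps) behind a firewall assumed to handle 3 Gbps per flow.
flow_cap = per_flow_ceiling_gbps(1.0, 3.0)
print(aggregate_ceiling_gbps(1, flow_cap, 8.0))   # 1.0: one flow, one tunnel
print(aggregate_ceiling_gbps(16, flow_cap, 8.0))  # 8.0: 8 tunnels saturated
```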
7. Common Bottlenecks in TGW ECMP Designs
| Bottleneck | Symptoms |
|---|---|
| On-prem firewall maxes out | VPN tunnels flapping, CPU at ~90% |
| Using only 1 VPN tunnel | 1 Gbps ceiling |
| Only 1 Transit VIF | 5–10 Gbps ceiling |
| Application uses a single TCP flow | twtech never reaches ECMP potential |
| TGW route table not configured for ECMP | Only 1 path used |
| CGW devices not supporting multiple BGP sessions | No ECMP |
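The bottleneck table can double as a design review checklist. A sketch only: `HybridDesign` and its fields are hypothetical summaries of a design, the 90% CPU threshold mirrors the symptom column, and the function flags risks rather than diagnosing a live network.

```python
from dataclasses import dataclass


@dataclass
class HybridDesign:
    """Hypothetical summary of a TGW hybrid design (illustrative fields)."""
    vpn_tunnels: int = 0
    transit_vifs: int = 0
    parallel_flows: int = 1
    tgw_has_equal_cost_routes: bool = False
    cgw_multiple_bgp_sessions: bool = True
    firewall_cpu_percent: float = 0.0


def likely_bottlenecks(d: HybridDesign) -> list[str]:
    """Walk the bottleneck table row by row; a checklist, not a diagnosis."""
    found = []
    if d.firewall_cpu_percent >= 90:
        found.append("on-prem firewall maxed out")
    if d.vpn_tunnels == 1:
        found.append("only 1 VPN tunnel: ~1 Gbps ceiling")
    if d.transit_vifs == 1:
        found.append("only 1 Transit VIF: one DX connection's ceiling")
    if d.parallel_flows <= 1:
        found.append("single flow: cannot use more than one ECMP path")
    if not d.tgw_has_equal_cost_routes:
        found.append("no equal-cost routes in TGW route table: 1 path used")
    if not d.cgw_multiple_bgp_sessions:
        found.append("CGW lacks multiple BGP sessions: no ECMP")
    return found


print(likely_bottlenecks(HybridDesign(vpn_tunnels=1)))
print(likely_bottlenecks(HybridDesign(vpn_tunnels=8, parallel_flows=32,
                                      tgw_has_equal_cost_routes=True)))  # []
```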
8. Best-Practice Designs for High-Throughput TGW ECMP
A. High-throughput VPN → TGW (5–8 Gbps)
Use:
- 4–8 IPSec tunnels
- Tunnels spread across multiple VPN connections, each on a unique CGW IP (each connection provides two tunnels)
- Ensure:
  - Dead Peer Detection (DPD) and fast rekeying enabled
  - Equal BGP metrics
  - ECMP enabled on twtech's on-prem router
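Parts of this checklist can be turned into a quick pre-flight check. A sketch under stated assumptions: `VpnConnection` and `check_vpn_plan` are hypothetical, the IPs come from the 203.0.113.0/24 documentation range, and the model assumes one CGW IP per VPN connection, each carrying two tunnels. DPD, rekey, and on-prem ECMP settings live on the devices and are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VpnConnection:
    cgw_ip: str                # customer gateway public IP (hypothetical)
    as_path: tuple[int, ...]   # AS-path the CGW advertises over BGP
    med: int = 0
    tunnels: int = 2           # each Site-to-Site VPN connection has two tunnels


def check_vpn_plan(connections: list[VpnConnection]) -> list[str]:
    """Return problems with a VPN -> TGW ECMP plan (empty list = looks OK)."""
    problems = []
    total = sum(c.tunnels for c in connections)
    if not 4 <= total <= 8:
        problems.append(f"{total} tunnels; the design above targets 4-8")
    ips = [c.cgw_ip for c in connections]
    if len(set(ips)) != len(ips):
        problems.append("CGW IPs are not unique per VPN connection")
    if len({(c.as_path, c.med) for c in connections}) > 1:
        problems.append("BGP metrics differ, so routes are not equal-cost")
    return problems


plan = [VpnConnection(f"203.0.113.{i}", (65010,)) for i in (1, 2, 3)]
print(check_vpn_plan(plan))  # []: 6 tunnels, unique IPs, equal metrics
```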
B. High-throughput Direct Connect → TGW (20–100 Gbps)
Use multiple Transit VIFs:
Example high-bandwidth design:
- 4 × 10 Gbps DX connections
- 4 Transit VIFs
- ECMP enabled
- Aggregate = 40 Gbps+
Customers with 100-Gbps DX fiber can reach 100 Gbps+ using multiple 10/100G connections.
C. VPC to on-prem Throughput-Maximizing Pattern
Even though TGW VPC attachments don't use ECMP, twtech can still scale by:
- Multiple AZs (each adds bandwidth)
- ENA Express (v1 & v2)
- Horizontal scaling (multiple app instances)
- Multi-flow parallelism at the application layer
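Multi-flow parallelism at the application layer usually means splitting one large transfer into several independent connections. A minimal sketch, assuming the server supports HTTP range requests; `split_ranges` is a hypothetical helper, and the network fetching itself is left out.

```python
def split_ranges(total_bytes: int, streams: int) -> list[tuple[int, int]]:
    """Split [0, total_bytes) into up to `streams` inclusive byte ranges,
    as used in HTTP Range headers; sizes differ by at most one byte."""
    if total_bytes < 0 or streams < 1:
        raise ValueError("need total_bytes >= 0 and streams >= 1")
    base, extra = divmod(total_bytes, streams)
    ranges = []
    start = 0
    for i in range(streams):
        size = base + (1 if i < extra else 0)
        if size == 0:
            break  # more streams than bytes: stop early
        ranges.append((start, start + size - 1))
        start += size
    return ranges


# Each range would be fetched over its own TCP connection, so each gets its
# own source port and therefore its own 5-tuple.
for first, last in split_ranges(10, 3):
    print(f"Range: bytes={first}-{last}")
```

Because every connection has a distinct 5-tuple, the streams can land on different paths where ECMP applies, and each one is subject only to its own per-flow cap rather than sharing a single flow's ~5 Gbps.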
9. Putting It All Together – Realistic Throughput Expectations
VPN
- Typical: 3–6 Gbps with 4–6 tunnels
- Max theoretical: 8–10 Gbps with 8 tunnels
Direct Connect
- Typical ECMP bundle: 20–40 Gbps
- Max theoretical: 100+ Gbps (multiple 100G links)
VPC Attachments
- 50 Gbps burst per VPC attachment
- But a single flow is still ~5 Gbps