Monday, November 3, 2025

AWS NAT Gateway with High Availability | Overview & Hands-On.

Scope:

  • The concept of NAT Gateway,
  • The Core Architecture for 3-tier VPC,
  • Key Components of NAT Gateway,
  • How NAT Gateway Works (Under the Hood),
  • High Availability (HA) Design (where it gets critical),
  • Best Practice (Solution to AZ failure),
  • Routing Configuration for each private subnet route table,
  • Cost Considerations for NAT Gateway & Pricing Calculations as of 2025,
  • Common Pitfalls, Description & Impacts,
  • Alternatives for HA or Cost Efficiency (Description, Pros & Cons),
  • Operational Best Practices (Deep Dive),
  • Sample Terraform Snippet for Multi-AZ NAT Gateway,
  • Checklist for HA NAT Gateway (Description & Status).

 1. The concept of NAT Gateway

    • NAT (Network Address Translation) Gateway allows instances in a private subnet to access the internet for updates, downloads, etc., without exposing those resources to inbound internet traffic.

In other words:

    • Private instances can initiate connections out to the internet, but nothing on the internet can initiate a connection in.

 2. The Core Architecture for 3-tier VPC:

Key Components of NAT Gateway:

    • Internet Gateway (IGW): the VPC's path to the internet, used for outbound access.
    • NAT Gateway: managed service that performs NAT.
    • Elastic IP (EIP): static public IP assigned to the NAT Gateway.
    • Private Subnet: has no direct route to the IGW; instead, it routes through the NAT Gateway.

 3. How NAT Gateway Works (Under the Hood)

    1. A private EC2 instance initiates a connection to an internet address.
    2. The packet is routed (via route table) to the NAT Gateway.
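    3. NAT Gateway replaces the source private IP (and source port) with its own address, which is presented to the internet as its public Elastic IP.
    4. The packet goes out via the Internet Gateway.
    5. When the response returns, NAT Gateway maps it back to the original private IP and port, then forwards it internally. Only replies to connections that started inside are mapped back.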
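The five steps above can be sketched as a toy translation table. This is only an illustration of source NAT in general, not how AWS implements NAT Gateway internally; the names `Packet` and `NatSketch` and all IP addresses (taken from documentation ranges) are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


@dataclass
class NatSketch:
    """Toy source-NAT table (illustrative only)."""
    public_ip: str
    _next_port: count = field(default_factory=lambda: count(1024))
    _table: dict = field(default_factory=dict)  # public port -> (private ip, private port)

    def outbound(self, pkt: Packet) -> Packet:
        # Allocate a public-side port and remember which private endpoint owns it.
        public_port = next(self._next_port)
        self._table[public_port] = (pkt.src_ip, pkt.src_port)
        return Packet(self.public_ip, public_port, pkt.dst_ip, pkt.dst_port)

    def inbound(self, pkt: Packet):
        # Only replies that match an existing mapping are forwarded; the rest are dropped.
        mapping = self._table.get(pkt.dst_port)
        if mapping is None:
            return None
        private_ip, private_port = mapping
        return Packet(pkt.src_ip, pkt.src_port, private_ip, private_port)


nat = NatSketch(public_ip="203.0.113.10")
out = nat.outbound(Packet("10.0.1.25", 40000, "198.51.100.7", 443))
reply = nat.inbound(Packet("198.51.100.7", 443, out.src_ip, out.src_port))
print(out.src_ip, out.src_port)      # 203.0.113.10 1024
print(reply.dst_ip, reply.dst_port)  # 10.0.1.25 40000
```

An unsolicited inbound packet has no entry in the table, so `inbound` returns `None`, which mirrors the "nothing can come in" behaviour described in section 1.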

 4. High Availability (HA) Design (where it gets critical).

 NAT Gateway Is AZ-Scoped

    • Each NAT Gateway lives within a single Availability Zone (AZ).
    • It is inherently redundant within its AZ, but not across AZs.

So if an AZ fails:

    • NAT Gateway in that AZ becomes unavailable.
    • Private subnets routing to that NAT Gateway lose internet connectivity.

 Best Practice (Solution to AZ failure) 

    • twtech needs to deploy one NAT Gateway per AZ.

To achieve high availability:

    • Deploy one NAT Gateway in a public subnet of each AZ.
    • Route private subnets in each AZ to the local NAT Gateway in the same AZ.

Sample HA Design for Multi-AZ Public Subnet, NAT Gateway, Private Subnet & Route Table

AZ          | Public Subnet | NAT Gateway | Private Subnet | Route Target
us-east-2a  | Public-1a     | NATGW-1a    | Private-1a     | NATGW-1a
us-east-2b  | Public-1b     | NATGW-1b    | Private-1b     | NATGW-1b
us-east-2c  | Public-1c     | NATGW-1c    | Private-1c     | NATGW-1c

NB:

The above Design ensures:

    • No cross-AZ data path.
    • Localized routing.
    • Fault isolation (if one AZ or NAT fails, others still work).
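To make the fault-isolation claim concrete, here is a small sketch that models which private subnets lose internet egress when one AZ fails. It reuses the subnet and gateway names from the sample design; the function name `subnets_without_egress` and the `SHARED` comparison layout are hypothetical.

```python
def subnets_without_egress(routes, nat_az, failed_az):
    """Return private subnets that lose internet egress when failed_az goes down.

    routes: private subnet name -> (subnet AZ, NAT Gateway name)
    nat_az: NAT Gateway name -> AZ where it lives
    Subnets inside the failed AZ are excluded: they are down regardless of NAT design.
    """
    return sorted(
        subnet
        for subnet, (subnet_az, nat) in routes.items()
        if subnet_az != failed_az and nat_az[nat] == failed_az
    )


NAT_AZ = {"NATGW-1a": "us-east-2a", "NATGW-1b": "us-east-2b", "NATGW-1c": "us-east-2c"}

# Design from the table: each private subnet targets the NAT Gateway in its own AZ.
PER_AZ = {
    "Private-1a": ("us-east-2a", "NATGW-1a"),
    "Private-1b": ("us-east-2b", "NATGW-1b"),
    "Private-1c": ("us-east-2c", "NATGW-1c"),
}

# Anti-pattern for comparison: every private subnet shares NATGW-1a.
SHARED = {name: (az, "NATGW-1a") for name, (az, _) in PER_AZ.items()}

print(subnets_without_egress(PER_AZ, NAT_AZ, "us-east-2a"))  # []
print(subnets_without_egress(SHARED, NAT_AZ, "us-east-2a"))  # ['Private-1b', 'Private-1c']
```

With one NAT Gateway per AZ, an outage in us-east-2a affects no healthy subnet; with a shared gateway, the two surviving AZs lose egress as well.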

 5. Routing Configuration For each private subnet route table:

    • Destination: 0.0.0.0/0 Target: nat-xxxxxxxx (NAT Gateway in same AZ)

For each public subnet route table:

    • Destination: 0.0.0.0/0 Target: igw-xxxxxxxx (Internet Gateway)
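The effect of these two default routes can be illustrated with a longest-prefix-match lookup, which is how a VPC route table chooses among matching routes. This is a minimal sketch: the `10.0.0.0/16` VPC CIDR and the function name `route_target` are assumptions, and the `nat-`/`igw-` IDs are the same placeholders used above.

```python
import ipaddress


def route_target(route_table, destination_ip):
    """Pick the target of the most specific (longest-prefix) matching route."""
    dest = ipaddress.ip_address(destination_ip)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in route_table.items()
        if dest in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Hypothetical VPC CIDR; every VPC route table also carries a "local" route.
private_rt = {"10.0.0.0/16": "local", "0.0.0.0/0": "nat-xxxxxxxx"}
public_rt = {"10.0.0.0/16": "local", "0.0.0.0/0": "igw-xxxxxxxx"}

print(route_target(private_rt, "10.0.2.15"))      # local
print(route_target(private_rt, "93.184.216.34"))  # nat-xxxxxxxx
print(route_target(public_rt, "93.184.216.34"))   # igw-xxxxxxxx
```

Traffic inside the VPC stays on the local route, while anything else from a private subnet falls through to the NAT Gateway.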

 6. Cost Considerations for NAT Gateway & Pricing Calculations as of 2025

    • Per hour: ~$0.045/hour (~$32.40/month per NAT)
    • Per GB processed: ~$0.045/GB (varies by region)

NB:

    • If twtech has 3 AZs, 3 NAT Gateways cost ~$97.20/month (3 x ~$0.045/hour x 24 hours x 30 days) + data charges.
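The arithmetic above can be checked with a short estimate function. The rates are the approximate figures quoted in this section and vary by region; the function name `monthly_nat_cost` and the 1,000 GB example volume are hypothetical.

```python
HOURLY_RATE = 0.045  # USD per NAT Gateway hour (approximate figure from this section)
PER_GB_RATE = 0.045  # USD per GB processed (approximate, varies by region)


def monthly_nat_cost(nat_count, gb_processed=0.0, hours=24 * 30):
    """Estimate monthly cost: hourly charge per gateway plus data processing."""
    fixed = nat_count * HOURLY_RATE * hours
    data = gb_processed * PER_GB_RATE
    return round(fixed + data, 2)


print(monthly_nat_cost(1))        # 32.4
print(monthly_nat_cost(3))        # 97.2
print(monthly_nat_cost(3, 1000))  # 142.2
```

The fixed charge scales with the number of gateways, while the data-processing charge depends only on total traffic, so splitting the same traffic across three gateways does not by itself raise the per-GB part.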

 Optimization Tip:

    • If the twtech workload in one AZ is light, its private subnets can temporarily be routed through a NAT Gateway in another AZ to reduce cost.
      • However, routing multiple private subnets through one NAT Gateway sacrifices HA and sends traffic across AZs.
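How "light" a workload must be for this tip to pay off can be estimated with a break-even calculation. This sketch assumes a cross-AZ data transfer fee of $0.01/GB charged in each direction; that rate is an assumption not stated in this post and should be checked against current AWS pricing. The function name `break_even_gb` is hypothetical.

```python
def break_even_gb(hourly_rate=0.045, hours=24 * 30, cross_az_rate_per_gb=0.01, directions=2):
    """GB/month at which cross-AZ transfer fees equal the cost of one extra NAT Gateway."""
    saved = hourly_rate * hours                       # fixed cost avoided by skipping a local NAT
    extra_per_gb = cross_az_rate_per_gb * directions  # assumed cross-AZ fee, charged each way
    return round(saved / extra_per_gb, 1)


print(break_even_gb())  # 1620.0
```

Under these assumptions, an AZ that pushes less than roughly 1,620 GB/month through NAT is cheaper to route via a shared gateway, at the price of losing per-AZ fault isolation.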

 7. Common Pitfalls, Description & Impacts

Pitfall            | Description                    | Impact
Single NAT Gateway | Used for all AZs               | AZ dependency; cross-AZ traffic; higher cost and latency
Wrong Route Target | Routing to wrong NAT Gateway   | Asymmetric routing; broken connections
No EIP association | NAT Gateway without Elastic IP | No internet access
No IGW in VPC      | Missing Internet Gateway       | NAT Gateway can't reach the internet
Ephemeral scaling  | Not monitoring throughput      | NAT Gateway can be a bottleneck in high traffic scenarios

 8. Alternatives for HA or Cost Efficiency (Description, Pros & Cons)

Option                                   | Description                             | Pros                         | Cons
NAT Instance                             | EC2 instance with iptables-based NAT    | Customizable; cheaper        | Manual scaling, patching, less HA
PrivateLink / VPC Endpoints              | Direct AWS service access               | No NAT cost for AWS services | Limited to AWS services only
Centralized Egress VPC (Shared Services) | Shared NAT Gateways via Transit Gateway | Centralized management       | Cross-AZ data charges; complexity

 9. Operational Best Practices (Deep Dive)

    • Deploy one NAT Gateway per AZ.
    • Use a separate route table for each AZ's private subnets.
    • Monitor with CloudWatch (metrics: ActiveConnectionCount, BytesOutToDestination).
    • Use VPC Flow Logs for troubleshooting.
    • Use VPC Reachability Analyzer for connectivity checks.
    • Consider combining NAT Gateway with VPC Endpoints to minimize egress costs.
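As a sketch of the monitoring item, the helper below builds the parameters for a CloudWatch `GetMetricStatistics` request against the `AWS/NATGateway` namespace. The function name `nat_metric_query` and the `nat-xxxxxxxx` ID are placeholders; sending the request needs boto3 and AWS credentials, so that part is shown only as a comment.

```python
from datetime import datetime, timedelta, timezone


def nat_metric_query(nat_gateway_id, metric_name, statistic="Sum", hours=1, period=300):
    """Build keyword arguments for CloudWatch GetMetricStatistics on one NAT Gateway."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/NATGateway",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "NatGatewayId", "Value": nat_gateway_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period,  # seconds per datapoint
        "Statistics": [statistic],
    }


query = nat_metric_query("nat-xxxxxxxx", "BytesOutToDestination")

# With boto3 installed and credentials configured, the query could be sent as:
#   import boto3
#   datapoints = boto3.client("cloudwatch").get_metric_statistics(**query)["Datapoints"]
```

For `ActiveConnectionCount`, passing `statistic="Maximum"` is usually more informative than a sum, since the interest is in peak concurrent connections.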

10. Sample Terraform Snippet for Multi-AZ NAT Gateway

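# hcl
# Assumes var.azs is a list of AZ names, and that aws_eip.nat_eip, aws_subnet.public
# and aws_route_table.private are defined elsewhere with for_each keyed by the same AZ names.

# One NAT Gateway per AZ, placed in that AZ's public subnet.
resource "aws_nat_gateway" "twtechnatgw" {
  for_each      = toset(var.azs)
  allocation_id = aws_eip.nat_eip[each.key].id
  subnet_id     = aws_subnet.public[each.key].id
}

# Default route of each AZ's private route table targets the NAT Gateway in the same AZ.
resource "aws_route" "private_default" {
  for_each               = toset(var.azs)
  route_table_id         = aws_route_table.private[each.key].id
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = aws_nat_gateway.twtechnatgw[each.key].id
}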

 11. Checklist for HA NAT Gateway (Description & Status)

Item              | Description                   | Status
One NAT per AZ    | Ensures redundancy            |
Local routing     | Reduces cross-AZ latency/cost |
Monitoring setup  | CloudWatch, Flow Logs         |
Cost optimization | VPC Endpoints where possible  |
Tested failover   | Verified per-AZ independence  |

 

Project: Hands-On

  • How twtech uses a NAT Gateway to give instances in the private subnet of its custom Prod VPC (twtechvpc) outbound internet access.

Search for AWS service: NAT gateways

Step-1:

  • Create a NAT gateway in a public subnet of the VPC.
  • Allocate an Elastic IP for it.
  • Name the NAT gateway: twtechNATG

Step-2:

  • While the NATGW is being provisioned, twtech:
    • Selects and edits the private route table of the VPC (twtechPrivateRT) to add a default route (0.0.0.0/0) that targets the NATGW.

  • Before: the route table contains only the local route.

  • After: add the route with the NAT Gateway as target.

  • Save changes in the private route table of the VPC (twtechvpc).

NB:

  • It takes some time for NAT gateway to be fully provisioned.
    • twtech has to wait until it’s fully Available.
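The wait can also be scripted as a polling loop. The sketch below is generic: `get_state` is a hypothetical callable that returns the gateway's current state string (for example by reading it from the EC2 API), and the timeout and interval values are arbitrary. The AWS CLI and boto3 also ship a ready-made waiter for this (`nat-gateway-available` / `nat_gateway_available`).

```python
import time


def wait_until_available(get_state, timeout=600, interval=15, sleep=time.sleep):
    """Poll get_state() until it returns 'available'; stop early on terminal states."""
    waited = 0
    while True:
        state = get_state()
        if state == "available":
            return True
        if state in ("failed", "deleting", "deleted"):
            raise RuntimeError(f"NAT gateway entered state {state!r}")
        if waited >= timeout:
            return False  # still pending after the timeout
        sleep(interval)
        waited += interval


# Example with a canned sequence of states instead of a real API call.
states = iter(["pending", "pending", "available"])
print(wait_until_available(lambda: next(states), sleep=lambda seconds: None))  # True
```

Passing `sleep` as a parameter keeps the loop testable without real delays.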

Step-3:

  • From the EC2 console, select the Bastion Host instance (server in the public subnet).
    • Then connect (SSH) from it to the instance provisioned in the private subnet.
      • Next, verify that internet access is available.

Connect to instance

  • EC2 Instance Connect is used.
    • However, twtech can use any terminal with an SSH client and the instance key pair to connect (SSH) into the Bastion Host.


Step-4:

  • From the Bastion Host, twtech opens an SSH session to the instance provisioned in the private subnet of the twtech custom VPC,
    • using the following command with the private IPv4 address of the instance (the key file twtechkey.pem must be present on the Bastion Host):

sudo ssh ec2-user@10.x.xx.65 -i twtechkey.pem

Step-5:

  • twtech needs to verify that the EC2 instance in the private subnet now has internet access through the NAT Gateway:

ping think-with-tech.blogspot.com

Next:

Use the Curl command:

curl https://think-with-tech.blogspot.com

Sample test curl commands 

curl think-with-tech.blogspot.com

curl example.com

curl google.com
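Beyond checking that requests succeed, it can be useful to confirm that traffic actually leaves with the NAT Gateway's Elastic IP. A minimal sketch, assuming Python 3 is available on the private instance and using the public `https://checkip.amazonaws.com` service; the function names and the example IP are hypothetical.

```python
from urllib.request import urlopen


def fetch_public_ip(url="https://checkip.amazonaws.com", timeout=5):
    """Return the public IP that the remote service sees for this host."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode().strip()


def egress_uses_nat(nat_elastic_ip, fetch=fetch_public_ip):
    """True when outbound traffic leaves with the NAT Gateway's Elastic IP."""
    return fetch() == nat_elastic_ip


# On the private instance, with the Elastic IP allocated in Step-1 (placeholder value here):
#   print(egress_uses_nat("203.0.113.10"))
```

The `fetch` parameter lets the comparison be exercised without network access, for example `egress_uses_nat("203.0.113.10", fetch=lambda: "203.0.113.10")`.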

  • twtech can now run other commands on the private subnet instance to install or update packages, such as:

sudo yum install net-tools -y

  • twtech can then use the installed net-tools to list all listening TCP and UDP ports on the private subnet instance:

sudo netstat -utln

twtech final takeaway:

    • Updating and patching of the instance in the private subnet is done without exposing it to inbound traffic from the public internet.
    • Outbound connections from private instances go through the NAT Gateway created (twtechNATG).
    • For High Availability (HA), twtech can create additional NAT gateways, one in each of the other AZs; for disaster recovery (DR), the same layout can be repeated in another Region.
    • twtech then needs to edit the private route tables in each AZ (or Region) to target the NAT gateway created there.



