Monday, November 3, 2025

NAT Gateway with High Availability | Overview & Hands-On.


Intro:

A deep dive into NAT Gateway with High Availability in AWS.

Focus:

  •        Concepts,
  •        Architecture,
  •        Deployment strategies,
  •        Cost considerations,
  •        Pitfalls,
  •        High Availability (HA) best practices.

Breakdown:

  •        The concept: NAT Gateway,
  •        The Core Architecture,
  •        How NAT Gateway Works (Under the Hood),
  •        High Availability (HA) Design,
  •        Routing Configuration,
  •        Cost Considerations,
  •        Common Pitfalls,
  •        Alternatives for HA or Cost Efficiency,
  •        Operational Best Practices,
  •        Example Terraform Snippet (Multi-AZ NAT Gateway),
  •        Summary – HA NAT Gateway Checklist.

 1. The concept: NAT Gateway

  • A NAT (Network Address Translation) Gateway allows instances in a private subnet to access the internet (for updates, downloads, etc.) without exposing them to inbound internet traffic.

In other words:

  • “Private instances can initiate connections out, but nothing on the internet can initiate a connection in.”

 2. The Core Architecture

Let’s visualize a standard 3-tier VPC:


Key Components:

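  •         Internet Gateway (IGW): the VPC's path to the internet; the NAT Gateway sends outbound traffic through it.
  •         NAT Gateway: AWS-managed NAT service, placed in a public subnet.
  •         Elastic IP (EIP): static public IP assigned to the NAT Gateway.
  •         Private Subnet: has no direct route to the IGW; its default route points to the NAT Gateway instead.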

 3. How NAT Gateway Works (Under the Hood)

  1.      A private EC2 instance initiates a connection to an internet address.
  2.      The packet is routed (via route table) to the NAT Gateway.
  3.      NAT Gateway replaces the source private IP with its own public Elastic IP.
  4.      The packet goes out via the Internet Gateway.
  5.      When the response returns, NAT Gateway maps it back to the private IP and forwards it internally.
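
The five steps above can be illustrated with a toy translation table. This is only a sketch of the idea, not how AWS implements the NAT Gateway internally: the function name snat_translate, the sequential port allocator starting at 1025, and all IP addresses (documentation ranges) are assumptions made for the example.

```shell
#!/bin/sh
# Toy model of source NAT (steps 3 and 5 above).
# $1:    public (Elastic) IP of the NAT Gateway
# stdin: one flow per line, "src_ip src_port dst_ip dst_port"
snat_translate() {
  awk -v eip="$1" '
    {
      flow = $1 ":" $2
      # Allocate a public-side port the first time a private flow is seen,
      # and reuse it afterwards so replies can be mapped back to the sender.
      if (!(flow in port)) port[flow] = 1024 + (++n)
      printf "%s:%s -> %s:%s  =>  %s:%d -> %s:%s\n", $1, $2, $3, $4, eip, port[flow], $3, $4
    }'
}

# Two private instances use the same source port; the third line repeats the first flow.
printf '%s\n' \
  '10.0.1.10 40001 198.51.100.7 443' \
  '10.0.2.20 40001 198.51.100.7 443' \
  '10.0.1.10 40001 198.51.100.7 443' |
  snat_translate 203.0.113.10
```

Both private addresses leave as 203.0.113.10, each with its own public-side port, and the repeated flow reuses its earlier mapping. That mapping is what lets the response in step 5 find its way back to the right private IP.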

 4. High Availability (HA) Design

This is where it gets critical.

 NAT Gateway Is AZ-Scoped

  •         Each NAT Gateway lives within a single Availability Zone (AZ). It is inherently redundant within its AZ, but not across AZs.

So if an AZ fails:

    •         NAT Gateway in that AZ becomes unavailable.
    •         Private subnets routing to that NAT Gateway lose internet connectivity.

 Best Practice: One NAT Gateway per AZ

To achieve high availability:

  •         Deploy 1 NAT Gateway per AZ.
  •         Route private subnets in each AZ to the local NAT Gateway in the same AZ.

Sample HA Design (Multi-AZ)

AZ           Public Subnet   NAT Gateway   Private Subnet   Route Target
us-east-2a   Public-1a       NATGW-1a      Private-1a       NATGW-1a
us-east-2b   Public-1b       NATGW-1b      Private-1b       NATGW-1b
us-east-2c   Public-1c       NATGW-1c      Private-1c       NATGW-1c

This ensures:

  •         No cross-AZ data path.
  •         Localized routing.
  •         Fault isolation (if one AZ or NAT fails, others still work).
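
The "no cross-AZ data path" rule can be checked mechanically once the subnet-to-NAT mapping is written down. The sketch below assumes a hypothetical four-column text inventory (private subnet, its AZ, the NAT Gateway it routes to, that gateway's AZ); the function name find_cross_az_routes is made up for the example, and the third sample row is deliberately misrouted.

```shell
#!/bin/sh
# stdin: "private_subnet subnet_az nat_gateway nat_az" per line.
# Prints every subnet whose default route leaves its own AZ;
# exit status is 1 if at least one such subnet was found.
find_cross_az_routes() {
  awk '
    $2 != $4 { printf "CROSS-AZ: %s (%s) -> %s (%s)\n", $1, $2, $3, $4; bad = 1 }
    END { exit bad + 0 }'
}

printf '%s\n' \
  'Private-1a us-east-2a NATGW-1a us-east-2a' \
  'Private-1b us-east-2b NATGW-1b us-east-2b' \
  'Private-1c us-east-2c NATGW-1a us-east-2a' |
  find_cross_az_routes || echo "fix the routes listed above"
```

Only Private-1c is reported, because its traffic would depend on us-east-2a staying healthy.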

 5. Routing Configuration

For each private subnet route table:

  • Destination: 0.0.0.0/0 → Target: nat-xxxxxxxx (NAT Gateway in same AZ)

For each public subnet route table:

  • Destination: 0.0.0.0/0 → Target: igw-xxxxxxxx (Internet Gateway)

 6. Cost Considerations

NAT Gateway pricing (as of 2025):

  •         Per hour: ~$0.045/hour (~$32.40/month per NAT)
  •         Per GB processed: ~$0.045/GB (varies by region)

NB:

  •        If twtech has 3 AZs → 3 NAT Gateways → ~$97/month (3 × ~$32.40) + data processing charges.

 Optimization Tip:

  •        If the workload in one AZ is light, twtech can temporarily route several private subnets through a single NAT Gateway to reduce cost. This sacrifices HA, and traffic from the other AZs then crosses AZ boundaries on its way to the NAT Gateway.
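
The arithmetic behind these figures is easy to script. A minimal sketch, assuming the approximate rates quoted above and a 720-hour (30-day) month; the function name nat_monthly_cost and the NAT_HOURLY_USD / NAT_PER_GB_USD override variables are invented for the example, and real bills depend on the region's current pricing.

```shell
#!/bin/sh
# Rough monthly cost: gateways * hourly_rate * 720 h + GB_processed * per_GB_rate.
# $1 number of NAT Gateways, $2 total GB processed per month (default 0)
nat_monthly_cost() {
  gateways="$1"; gb_processed="${2:-0}"
  LC_ALL=C awk -v n="$gateways" -v gb="$gb_processed" \
      -v hourly="${NAT_HOURLY_USD:-0.045}" -v per_gb="${NAT_PER_GB_USD:-0.045}" \
      'BEGIN { printf "%.2f\n", n * hourly * 720 + gb * per_gb }'
}

nat_monthly_cost 1        # one shared NAT Gateway, no data: 32.40
nat_monthly_cost 3        # one per AZ, no data:            97.20
nat_monthly_cost 3 500    # one per AZ, 500 GB processed:   119.70
```

The per-GB charge is the same whichever gateway processes the traffic, so going from one NAT Gateway to three adds only the hourly part (about $65/month at these rates).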

 7. Common Pitfalls

Pitfall              Description                      Impact
Single NAT Gateway   Used for all AZs                 AZ dependency; cross-AZ traffic; higher cost and latency
Wrong Route Target   Routing to wrong NAT Gateway     Asymmetric routing; broken connections
No EIP association   NAT Gateway without Elastic IP   No internet access
No IGW in VPC        Missing Internet Gateway         NAT Gateway can’t reach the internet
Ephemeral scaling    Not monitoring throughput        NAT Gateway can be a bottleneck in high traffic scenarios

 8. Alternatives for HA or Cost Efficiency

Option                                     Description                               Pros                           Cons
NAT Instance                               EC2 instance with iptables-based NAT      Customizable; cheaper          Manual scaling, patching, less HA
PrivateLink / VPC Endpoints                Direct AWS service access                 No NAT cost for AWS services   Limited to AWS services only
Centralized Egress VPC (Shared Services)   Shared NAT Gateways via Transit Gateway   Centralized management         Cross-AZ data charges; complexity

 9. Operational Best Practices

  •        Deploy one NAT Gateway per AZ.
  •        Use a separate route table for each AZ's private subnets.
  •        Monitor with CloudWatch (metrics: ActiveConnectionCount, BytesOutToDestination).
  •        Use VPC Flow Logs for troubleshooting.
  •        Use VPC Reachability Analyzer for connectivity checks.
  •        Consider combining NAT Gateway with VPC Endpoints to minimize egress costs.
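
The CloudWatch metrics mentioned above can also be pulled from the command line. The sketch below wraps the AWS CLI's `cloudwatch get-metric-statistics` command for the AWS/NATGateway namespace; the wrapper name nat_metric_sum, the 5-minute period, and the placeholder ID and timestamps in the usage comment are assumptions for illustration.

```shell
#!/bin/sh
# Sum of one NAT Gateway metric over a time window, in 5-minute buckets.
# $1 NAT Gateway ID, $2 metric name, $3 start time, $4 end time (ISO 8601, UTC)
nat_metric_sum() {
  aws cloudwatch get-metric-statistics \
    --namespace AWS/NATGateway \
    --metric-name "$2" \
    --dimensions "Name=NatGatewayId,Value=$1" \
    --start-time "$3" --end-time "$4" \
    --period 300 --statistics Sum \
    --query 'Datapoints[].[Timestamp,Sum]' --output text
}

# Usage (needs AWS credentials; the ID and times are placeholders):
#   nat_metric_sum nat-xxxxxxxx BytesOutToDestination 2025-11-03T00:00:00Z 2025-11-03T01:00:00Z
#   nat_metric_sum nat-xxxxxxxx ErrorPortAllocation   2025-11-03T00:00:00Z 2025-11-03T01:00:00Z
```

Sum suits byte and error counters; for ActiveConnectionCount a Maximum statistic is usually more informative, so the `--statistics` value would need changing for that metric.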

10. Example Terraform Snippet (Multi-AZ NAT Gateway)

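# hcl
# var.azs must be a set (or map) of AZ names, because for_each does not accept a list.
# aws_eip.nat_eip, aws_subnet.public and aws_route_table.private are assumed to be
# defined elsewhere with the same for_each keys.

# One NAT Gateway per AZ, placed in that AZ's public subnet.
resource "aws_nat_gateway" "twtchnatgw" {
  for_each      = var.azs
  allocation_id = aws_eip.nat_eip[each.key].id
  subnet_id     = aws_subnet.public[each.key].id
}

# The default route of each AZ's private route table targets the NAT Gateway in the same AZ.
resource "aws_route" "private_default" {
  for_each               = var.azs
  route_table_id         = aws_route_table.private[each.key].id
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = aws_nat_gateway.twtchnatgw[each.key].id
}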

 11. Summary – HA NAT Gateway Checklist

Item                Description                     Status
One NAT per AZ      Ensures redundancy              [ ]
Local routing       Reduces cross-AZ latency/cost   [ ]
Monitoring setup    CloudWatch, Flow Logs           [ ]
Cost optimization   VPC Endpoints where possible    [ ]
Tested failover     Verified per-AZ independence    [ ]

 

Project: Hands-On

How twtech uses a NAT Gateway to give instances in the private subnet of the VPC (twtechvpc) outbound internet access

In the AWS console, search for the service: NAT gateways

Step-1:

Create a NAT gateway:


Allocate an Elastic IP:


Create NAT gateway: twtechNATG


Step-2:

While the NAT Gateway is being provisioned, select and edit the private route table of the VPC (twtechPrivateRT) to add a default route (0.0.0.0/0) that targets the NAT Gateway.

From: the local route only

To: the local route plus the added route 0.0.0.0/0 → twtechNATG

Save changes in Private route table of the VPC (twtechvpc)

NB:

It takes some time for NAT gateway to be fully provisioned: wait until it’s Available.
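
For readers who prefer the command line, Steps 1 and 2 can be sketched with the AWS CLI. This is an illustrative equivalent, not the procedure shown in the console walkthrough: the function name create_nat_with_route and the placeholder IDs are assumptions, and unlike the console steps it waits for the gateway to become available before adding the route.

```shell
#!/bin/sh
# Allocate an Elastic IP, create a NAT Gateway in a public subnet, wait until
# it is available, then point a private route table's default route at it.
# $1 public subnet ID, $2 private route table ID. Prints the new NAT Gateway ID.
create_nat_with_route() {
  public_subnet_id="$1"; private_rtb_id="$2"

  alloc_id=$(aws ec2 allocate-address --domain vpc \
    --query AllocationId --output text) || return 1

  nat_id=$(aws ec2 create-nat-gateway \
    --subnet-id "$public_subnet_id" --allocation-id "$alloc_id" \
    --query NatGateway.NatGatewayId --output text) || return 1

  # Blocks until the gateway reaches the "available" state.
  aws ec2 wait nat-gateway-available --nat-gateway-ids "$nat_id" || return 1

  aws ec2 create-route --route-table-id "$private_rtb_id" \
    --destination-cidr-block 0.0.0.0/0 --nat-gateway-id "$nat_id" >/dev/null || return 1

  echo "$nat_id"
}

# Usage (needs AWS credentials; both IDs are placeholders):
#   create_nat_with_route subnet-xxxxxxxx rtb-xxxxxxxx
```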

Step-3:

From the EC2 console, select the Bastion Host instance and connect to it. The Bastion Host is the jump point for reaching the instance in the private subnet and verifying that it has internet access.

Connect to instance: EC2 Instance Connect is used here. However, any configured terminal can be used to connect (SSH) to the Bastion Host.


Step-4:

From the Bastion Host, twtech can SSH onward to the instance in the private subnet using its private IPv4 address (the key file twtechkey.pem must be available on the Bastion Host):

sudo ssh ec2-user@10.x.xx.65 -i twtechkey.pem

Step-5:

Verify that the EC2 instance in the private subnet now has internet access through the NAT Gateway:

ping think-with-tech.blogspot.com

Next:

Use the curl command:

curl https://think-with-tech.blogspot.com

curl think-with-tech.blogspot.com

curl example.com

curl google.com
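
A successful curl shows that the instance can reach the internet, but not which path the traffic took. One way to confirm it leaves through the NAT Gateway is to compare the public IP seen by an external service with the gateway's Elastic IP. The sketch below uses AWS's checkip.amazonaws.com endpoint; the function name check_egress_ip and the sample address are assumptions for illustration.

```shell
#!/bin/sh
# $1 expected Elastic IP of the NAT Gateway
# $2 optional observed public IP (looked up via checkip.amazonaws.com if omitted)
check_egress_ip() {
  expected="$1"
  observed="${2:-$(curl -s --max-time 5 https://checkip.amazonaws.com)}"
  if [ "$observed" = "$expected" ]; then
    echo "OK: egress via $observed"
  else
    echo "MISMATCH: expected $expected, saw ${observed:-no response}"
    return 1
  fi
}

# Usage on the private instance (replace the placeholder with twtechNATG's Elastic IP):
#   check_egress_ip 203.0.113.10
```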

twtech can now run other commands on the private subnet instance, such as:

sudo yum install net-tools -y

twtech can now use the installed net-tools to list all the listening ports on the private subnet instance:

sudo netstat -utln

NB:

  •        Updating and patching of the instance in the private subnet is done without exposing it to inbound traffic from the public internet.
  •        Outbound connections from private instances go through the NAT Gateway created (twtechNATG).
  •        For High Availability (HA), twtech can create additional NAT Gateways in the other AZs of the VPC, then point each AZ's private route table at the NAT Gateway in the same AZ. A route table can only target a NAT Gateway in its own VPC, so disaster recovery (DR) in another Region needs its own VPC and NAT Gateways there.
