Monday, November 17, 2025

IPv4 Troubleshooting | Overview.


An Overview of IPv4 Troubleshooting.

Scope:

  •        Cloud-architect level,
  •     Production-focused

Breakdown:

  •        IPv4 stack problems
  •        Layer 1–3: Addressing & Subnetting
  •        Layer 3: Routing Troubleshooting
  •        NAT Troubleshooting
  •        DNS Troubleshooting (Often confused with IP issues)
  •        Transport Layer (TCP/UDP)
  •        Firewalls, ACLs, Security Groups, NACLs
  •        MTU, Fragmentation, PMTUD
  •        Asymmetric Routing (One of the hardest issues)
  •        Packet Capture Workflow
  •        Systematic IPv4 Troubleshooting Flow
  •        AWS-Specific IPv4 Failure Patterns

IPv4 stack problems fall into one of seven domains:

  1.      Addressing / Subnetting
  2.      Routing
  3.      NAT
  4.      DNS
  5.      Transport Layer (TCP/UDP)
  6.      Firewalling (SG/NACL/ACL/iptables/etc.)
  7.      Application-Layer quirks

NB:

  • This Overview provides a tiered troubleshooting framework with commands, packet flow logic, and AWS-specific pitfalls (if twtech is troubleshooting inside VPCs or hybrid networks).

1. Layer 1–3: Addressing & Subnetting

Incorrect IPv4 addressing is one of the most common causes of failures, so check it first.

1.1 Checklist

  •         Does the host have a valid IPv4 address?
  •         Is the netmask correct?
  •         Is the default gateway in the same subnet?
  •         Any duplicate IPs?
  •         Any ARP poisoning or stale ARP caches?

1.2 Management Commands

Linux:

# bash
ip addr
ip route
ip neigh
arp -a

Windows:

# cmd
ipconfig /all
route print
arp -a

Common pitfalls

  •         Host mask mismatch (e.g., host thinks /24 but network is /23).
  •         Gateway configured outside the host's subnet → the host cannot ARP for it, so off-subnet traffic silently fails.
  •         Stale ARP cache; clearing it (requires root) fixes many “weird” issues:
      •    sudo ip neigh flush all
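The checklist and pitfalls above can be verified offline with Python's standard `ipaddress` module. This is a minimal sketch, assuming you already know the host's configured address/prefix, its gateway and (for the mask check) the subnet's real CIDR; the function names `check_host_config` and `mask_mismatch` are hypothetical.

```python
import ipaddress

def check_host_config(host_cidr: str, gateway: str) -> list[str]:
    """Return problems found in a host's IPv4 settings (empty list = looks sane)."""
    problems = []
    iface = ipaddress.IPv4Interface(host_cidr)  # address + configured mask, e.g. "10.0.1.25/24"
    net = iface.network
    gw = ipaddress.IPv4Address(gateway)
    # /31 and /32 have no separate network/broadcast addresses, so skip this check for them
    if net.prefixlen < 31 and iface.ip in (net.network_address, net.broadcast_address):
        problems.append(f"{iface.ip} is the network or broadcast address of {net}")
    if gw not in net:
        problems.append(f"gateway {gw} is outside {net}: the host cannot ARP for it")
    elif gw == iface.ip:
        problems.append("gateway is the host's own address")
    return problems

def mask_mismatch(host_cidr: str, real_network: str) -> bool:
    """True if the host sits inside real_network but is configured with a different prefix."""
    iface = ipaddress.IPv4Interface(host_cidr)
    real = ipaddress.IPv4Network(real_network)
    return iface.ip in real and iface.network != real

# The host believes it is in a /24, but the subnet (and its gateway) is really a /23.
print(check_host_config("10.0.1.25/24", "10.0.0.1"))
print(mask_mismatch("10.0.1.25/24", "10.0.0.0/23"))  # True
```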

2. Layer 3: Routing Troubleshooting

2.1 Understand the routing decision

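When more than one route matches a destination, the decision is made in this order:

1.     Longest Prefix Match (LPM): the most specific matching prefix wins.

2.     Administrative Distance (static vs BGP vs OSPF, etc.): among equal-length prefixes learned from different sources, the lower AD wins.

3.     Metric / cost: among routes from the same source, the lower metric wins.

Administrative distance is a router-vendor concept (Cisco AD, Juniper route preference); the Linux kernel has no AD and compares only the metric of same-prefix routes.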
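The three-step decision can be modeled as a single sort key. This is a simplified sketch, not any vendor's implementation: real routers apply AD when installing routes into the routing table and LPM at forwarding time, but for one lookup the outcome is the same. The `Route` type, the next-hop labels and the sample AD values (Cisco defaults: static 1, eBGP 20, OSPF 110) are illustrative assumptions.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

@dataclass
class Route:
    prefix: str          # destination prefix, e.g. "10.1.0.0/16"
    next_hop: str        # label only, e.g. "tgw" or "192.0.2.1"
    admin_distance: int  # lower = more trusted source
    metric: int          # lower = better path within one source

def select_route(routes: list[Route], destination: str) -> Optional[Route]:
    dst = ipaddress.IPv4Address(destination)
    candidates = [r for r in routes if dst in ipaddress.IPv4Network(r.prefix)]
    if not candidates:
        return None  # no matching route: the packet is dropped
    # Key order mirrors the decision: longest prefix first, then lowest AD, then lowest metric.
    return min(candidates, key=lambda r: (-ipaddress.IPv4Network(r.prefix).prefixlen,
                                          r.admin_distance, r.metric))

routes = [
    Route("0.0.0.0/0", "igw", 1, 0),
    Route("10.0.0.0/8", "vgw", 20, 0),
    Route("10.1.0.0/16", "ospf-peer", 110, 20),
    Route("10.1.0.0/16", "static-peer", 1, 0),
]
print(select_route(routes, "10.1.2.3").next_hop)  # static-peer: same /16, lower AD
print(select_route(routes, "10.9.9.9").next_hop)  # vgw: /8 beats /0
print(select_route(routes, "8.8.8.8").next_hop)   # igw: only the default route matches
```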

2.2 Routing checks

# bash
ip route get <destination>
tracepath <destination>
traceroute <destination>
mtr <destination>

AWS specifics:

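  •         Route tables must include the correct local, IGW, NAT gateway, TGW, virtual private gateway (VPN/Direct Connect), or VPC peering routes.
  •         Routes show state blackhole when their target no longer exists (e.g., a deleted ENI or NAT gateway, or a removed peering connection).
  •         Subnet associations matter: a subnet with no explicit association uses the VPC's main route table, so confirm which RT is actually applied.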
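Blackhole entries can be spotted by scanning the JSON returned by `aws ec2 describe-route-tables`, where each route carries a `State` of `active` or `blackhole`. A minimal sketch, assuming you have saved that output and loaded it with `json.load`; the sample below is made up and trimmed to the fields the function reads.

```python
import json

def find_blackholes(describe_output: dict) -> list[tuple[str, str]]:
    """Return (route_table_id, destination) for every route whose State is 'blackhole'."""
    hits = []
    for table in describe_output.get("RouteTables", []):
        for route in table.get("Routes", []):
            if route.get("State") == "blackhole":
                dest = route.get("DestinationCidrBlock") or route.get("DestinationPrefixListId", "?")
                hits.append((table["RouteTableId"], dest))
    return hits

# Shaped like `aws ec2 describe-route-tables --output json`; IDs are invented.
sample = json.loads("""
{"RouteTables": [{"RouteTableId": "rtb-0example",
  "Routes": [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local", "State": "active"},
    {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0deleted", "State": "blackhole"}
  ]}]}
""")
print(find_blackholes(sample))  # [('rtb-0example', '0.0.0.0/0')]
```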

3. NAT Troubleshooting

NAT is a major source of IPv4 issues.

3.1 SNAT vs DNAT

  •         SNAT: private → public (outbound; rewrites the source address)
  •         DNAT: public → private (inbound; rewrites the destination address)
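To make the SNAT direction concrete, here is a toy translation table in Python. Outbound packets get their private source rewritten to a public address and port, and replies are reverse-translated from the stored mapping. A reply with no stored state is dropped, which is the root of "broken return traffic". This is a simplified model in the spirit of Linux conntrack, not how any particular device allocates ports. Real NATs usually track ports per destination; AWS documents up to 55,000 simultaneous connections per unique destination for a NAT gateway. The class name, addresses and the tiny port pool are illustrative. DNAT (port forwarding) is the mirror image: a static rule rewrites the destination instead.

```python
from typing import Optional

class SnatTable:
    """Toy SNAT: private (ip, port) -> public (ip, port), remembered per flow."""

    def __init__(self, public_ip: str, ports=range(1024, 65536)):
        self.public_ip = public_ip
        self._free_ports = iter(ports)  # single shared pool (a simplification)
        self._forward = {}  # (priv_ip, priv_port, dst_ip, dst_port) -> public port
        self._reverse = {}  # (public port, dst_ip, dst_port) -> (priv_ip, priv_port)

    def outbound(self, src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> tuple:
        """Rewrite the source of an outbound packet (private -> public)."""
        key = (src_ip, src_port, dst_ip, dst_port)
        if key not in self._forward:
            pub_port = next(self._free_ports, None)
            if pub_port is None:
                raise RuntimeError("SNAT port exhaustion: no free source ports")
            self._forward[key] = pub_port
            self._reverse[(pub_port, dst_ip, dst_port)] = (src_ip, src_port)
        return (self.public_ip, self._forward[key], dst_ip, dst_port)

    def reply(self, src_ip: str, src_port: int, dst_port: int) -> Optional[tuple]:
        """Map a reply back (public -> private); None means no state, so it is dropped."""
        return self._reverse.get((dst_port, src_ip, src_port))

nat = SnatTable("203.0.113.10", ports=range(40000, 40002))  # tiny pool to show exhaustion
print(nat.outbound("10.0.1.5", 51000, "198.51.100.7", 443))  # ('203.0.113.10', 40000, '198.51.100.7', 443)
print(nat.reply("198.51.100.7", 443, 40000))                 # ('10.0.1.5', 51000)
print(nat.reply("198.51.100.7", 443, 40001))                 # None: unknown flow
```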

3.2 Logs / checks

Linux iptables NAT table:

sudo iptables -t nat -L -n -v

AWS NAT Gateway:

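  •         Check CloudWatch metrics:
      •    ErrorPortAllocation (source-port allocation failures)
      •    PacketsDropCount
      •    BytesOutToDestination
  •         NAT Gateway fails when:
      •    A route is missing (private subnet → NAT gateway, or NAT gateway subnet → destination)
      •    No IGW is attached to the VPC, or the NAT gateway's subnet does not route 0.0.0.0/0 to it
      •    SNAT ports are exhausted (rare but real with high concurrency to a single destination)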

3.3 Double NAT

Occurs commonly in:

  •         On-prem firewall NAT → AWS NAT gateway → internet

Symptoms:

  •         Broken return traffic
  •         Inconsistent path MTU
  •         Services failing only inbound or outbound

4. DNS Troubleshooting (Often confused with IP issues)

Many apparent connectivity failures are really DNS problems masquerading as networking issues.

4.1 Checklist

  •         Can the resolver be reached?
  •         Is the DNS server configured correctly?
  •         AAAA vs A confusion?
  •         Split-horizon inconsistencies?

4.2 Tools

# bash
dig <hostname>
dig +trace <hostname>
dig @<dns-server> <hostname>
nslookup <hostname>

AWS specifics:

  •         EC2 uses VPC Resolver (AmazonProvidedDNS) at:
    •    169.254.169.253
  •         Conditional forwarders for hybrid setups are often misconfigured.
  •         Route 53 Resolver rules must be associated with the correct VPC.
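Besides 169.254.169.253, AmazonProvidedDNS also answers at the base of the VPC's primary IPv4 CIDR plus two (for example 10.0.0.2 in 10.0.0.0/16). A small sketch for checking whether `/etc/resolv.conf` points at the VPC resolver; the function names and the sample file content are hypothetical.

```python
import ipaddress

LINK_LOCAL_RESOLVER = "169.254.169.253"

def vpc_resolver_ips(vpc_primary_cidr: str) -> list[str]:
    """Addresses where AmazonProvidedDNS answers: VPC base + 2, and the link-local address."""
    net = ipaddress.IPv4Network(vpc_primary_cidr)
    return [str(net.network_address + 2), LINK_LOCAL_RESOLVER]

def nameservers(resolv_conf_text: str) -> list[str]:
    """Extract 'nameserver' entries from /etc/resolv.conf content."""
    servers = []
    for line in resolv_conf_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers

conf = "# generated\nsearch ec2.internal\nnameserver 10.0.0.2\n"
print(vpc_resolver_ips("10.0.0.0/16"))  # ['10.0.0.2', '169.254.169.253']
print(all(ns in vpc_resolver_ips("10.0.0.0/16") for ns in nameservers(conf)))  # True
```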

5. Transport Layer (TCP/UDP)

Symptoms

  •         SYN sent but no SYN-ACK → blocked or blackholed
  •         SYN-ACK received but ACK missing → asymmetric routing
  •         UDP “works sometimes” → random firewall drops or NAT timeouts
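The symptom list above reduces to one question: what is the last handshake segment seen at a given capture point? A minimal classifier sketch, assuming you have transcribed a flow from tcpdump output into `(sender, flags)` pairs, with flags written as tcpdump prints them inside `Flags [...]` ("S" SYN, "S." SYN-ACK, "." ACK, "R"/"R." reset). The function name and input format are illustrative. The "no ACK" verdict is meaningful on a capture taken at the server side.

```python
def classify_handshake(segments: list[tuple[str, str]]) -> str:
    """segments: (sender, flags) in capture order; sender is "client" or "server"."""
    def seen(sender, pred):
        return any(s == sender and pred(f) for s, f in segments)

    syn = seen("client", lambda f: "S" in f and "." not in f)
    syn_ack = seen("server", lambda f: "S" in f and "." in f)
    rst = seen("server", lambda f: "R" in f)
    ack = seen("client", lambda f: "S" not in f and "R" not in f and "." in f)

    if not syn:
        return "no SYN at this capture point"
    if rst:
        return "RST from server: port closed or connection rejected"
    if not syn_ack:
        return "SYN sent but no SYN-ACK: blocked or blackholed"
    if not ack:
        return "SYN-ACK sent but no ACK: suspect asymmetric routing"
    return "handshake completed"

print(classify_handshake([("client", "S"), ("server", "S.")]))
```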

5.1 Tools

# bash
tcpdump -n port <port>
ss -tnlp
nc -zv <host> <port>

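Capture only handshake attempts (SYN and SYN-ACK segments). Matching on tcp-ack as well would catch nearly every TCP segment, because every segment after the first SYN carries ACK:

sudo tcpdump -nn -i eth0 'tcp[tcpflags] & tcp-syn != 0'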

AWS specifics:

  •         TGW asymmetric routing is a classic problem.
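  •         NLB preserves the client IP by default for instance targets, so target security groups and host firewalls must allow the clients' addresses rather than the NLB's.
  •         ALB does NOT preserve the client IP (targets see the ALB's address); read X-Forwarded-For instead.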
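Behind an ALB, the TCP peer is the load balancer, so the client address has to come from `X-Forwarded-For`. With the default append behavior, the ALB adds the address it received the connection from to the end of the header. Entries further left were supplied by the client or by upstream proxies and can be spoofed. A minimal sketch, assuming you know how many trusted proxies sit in front of the app (1 for an ALB alone); the function name is hypothetical.

```python
import ipaddress
from typing import Optional

def client_ip_from_xff(xff: str, trusted_proxies: int = 1) -> Optional[str]:
    """Return the address recorded by the outermost trusted proxy.
    With only an ALB in front (trusted_proxies=1) this is the rightmost entry."""
    if trusted_proxies < 1:
        raise ValueError("trusted_proxies must be >= 1")
    entries = [e.strip() for e in xff.split(",") if e.strip()]
    if len(entries) < trusted_proxies:
        return None
    candidate = entries[-trusted_proxies]
    try:
        return str(ipaddress.ip_address(candidate))
    except ValueError:
        return None  # not an IP address: malformed or spoofed value

# The leftmost value was supplied by the client and cannot be trusted.
print(client_ip_from_xff("10.9.9.9, 203.0.113.50"))  # 203.0.113.50
```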

6. Firewalls, ACLs, Security Groups, NACLs

6.1 Host Firewall

# bash
sudo iptables -L -n -v
sudo ufw status
sudo firewall-cmd --list-all

6.2 AWS Security Groups

  •         Stateful
  •         Return traffic automatically allowed
  •         If outbound rules are misconfigured, connections the instance initiates fail silently (packets are dropped; no reject is sent)

6.3 NACLs

  •         Stateless
  •         Need both inbound + outbound rules
  •         Common issues:
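    •    Ephemeral ports for return traffic not allowed (AWS suggests 1024-65535 to cover common client OSes)
    •    The final implicit deny (*) rule blocks anything not explicitly allowed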
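Because NACLs are stateless and evaluated in ascending rule-number order (first match wins, then the implicit `*` deny), the return leg of a connection needs its own rule. A minimal model of that evaluation; the `NaclRule` tuple is a simplified, hypothetical representation (TCP only, no protocol field), not the AWS API format.

```python
import ipaddress
from typing import NamedTuple

class NaclRule(NamedTuple):
    number: int     # evaluated in ascending order
    cidr: str       # peer address range
    port_from: int  # destination-port range of the packet being evaluated
    port_to: int
    allow: bool

def nacl_allows(rules: list[NaclRule], peer_ip: str, dst_port: int) -> bool:
    """Stateless check of one packet against one direction's rules (TCP assumed)."""
    ip = ipaddress.IPv4Address(peer_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        if ip in ipaddress.IPv4Network(rule.cidr) and rule.port_from <= dst_port <= rule.port_to:
            return rule.allow  # first match wins; higher-numbered rules are ignored
    return False               # implicit deny ("*" rule)

inbound = [NaclRule(100, "0.0.0.0/0", 443, 443, True)]
outbound_broken = [NaclRule(100, "0.0.0.0/0", 443, 443, True)]
outbound_fixed = outbound_broken + [NaclRule(110, "0.0.0.0/0", 1024, 65535, True)]

client = "198.51.100.20"
print(nacl_allows(inbound, client, 443))            # True: request gets in
print(nacl_allows(outbound_broken, client, 51514))  # False: reply to the ephemeral port is dropped
print(nacl_allows(outbound_fixed, client, 51514))   # True
```

The design point is that nothing links the inbound and outbound lists: unlike a security group, the NACL never remembers that the reply belongs to an allowed request.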

6.4 Middlebox issues

  •         IDS/IPS dropping packets
  •         DPI throttling or fragmentation issues
  •         VPN/firewall tunnels dropping large packets

7. MTU, Fragmentation, PMTUD

A frequently overlooked cause of IPv4 issues.

Symptoms

  •         Small requests work but large responses hang (e.g., a TLS handshake stalls at the certificate exchange)
  •         Some sites load, some don’t
  •         DNS works but large downloads fail
  •         TCP stalls mid-transfer

Quick test

# bash
ping -M do -s 1472 8.8.8.8

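If it fails:

Reduce the payload size until the ping succeeds. The largest working payload plus 28 bytes (20-byte IPv4 header + 8-byte ICMP header) is the path MTU. If oversized probes simply time out instead of returning an ICMP "fragmentation needed" error, an MTU blackhole has been detected.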
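Instead of reducing the size by hand, a binary search finds the largest working payload in a handful of probes, and the same header arithmetic gives the matching TCP MSS (MTU minus 20-byte IPv4 and 20-byte TCP headers, no options). A minimal sketch: the `probe` callable stands in for running `ping -M do -s <size> <host>` and is simulated here (hypothetical), and the search assumes that if a size passes, every smaller size passes too.

```python
IPV4_HEADER = 20
ICMP_HEADER = 8
TCP_HEADER = 20

def path_mtu_from_payload(payload: int) -> int:
    """ICMP echo payload -> IPv4 packet size on the wire."""
    return payload + IPV4_HEADER + ICMP_HEADER

def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload per segment for a given MTU (no IP/TCP options)."""
    return mtu - IPV4_HEADER - TCP_HEADER

def largest_working_payload(probe, low: int = 0, high: int = 8973) -> int:
    """Binary-search the largest size for which probe(size) is True.
    high defaults to 9001 - 28 (jumbo MTU); assumes probe(low) is True."""
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid       # mid works: answer is mid or larger
        else:
            high = mid - 1  # mid fails: answer is smaller
    return low

# Simulated path with a 1400-byte MTU somewhere along the way.
probe = lambda size: path_mtu_from_payload(size) <= 1400
best = largest_working_payload(probe)
print(best, path_mtu_from_payload(best), mss_for_mtu(path_mtu_from_payload(best)))  # 1372 1400 1360
```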

AWS MTU specifics:

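  •         VPC ENIs: 9001 bytes (jumbo frames apply inside the VPC; traffic through an internet gateway is limited to 1500)
  •         Site-to-Site VPN over the internet: 1446 bytes (MSS 1406)
  •         DX: 1500 by default; jumbo frames of 9001 (private VIF) or 8500 (transit VIF) when enabled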

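PMTUD breaks if firewalls, security groups, or NACLs block ICMP type 3 code 4 (Destination Unreachable: Fragmentation Needed), because the sender never learns the smaller MTU.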

8. Asymmetric Routing (One of the hardest issues)

Asymmetry leads to:

  •         SYN goes one way, SYN-ACK goes another
  •         Packets accepted but return traffic dropped
  •         Firewalls drop sessions because state is on the wrong boundary

AWS contexts where asymmetry is common:

  •         TGW + on-prem with multiple DX links
  •         Multi-AZ firewalls in HA pairs
  •         VPC peering + TGW overlapping paths
  •         Load balancer preservation of client IP
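Asymmetry breaks stateful devices because each firewall only knows about flows whose SYN it saw. A toy model of two stateful firewalls in an HA pair without state synchronization; the class, firewall names and addresses are hypothetical.

```python
class StatefulFirewall:
    def __init__(self, name: str):
        self.name = name
        self.flows = set()  # (client, server) pairs whose SYN passed through this box

    def forward(self, src: str, dst: str, flags: str) -> bool:
        """Return True if the packet is forwarded, False if dropped."""
        if flags == "S":  # new connection: create state here
            self.flows.add((src, dst))
            return True
        # Any other packet must belong to a flow this box already knows, in either direction.
        return (src, dst) in self.flows or (dst, src) in self.flows

fw_a, fw_b = StatefulFirewall("fw-a"), StatefulFirewall("fw-b")
client, server = "10.0.1.10:51514", "172.16.5.20:443"

print(fw_a.forward(client, server, "S"))   # True: SYN leaves via fw-a, state created
print(fw_b.forward(server, client, "S."))  # False: SYN-ACK returns via fw-b, which has no state
print(fw_a.forward(server, client, "S."))  # True: a symmetric return path works
```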

Tools:

# bash
mtr -4 <destination>
tracepath <destination>
# then run tcpdump on both sides simultaneously

9. Packet Capture Workflow

9.1 The “two-sided capture” rule

To diagnose anything non-trivial:

  •         Capture on the source
  •         Capture on the destination
  •         Compare flows
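The comparison step is essentially a set difference: flows seen at the source but never at the destination died somewhere in between. A minimal sketch, assuming each capture has already been reduced to a set of 5-tuples by whatever pcap reader you use. The tuples below are made up. Note that NAT between the two capture points changes the tuples, so translate them before comparing.

```python
def compare_captures(at_source: set, at_destination: set) -> dict:
    """Flows are 5-tuples: (src_ip, src_port, dst_ip, dst_port, proto)."""
    return {
        "lost_in_transit": at_source - at_destination,      # left the source, never arrived
        "only_at_destination": at_destination - at_source,  # arrived via another path (or NAT)
        "seen_both": at_source & at_destination,
    }

src_cap = {("10.0.1.10", 51514, "10.2.0.20", 443, "tcp"),
           ("10.0.1.10", 51515, "10.2.0.20", 443, "tcp")}
dst_cap = {("10.0.1.10", 51514, "10.2.0.20", 443, "tcp")}
print(compare_captures(src_cap, dst_cap)["lost_in_transit"])
```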

9.2 Tools

Linux:

tcpdump -i eth0 -w capture.pcap

Windows:

  •         Wireshark
  •         Microsoft Network Monitor (NetMon, discontinued; Windows 10 and later include the built-in pktmon)

AWS:

  •         VPC Traffic Mirroring to Suricata, Zeek
  •         GWLB insertion for deep packet inspection

10. Systematic IPv4 Troubleshooting Flow

Step 1: Local host checks

  •         IP correct?
  •         Gateway correct?
  •         ARP table sane?

Step 2: Can the host reach the gateway?

# bash
ping <gateway>

Step 3: Routing table sanity

# bash
ip route get <destination>

Step 4: DNS resolution confirmed?

Step 5: Is NAT/SNAT working?

  • Check NAT allocations / flows.

Step 6: Firewall sanity

  • SG, NACL, on-prem firewalls.

Step 7: Use tcpdump on both ends

  • Find where the packet dies.

Step 8: MTU / PMTUD

Step 9: Asymmetry / hybrid path issues
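The flow is ordered so that each step rules out one layer before moving up. A minimal skeleton of a runner that stops at the first failing step; the check functions are hypothetical placeholders that you would back with the commands from the earlier sections.

```python
from typing import Callable, Optional

def run_checks(steps: list[tuple[str, Callable[[], bool]]]) -> Optional[str]:
    """Run checks in order; return the name of the first failing step, or None if all pass."""
    for name, check in steps:
        try:
            ok = check()
        except Exception as exc:  # a crashing check counts as a failure
            print(f"{name}: error {exc!r}")
            return name
        print(f"{name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            return name
    return None

# Placeholder results; replace each lambda with a real probe.
steps = [
    ("1 local host config", lambda: True),
    ("2 gateway reachable", lambda: True),
    ("3 route to destination", lambda: True),
    ("4 DNS resolution", lambda: False),
    ("5 NAT/SNAT", lambda: True),
]
print(run_checks(steps))  # stops at step 4
```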

11. AWS-Specific IPv4 Failure Patterns

Pattern A: “I can SSH out but not in”

Cause:

  •         No public IPv4
  •         SG inbound blocked
  •         NACL inbound blocked
  •         Route table missing 0.0.0.0/0 → IGW

Pattern B: “EC2 can’t reach internet”

Cause:

  •         Private subnet with no 0.0.0.0/0 → NAT gateway route
  •         No IGW attached
  •         Misconfigured DNS resolver
  •         SG outbound blocked
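The Pattern B causes can be checked in a fixed order once the relevant facts are collected. A minimal rule-based sketch: the `facts` keys are hypothetical inputs gathered from the console or CLI, and the only AWS specifics relied on are the `igw-`/`nat-` ID prefixes and the rule that an instance behind an IGW route needs a public IPv4 or Elastic IP.

```python
def diagnose_no_internet(facts: dict) -> list[str]:
    """facts (hypothetical keys):
    default_route_target: 0.0.0.0/0 target in the instance subnet ("igw-...", "nat-...", or None)
    has_public_ip, nat_subnet_has_igw_route, sg_allows_egress, dns_ok: booleans."""
    causes = []
    target = facts.get("default_route_target") or ""
    if not target:
        causes.append("no 0.0.0.0/0 route in the subnet's route table")
    elif target.startswith("nat-") and not facts.get("nat_subnet_has_igw_route", False):
        causes.append("NAT gateway subnet has no route to an attached IGW")
    elif target.startswith("igw-") and not facts.get("has_public_ip", False):
        causes.append("IGW route but the instance has no public IPv4 or Elastic IP")
    if not facts.get("sg_allows_egress", True):
        causes.append("security group outbound rules block the traffic")
    if not facts.get("dns_ok", True):
        causes.append("DNS resolution failing: check the resolver configuration")
    return causes

print(diagnose_no_internet({"default_route_target": "nat-0abc", "nat_subnet_has_igw_route": True,
                            "sg_allows_egress": False}))
```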

Pattern C: On-prem → AWS via DX works, but AWS → on-prem fails

Cause:

  •         Asymmetric routing through VPN fallback
  •         BGP prefix advertisement mismatch
  •         On-prem firewall drops AWS source ranges

Pattern D: VPC → TGW → DX → colocation firewalls

Cause:

  •         Stateful devices drop return path
  •         MTU mismatch on GRE/IPSec tunnels
  •         Missing reverse route propagation
