Friday, November 7, 2025

Troubleshoot Security Group (SG) & Network Access Control List (NACL) issues with Amazon VPC Flow Logs | Deep Dive.

A deep dive into troubleshooting Security Group (SG) and Network ACL (NACL) issues using Amazon VPC Flow Logs.

Focus:

  •    Step-by-step, from packet flow analysis to interpreting log entries for root-cause detection.

Breakdown:

  •        The Context: Where VPC Flow Logs Fit in,
  •        VPC Packet Flow Review (Order of Evaluation),
  •        What VPC Flow Logs Show,
  •        Interpreting ACCEPT vs REJECT,
  •        Step-by-Step Troubleshooting Flow,
  •        Real Sample,
  •        Key Diagnostic Patterns,
  •        Tools to Automate Analysis,
  •        Visualization: SG/NACL Decision Flow.

 1. The Context: Where VPC Flow Logs Fit in

  •        When traffic fails (connectivity drops, timeouts, or access denials), the cause is often an SG or NACL rule.
  •        VPC Flow Logs give network-level visibility into which traffic was allowed or denied, which is critical for diagnosing:

  •         SG misconfigurations (wrong CIDR, port, or direction)
  •         NACL rule conflicts
  •         Missing routes or private endpoint setups

 2. VPC Packet Flow Review (Order of Evaluation)

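  • Before looking at logs, recall the components AWS evaluates for network access:

| Component      | Direction          | Key Rule Type                  | Stateful? |
|----------------|--------------------|--------------------------------|-----------|
| Security Group | Inbound / Outbound | Allow only                     | Yes       |
| Network ACL    | Inbound / Outbound | Allow or Deny                  | No        |
| Route Table    | Outbound only      | Routes traffic                 | N/A       |
| VPC Flow Logs  | Both directions    | Capture result (ACCEPT/REJECT) | N/A       |

NB:

  •        Inbound traffic is checked by the NACL first (subnet-level), then by the Security Group (instance-level). Outbound traffic is checked in the reverse order: Security Group first, then NACL.
  •        If either denies traffic, twtech will see a REJECT in Flow Logs.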
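The evaluation order can be sketched in a few lines of Python. This is only a toy model, not AWS's implementation: for inbound traffic the subnet's NACL is checked before the instance's SG, and for outbound traffic the order is reversed. The function name and the boolean inputs `nacl_ok` and `sg_ok` are hypothetical stand-ins for the real rule evaluation.

```python
def evaluate(direction, nacl_ok, sg_ok):
    """Return (action, blocking_layer) for one packet in a simplified model."""
    # Inbound: NACL (subnet) then SG (instance). Outbound: SG then NACL.
    order = ["NACL", "SG"] if direction == "inbound" else ["SG", "NACL"]
    verdicts = {"NACL": nacl_ok, "SG": sg_ok}
    for layer in order:
        if not verdicts[layer]:
            # The first layer that denies the packet ends the evaluation.
            return "REJECT", layer
    return "ACCEPT", None

print(evaluate("inbound", nacl_ok=False, sg_ok=False))   # NACL is hit first
print(evaluate("outbound", nacl_ok=False, sg_ok=False))  # SG is hit first
```

Note that a real Flow Log record carries only the ACCEPT or REJECT action; the blocking layer returned here is something the model knows but the log does not.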

 3. What VPC Flow Logs Show

A sample flow log line:

2 123456789012 eni-abc123def456 10.0.1.10 172.31.5.20 52000 443 6 3 1800 1609459200 1609459260 REJECT OK

| Field       | Meaning              |
|-------------|----------------------|
| 10.0.1.10   | Source IP            |
| 172.31.5.20 | Destination IP       |
| 52000       | Source Port          |
| 443         | Destination Port     |
| 6           | Protocol (6 = TCP)   |
| REJECT      | Denied by SG or NACL |
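To work with records like the sample above programmatically, a minimal parsing sketch may help. It assumes the default (version 2) record format with its 14 space-separated fields; custom formats would need a different field list. The names `FIELDS` and `parse_flow_log` are illustrative, not part of any AWS SDK.

```python
# Default (version 2) field order of a VPC Flow Log record.
FIELDS = (
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
)
NUMERIC = ("srcport", "dstport", "protocol", "packets", "bytes", "start", "end")

def parse_flow_log(line):
    """Split one default-format record into a dict keyed by field name."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    record = dict(zip(FIELDS, parts))
    # NODATA and SKIPDATA records use "-" for fields that have no value.
    for key in NUMERIC:
        record[key] = None if record[key] == "-" else int(record[key])
    return record

sample = ("2 123456789012 eni-abc123def456 10.0.1.10 172.31.5.20 "
          "52000 443 6 3 1800 1609459200 1609459260 REJECT OK")
rec = parse_flow_log(sample)
print(rec["srcaddr"], "->", rec["dstaddr"], rec["dstport"], rec["action"])
```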

 4. Interpreting ACCEPT vs REJECT

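| Log Action | Meaning                               | Common Cause                    |
|------------|---------------------------------------|---------------------------------|
| ACCEPT     | Packet allowed through both SG + NACL | Correct rules                   |
| REJECT     | Packet denied at SG or NACL           | Misconfigured rule or direction |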

NB:

  •        Flow Logs don’t directly say which layer (SG/NACL) caused the denial — twtech infers it using packet direction and rules.
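One way to make that inference concrete is to compare the action on the initial packet with the action on the reply. Because SGs are stateful and NACLs are not, an accepted request followed by a rejected reply points at the NACL. The sketch below is a heuristic under that assumption; the function name and the returned labels are made up for illustration.

```python
from typing import Optional

def infer_blocking_layer(request_action: str, response_action: Optional[str]) -> str:
    """Guess where a flow was blocked from the request/response actions."""
    if request_action == "REJECT":
        # The first packet was dropped: either layer could be responsible.
        return "SG_OR_NACL"
    if response_action == "REJECT":
        # A stateful SG would let the reply through, so suspect the NACL.
        return "NACL_RETURN_PATH"
    if response_action == "ACCEPT":
        return "ALLOWED"
    # Request accepted but no reply record found.
    return "NO_RESPONSE_SEEN"

print(infer_blocking_layer("ACCEPT", "REJECT"))
```

A `NO_RESPONSE_SEEN` result says nothing about SG or NACL rules; it usually means looking at the application, the remote side, or routing instead.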

 5. Step-by-Step Troubleshooting Flow

 Step 1: Identify the Traffic Pattern

  •         Source IP, destination IP, and port.
  •         Direction: inbound (initiated by an external client toward the instance) or outbound (initiated from the instance).

 Step 2: Check the Flow Log Entry

  •         Look for entries with action=REJECT for those IPs/ports.
  •         Example query (CloudWatch Logs Insights):

# CloudWatch Logs Insights query: most recent REJECT records first
fields @timestamp, interfaceId, srcAddr, dstAddr, srcPort, dstPort, action
| filter action="REJECT"
| sort @timestamp desc
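When the same query has to be narrowed to one IP address or port, building the query string in code avoids hand-editing it each time. The helper below is a small sketch; `build_reject_query` is a hypothetical name, and it only produces text using the same field names as the query above.

```python
from typing import Optional

def build_reject_query(addr: str, dst_port: Optional[int] = None, limit: int = 50) -> str:
    """Build a Logs Insights query for REJECT records involving one address."""
    filters = [
        'action = "REJECT"',
        f'(srcAddr = "{addr}" or dstAddr = "{addr}")',
    ]
    if dst_port is not None:
        filters.append(f"dstPort = {dst_port}")
    return "\n".join([
        "fields @timestamp, interfaceId, srcAddr, dstAddr, srcPort, dstPort, action",
        "| filter " + " and ".join(filters),
        "| sort @timestamp desc",
        f"| limit {limit}",
    ])

print(build_reject_query("10.0.1.10", dst_port=443))
```

The resulting string can be pasted into the Logs Insights console, or passed as the `queryString` argument of the boto3 CloudWatch Logs `start_query` call.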

 Step 3: Determine Direction

  •         If source is your instance IP, and destination is outside → Outbound.
  •         If destination is your instance IP → Inbound.
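The direction rule above can be expressed as a small check against the instance's own addresses. This is a sketch using only the standard library; `flow_direction` and the `instance_ips` argument are illustrative names. Custom-format (version 5) flow logs also offer a `flow-direction` field that reports ingress or egress directly, if twtech enables it.

```python
import ipaddress

def flow_direction(srcaddr, dstaddr, instance_ips):
    """Classify a record as 'outbound', 'inbound', or 'unknown' for an instance."""
    local = {ipaddress.ip_address(ip) for ip in instance_ips}
    src = ipaddress.ip_address(srcaddr)
    dst = ipaddress.ip_address(dstaddr)
    if src in local and dst not in local:
        return "outbound"
    if dst in local and src not in local:
        return "inbound"
    # Neither address (or both) belongs to the instance.
    return "unknown"

print(flow_direction("10.0.1.10", "172.31.5.20", ["10.0.1.10"]))
```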

 Step 4: Check Security Group Rules

SGs are stateful:

  • If outbound is allowed, return inbound traffic is automatically allowed.

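| Scenario                     | Check                                                                                                              |
|------------------------------|--------------------------------------------------------------------------------------------------------------------|
| Outbound to Internet blocked | Verify egress SG rule (0.0.0.0/0, TCP 443/80)                                                                      |
| Inbound connection fails     | Verify ingress SG allows source IP/CIDR and port                                                                   |
| Peer EC2 access fails        | Destination SG must allow the required port from the source's SG or CIDR, and the source SG must allow the egress |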

Sample Fix:

# bash
# Allow inbound HTTPS (TCP 443) from 10.0.0.0/16 on the target Security Group
aws ec2 authorize-security-group-ingress \
  --group-id sg-1234567890 \
  --protocol tcp --port 443 --cidr 10.0.0.0/16
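Before changing a rule, it can be useful to check whether the existing ingress rules would already match the rejected flow. The sketch below models SG matching in a simplified way: rules are plain dicts with a made-up shape (`protocol`, `from_port`, `to_port`, `cidr`), not the structure returned by the AWS API, and rules that reference other Security Groups are not covered.

```python
import ipaddress

def sg_allows(rules, peer_ip, port, protocol="tcp"):
    """Return True if any allow rule matches; SGs have no deny rules."""
    peer = ipaddress.ip_address(peer_ip)
    for rule in rules:
        # "-1" stands for all protocols, in which case ports are ignored.
        if rule["protocol"] not in (protocol, "-1"):
            continue
        if rule["protocol"] != "-1" and not (rule["from_port"] <= port <= rule["to_port"]):
            continue
        if peer in ipaddress.ip_network(rule["cidr"]):
            return True
    # No rule matched, so the packet is dropped.
    return False

ingress = [{"protocol": "tcp", "from_port": 443, "to_port": 443, "cidr": "10.0.0.0/16"}]
print(sg_allows(ingress, "10.0.1.10", 443))    # inside the CIDR, port matches
print(sg_allows(ingress, "192.168.1.5", 443))  # outside the CIDR
```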

 Step 5: Check Network ACLs (NACLs)

NACLs are stateless:

twtech must explicitly allow both inbound and outbound directions for return traffic.

| Issue                 | Common Misconfiguration                 |
|-----------------------|-----------------------------------------|
| REJECT in Flow Logs   | One direction missing in NACL           |
| Timeouts              | Return port range (1024–65535) blocked  |
| Multi-tier app breaks | Inbound ephemeral ports not open        |

Sample Fix:
Inbound rule to allow return TCP traffic:

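| Rule# | Type       | Protocol | Port Range | Source      | Allow/Deny |
|-------|------------|----------|------------|-------------|------------|
| 100   | Custom TCP | 6        | 1024-65535 | 10.0.0.0/16 | ALLOW      |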
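NACL rules are evaluated in ascending rule-number order, the first matching rule decides, and anything unmatched falls through to the implicit deny. The following sketch models that behaviour with a hypothetical rule shape (`number`, `protocol`, `from_port`, `to_port`, `cidr`, `action`); it is a simplification for reasoning about rule order, not the AWS data model.

```python
import ipaddress

def nacl_decision(rules, peer_ip, port, protocol=6):
    """Return 'ALLOW' or 'DENY' using first-match-wins on rule number."""
    peer = ipaddress.ip_address(peer_ip)
    for rule in sorted(rules, key=lambda r: r["number"]):
        proto_ok = rule["protocol"] in (-1, protocol)  # -1 means all protocols
        port_ok = rule["protocol"] == -1 or rule["from_port"] <= port <= rule["to_port"]
        if proto_ok and port_ok and peer in ipaddress.ip_network(rule["cidr"]):
            return rule["action"]
    # Implicit '*' rule: deny anything that matched no numbered rule.
    return "DENY"

inbound = [
    {"number": 100, "protocol": 6, "from_port": 1024, "to_port": 65535,
     "cidr": "10.0.0.0/16", "action": "ALLOW"},
    {"number": 90, "protocol": -1, "cidr": "10.0.2.0/24", "action": "DENY"},
]
print(nacl_decision(inbound, "10.0.1.10", 50000))  # matched by rule 100
print(nacl_decision(inbound, "10.0.2.15", 50000))  # rule 90 wins first
```

The second call shows why rule numbering matters: a lower-numbered DENY shadows the ALLOW in rule 100 for that subnet.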

 Step 6: Validate Routing

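Even with SG/NACL rules correct, routing can still break connectivity if:

  •         The route to the IGW, NATGW, TGW, or endpoint is wrong or missing.
  •         A private subnet tries to reach a public IP without a NAT gateway.

Flow Logs Symptom:

  •        Outbound records typically show ACCEPT (the SG and NACL allowed the packet), but no return traffic is logged and the connection times out. Flow Logs do not record routing decisions, and NODATA only means the interface saw no traffic during the aggregation interval.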
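Since Flow Logs do not show routing decisions, a quick offline check of the route table can help. Route tables pick the most specific (longest-prefix) matching route, which the sketch below imitates. The `routes` mapping and the target labels such as "local" and "nat-gateway" are placeholders, not real resource IDs.

```python
import ipaddress

def lookup_route(routes, dst_ip):
    """Return the target of the most specific matching route, or None."""
    dst = ipaddress.ip_address(dst_ip)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in routes.items()
        if dst in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None  # no route: the packet has nowhere to go
    # Longest prefix wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]

private_no_nat = {"10.0.0.0/16": "local"}
private_with_nat = {"10.0.0.0/16": "local", "0.0.0.0/0": "nat-gateway"}
print(lookup_route(private_no_nat, "8.8.8.8"))
print(lookup_route(private_with_nat, "8.8.8.8"))
```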

 6. Real Sample

Scenario: EC2 can’t connect to RDS in another subnet.

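| Stage    | Detail                                                 |
|----------|--------------------------------------------------------|
| Flow Log | REJECT for dstPort=3306 (connection to MySQL denied)   |
| SG Check | Ingress on RDS SG missing port 3306                    |
| Fix      | Add inbound rule: MySQL (TCP 3306) from EC2’s SG       |
| Result   | Flow Log now shows ACCEPT                              |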

 7. Key Diagnostic Patterns

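| Flow Log Symptom                 | Root Cause                                                          | Resolution                                                       |
|----------------------------------|---------------------------------------------------------------------|------------------------------------------------------------------|
| REJECT ingress, dstPort 22       | SSH blocked by SG/NACL                                              | Add inbound TCP 22 rule                                          |
| REJECT egress, dstPort ephemeral | NACL missing outbound 1024–65535                                    | Allow ephemeral port range                                       |
| NODATA                           | No traffic on the interface during the aggregation interval         | Verify the source is sending, then check routes and connectivity |
| SKIPDATA                         | Records skipped because of an internal capacity constraint or error | Usually transient; re-test and check later intervals             |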
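To spot these patterns in bulk, it helps to count REJECT records per destination and port. The sketch below does that for default-format (14-field) records; `top_rejected` is an illustrative helper, and the sample lines are made-up data in the same shape as the earlier sample record.

```python
from collections import Counter

def top_rejected(lines, n=5):
    """Count REJECT records per (dstaddr, dstport), most frequent first."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        # Default format: index 4 = dstaddr, 6 = dstport, 12 = action.
        if len(parts) == 14 and parts[12] == "REJECT":
            counts[(parts[4], parts[6])] += 1
    return counts.most_common(n)

logs = [
    "2 123456789012 eni-abc123def456 10.0.1.10 172.31.5.20 52000 443 6 3 1800 1609459200 1609459260 REJECT OK",
    "2 123456789012 eni-abc123def456 10.0.1.10 172.31.5.20 52001 443 6 3 1800 1609459200 1609459260 REJECT OK",
    "2 123456789012 eni-abc123def456 203.0.113.9 10.0.1.10 40000 22 6 1 60 1609459200 1609459260 REJECT OK",
    "2 123456789012 eni-abc123def456 10.0.1.10 172.31.5.20 52002 80 6 5 900 1609459200 1609459260 ACCEPT OK",
]
for (dst, port), hits in top_rejected(logs):
    print(dst, port, hits)
```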

 8. Tools to Automate Analysis

| Tool                      | Purpose                   |
|---------------------------|---------------------------|
| CloudWatch Logs Insights  | Query logs interactively  |
| Athena on S3 Flow Logs    | Historical analysis       |
| AWS Detective / GuardDuty | Correlate security events |
| VPC Reachability Analyzer | Rule-by-rule path testing |

 9. Visualization: SG/NACL Decision Flow

