Friday, November 7, 2025

Troubleshooting Security Group (SG) & Network Access Control List (NACL) with VPC Flow Logs | Overview.


Scope:

  •  The Context of Where VPC Flow Logs Fit In,
  •  VPC Packet Flow Review (Order of Evaluation),
  •  What VPC Flow Logs Show (Sample Flow Log Line, Fields & Meaning),
  •  Interpreting ACCEPT vs REJECT (Log Action, Meaning, Common Causes),
  •  Step-by-Step Troubleshooting Flow,
  •  Step A: Identify the Traffic Pattern (Troubleshooting),
  •  Step B: Check the Flow Log Entry (Troubleshooting),
  •  Step C: Determine Direction (Troubleshooting),
  •  Step D: Check Security Group Rules (Troubleshooting),
  •  Step E: Check Network ACLs (NACLs) (Troubleshooting),
  •  Sample Fix for Inbound Rule to Allow Return TCP Traffic,
  •  Step F: Validate Routing Even with Correct SG/NACL (route tables can silently drop packets) (Troubleshooting),
  •  Flow Logs Symptom,
  •  Real Sample Scenario where EC2 can’t connect to RDS in another subnet (Flow Logs and Meaning),
  •  Key Diagnostic Patterns (Flow Log Symptoms, Root Causes & Resolutions),
  •  Tools to Automate Analysis & Purposes,
  •  Visualization Flow Diagram for SG/NACL Decision.

 1. The Context of Where VPC Flow Logs Fit in

    •  When traffic fails (connectivity drops, timeouts, or access denials), it’s often due to SG or NACL rules.
    •  VPC Flow Logs give network-level visibility of traffic allowed or denied. 
    • Network-level visibility is critical for diagnosing the following:

      •  SG misconfigurations (wrong CIDR, port, or direction)
      •  NACL rule conflicts
      •  Missing routes or private endpoint setups.

 2. VPC Packet Flow Review (Order of Evaluation)

    • Before looking at logs, recall the order in which AWS evaluates network access:

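| Step | Component      | Direction          | Key Rule Type                  | Stateful? |
|------|----------------|--------------------|--------------------------------|-----------|
| 1    | Security Group | Inbound / Outbound | Allow only                     | Yes       |
| 2    | Network ACL    | Inbound / Outbound | Allow or Deny                  | No        |
| 3    | Route Table    | Outbound only      | Routes traffic                 | N/A       |
| 4    | VPC Flow Logs  | Both directions    | Capture result (ACCEPT/REJECT) | N/A       |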

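NB:

    • For outbound traffic, Security Groups are evaluated first (instance-level), then NACLs (subnet-level).
    • For inbound traffic the order is reversed: the NACL at the subnet boundary is evaluated first, then the Security Group.
    • If either denies traffic, twtech will see a REJECT in Flow Logs.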
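The evaluation order can be sketched in a few lines of Python. This is a simplified model rather than AWS's implementation: `sg_allows` and `nacl_allows` are hypothetical booleans standing in for the real rule evaluation, and connection tracking is ignored. It assumes the usual ordering, where inbound packets cross the subnet boundary first (NACL, then SG) and outbound packets leave the instance first (SG, then NACL).

```python
from typing import Optional, Tuple


def evaluate(direction: str, sg_allows: bool, nacl_allows: bool) -> Tuple[str, Optional[str]]:
    """Return (action, blocking_layer) for one packet at one network interface."""
    if direction not in ("inbound", "outbound"):
        raise ValueError(f"unknown direction: {direction}")

    verdicts = {"NACL": nacl_allows, "SG": sg_allows}
    # Inbound: NACL then SG. Outbound: SG then NACL.
    order = ["NACL", "SG"] if direction == "inbound" else ["SG", "NACL"]

    for layer in order:
        if not verdicts[layer]:
            # The first layer that denies decides the outcome.
            return "REJECT", layer
    return "ACCEPT", None


print(evaluate("inbound", sg_allows=True, nacl_allows=False))
print(evaluate("outbound", sg_allows=False, nacl_allows=False))
```

Note that the flow log itself only records the action; the blocking layer returned here is exactly the piece of information that has to be inferred during troubleshooting.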

 3. What VPC Flow Logs Show (Sample Flow Log Line, Fields & Meaning):

2 123456789012 eni-abc123def456 10.0.1.10 172.31.5.20 52000 443 6 3 1800 1609459200 1609459260 REJECT OK

| Field       | Meaning              |
|-------------|----------------------|
| 10.0.1.10   | Source IP            |
| 172.31.5.20 | Destination IP       |
| 52000       | Source Port          |
| 443         | Destination Port     |
| 6           | Protocol (6 = TCP)   |
| REJECT      | Denied by SG or NACL |
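To work with such lines programmatically, a small parser helps. The sketch below assumes the default (version 2) flow log format with 14 space-separated fields; the `FlowRecord` type and the function names are illustrative, not part of any AWS SDK. Fields logged as `-` (as in NODATA or SKIPDATA records) are mapped to `None`.

```python
from typing import NamedTuple, Optional


class FlowRecord(NamedTuple):
    version: int
    account_id: str
    interface_id: str
    srcaddr: str
    dstaddr: str
    srcport: Optional[int]
    dstport: Optional[int]
    protocol: Optional[int]
    packets: Optional[int]
    byte_count: Optional[int]
    start: int
    end: int
    action: str
    log_status: str


def _to_int(value: str) -> Optional[int]:
    # Flow logs write "-" when a field has no value for the record.
    return None if value == "-" else int(value)


def parse_flow_log_line(line: str) -> FlowRecord:
    parts = line.split()
    if len(parts) != 14:
        raise ValueError(f"expected 14 fields, got {len(parts)}")
    return FlowRecord(
        int(parts[0]), parts[1], parts[2], parts[3], parts[4],
        _to_int(parts[5]), _to_int(parts[6]), _to_int(parts[7]),
        _to_int(parts[8]), _to_int(parts[9]),
        int(parts[10]), int(parts[11]), parts[12], parts[13],
    )


record = parse_flow_log_line(
    "2 123456789012 eni-abc123def456 10.0.1.10 172.31.5.20 "
    "52000 443 6 3 1800 1609459200 1609459260 REJECT OK"
)
print(record.srcaddr, record.dstport, record.action)
```

Custom-format flow logs can carry a different field set and order, in which case the field list would need to be adjusted to match.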

 4. Interpreting ACCEPT vs REJECT (Log Action, Meaning, Common Causes)

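| Log Action | Meaning                               | Common Cause                    |
|------------|---------------------------------------|---------------------------------|
| ACCEPT     | Packet allowed through both SG + NACL | Correct rules                   |
| REJECT     | Packet denied at SG or NACL           | Misconfigured rule or direction |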

NB:

    • Flow Logs don’t directly say which layer (SG or NACL) caused the denial; twtech infers it from the packet direction and the configured rules.
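One common inference can be automated. Because Security Groups are stateful, reply traffic for an allowed connection is not re-checked against SG rules, so a request logged as ACCEPT whose mirrored reply is logged as REJECT usually points at the stateless NACL. The sketch below is a heuristic under that assumption; the record dictionaries and the function name are illustrative.

```python
def find_nacl_suspects(records):
    """Return REJECT records whose mirrored flow was ACCEPTed."""
    accepted = {
        (r["srcaddr"], r["srcport"], r["dstaddr"], r["dstport"])
        for r in records
        if r["action"] == "ACCEPT"
    }
    suspects = []
    for r in records:
        if r["action"] != "REJECT":
            continue
        # Swap source and destination to find the opposite direction.
        mirrored = (r["dstaddr"], r["dstport"], r["srcaddr"], r["srcport"])
        if mirrored in accepted:
            suspects.append(r)
    return suspects


request = {"srcaddr": "203.0.113.5", "srcport": 52000,
           "dstaddr": "10.0.1.10", "dstport": 443, "action": "ACCEPT"}
reply = {"srcaddr": "10.0.1.10", "srcport": 443,
         "dstaddr": "203.0.113.5", "dstport": 52000, "action": "REJECT"}

for suspect in find_nacl_suspects([request, reply]):
    print("Likely NACL issue:", suspect)
```

A REJECT with no accepted counterpart stays ambiguous and still needs the manual SG and NACL checks described in the steps that follow.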

 5. Step-by-Step Troubleshooting Flow

 Step A: Identify the Traffic Pattern (Troubleshooting)

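    • Note the source IP, destination IP, destination port, and protocol of the failing connection.
    • Direction: inbound (traffic arriving at the instance, such as a client connecting to it) or outbound (traffic initiated from the instance).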

 Step B: Check the Flow Log Entry (Troubleshooting)

    • Look for entries with action=REJECT for those IPs/ports.
    • Example query (CloudWatch Logs Insights):

# CloudWatch Logs Insights query
fields @timestamp, interfaceId, srcAddr, dstAddr, srcPort, dstPort, action
| filter action="REJECT"
| sort @timestamp desc
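The same query can be narrowed to the IP and port identified in Step A and run from a script. The sketch below builds the query string in plain Python; `run_query` uses the boto3 CloudWatch Logs calls `start_query` and `get_query_results`, and assumes boto3 is installed, credentials are configured, and the log group name is supplied by the caller. The function names are illustrative.

```python
import time


def build_reject_query(ip=None, port=None, limit=50):
    """Build a Logs Insights query for REJECT records, optionally filtered."""
    lines = [
        "fields @timestamp, interfaceId, srcAddr, dstAddr, srcPort, dstPort, action",
        'filter action = "REJECT"',
    ]
    if ip is not None:
        lines.append(f'filter srcAddr = "{ip}" or dstAddr = "{ip}"')
    if port is not None:
        lines.append(f"filter dstPort = {int(port)}")
    lines.append("sort @timestamp desc")
    lines.append(f"limit {int(limit)}")
    return "\n| ".join(lines)


def run_query(log_group, query, minutes=60):
    """Run the query over the last `minutes` and wait for it to finish."""
    import boto3  # imported lazily so the builder works without boto3

    logs = boto3.client("logs")
    end = int(time.time())
    query_id = logs.start_query(
        logGroupName=log_group,
        startTime=end - minutes * 60,
        endTime=end,
        queryString=query,
    )["queryId"]
    while True:
        response = logs.get_query_results(queryId=query_id)
        if response["status"] not in ("Scheduled", "Running"):
            return response
        time.sleep(1)


print(build_reject_query(ip="10.0.1.10", port=443))
```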

 Step C: Determine Direction (Troubleshooting)

    • If the source is your instance IP and the destination is outside → Outbound.
    • If the destination is your instance IP → Inbound.
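This rule of thumb is easy to encode. The sketch below assumes the caller knows the private IPs attached to the network interface being examined (`eni_ips` is a hypothetical input). With custom-format flow logs, the `flow-direction` field can be logged instead, which removes the need to infer it.

```python
def flow_direction(srcaddr, dstaddr, eni_ips):
    """Classify a flow log record relative to the interface that logged it."""
    src_local = srcaddr in eni_ips
    dst_local = dstaddr in eni_ips
    if src_local and not dst_local:
        return "outbound"
    if dst_local and not src_local:
        return "inbound"
    # Both or neither address belongs to the interface: cannot tell.
    return "unknown"


my_ips = {"10.0.1.10"}
print(flow_direction("10.0.1.10", "172.31.5.20", my_ips))
print(flow_direction("172.31.5.20", "10.0.1.10", my_ips))
```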

 Step D: Check Security Group Rules (Troubleshooting)

SGs are stateful:

    • If outbound is allowed, return inbound traffic is automatically allowed.

| Scenario                     | Check                                                     |
|------------------------------|-----------------------------------------------------------|
| Outbound to Internet blocked | Verify egress SG rule (0.0.0.0/0, TCP 443/80)             |
| Inbound connection fails     | Verify ingress SG allows source IP/CIDR and port          |
| Peer EC2 access fails        | Both SGs must reference each other or open required ports |

Sample Fix with AWS CLI:

# bash 
aws ec2 authorize-security-group-ingress \
  --group-id sg-1234567890 \
  --protocol tcp --port 443 --cidr 10.0.0.0/16
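Before changing anything, it can help to check whether the existing ingress rules would already match the rejected packet. The sketch below works on rule dictionaries shaped like the `IpPermissions` list returned by `describe-security-groups`. It is a partial check under stated assumptions: only CIDR-based IPv4 rules (`IpRanges`) are considered, and rules that reference other security groups or prefix lists are ignored.

```python
import ipaddress


def sg_allows_ingress(ip_permissions, source_ip, port, protocol="tcp"):
    """Return True if any CIDR-based ingress rule matches the packet."""
    source = ipaddress.ip_address(source_ip)
    for perm in ip_permissions:
        proto = perm.get("IpProtocol")
        if proto not in (protocol, "-1"):  # "-1" means all protocols
            continue
        if proto != "-1" and not (perm["FromPort"] <= port <= perm["ToPort"]):
            continue
        for ip_range in perm.get("IpRanges", []):
            network = ipaddress.ip_network(ip_range["CidrIp"], strict=False)
            if source in network:
                return True
    return False


rules = [{
    "IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
    "IpRanges": [{"CidrIp": "10.0.0.0/16"}],
}]
print(sg_allows_ingress(rules, "10.0.1.10", 443))
print(sg_allows_ingress(rules, "10.0.1.10", 22))
```

If the function returns False for the rejected flow, a rule like the CLI fix above is a plausible candidate; if it returns True, the NACL is the next place to look.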

 Step E: Check Network ACLs (NACLs) (Troubleshooting)

NACLs are stateless:

    • twtech must explicitly allow both inbound and outbound directions for return traffic.

| Issue                 | Common Misconfiguration                |
|-----------------------|----------------------------------------|
| REJECT in Flow Logs   | One direction missing in NACL          |
| Timeouts              | Return port range (1024–65535) blocked |
| Multi-tier app breaks | Inbound ephemeral ports not open       |

Sample Fix for Inbound rule to allow return TCP traffic:

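| Rule # | Type       | Protocol | Port Range | Source      | Allow/Deny |
|--------|------------|----------|------------|-------------|------------|
| 100    | Custom TCP | 6        | 1024-65535 | 10.0.0.0/16 | ALLOW      |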
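NACL rules are evaluated in ascending rule-number order, the first matching rule wins, and anything unmatched falls through to the implicit deny. The sketch below models that behaviour for one direction. The rule dictionaries are a hypothetical simplification (not the AWS API shape), and a rule with protocol `-1` is assumed to carry the full port range `(0, 65535)`.

```python
import ipaddress


def nacl_decision(rules, remote_ip, port, protocol=6):
    """Return "ALLOW" or "DENY" for a packet against one NACL direction."""
    remote = ipaddress.ip_address(remote_ip)
    for rule in sorted(rules, key=lambda r: r["number"]):
        if rule["protocol"] not in (protocol, -1):
            continue
        low, high = rule["ports"]
        if not low <= port <= high:
            continue
        if remote in ipaddress.ip_network(rule["cidr"]):
            return rule["action"]  # first match wins
    return "DENY"  # implicit "*" rule at the end of every NACL


inbound = [
    {"number": 100, "protocol": 6, "ports": (1024, 65535),
     "cidr": "10.0.0.0/16", "action": "ALLOW"},
]
print(nacl_decision(inbound, "10.0.2.15", 52000))  # return traffic on an ephemeral port
print(nacl_decision(inbound, "10.0.2.15", 443))    # no rule covers port 443
```

Because the NACL is stateless, the same check has to pass separately against the outbound rule list for the opposite direction of the connection.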

 Step F: Validate Routing Even with Correct SG/NACL (Troubleshooting)

Route tables can silently drop packets even when SG and NACL rules are correct, for example when there is:

    •  A wrong route to the IGW, NAT gateway, TGW, or endpoint.
    •  A private subnet trying to reach a public IP without NAT.

Flow Logs Symptom:

    •  No traffic recorded for the interface (log status NODATA)
      • Meaning no packets reached or left the interface during the aggregation interval.
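A quick way to reason about a route table is to replay its lookup: the most specific (longest-prefix) route that contains the destination is used, and if nothing matches the packet has nowhere to go. The sketch below models a route table as a plain dictionary; the target IDs are hypothetical placeholders.

```python
import ipaddress


def lookup_route(routes, destination_ip):
    """Return the target of the most specific matching route, or None."""
    destination = ipaddress.ip_address(destination_ip)
    best = None
    for cidr, target in routes.items():
        network = ipaddress.ip_network(cidr)
        if destination in network:
            if best is None or network.prefixlen > best[0].prefixlen:
                best = (network, target)
    return best[1] if best else None


private_subnet = {"10.0.0.0/16": "local"}
with_nat = {"10.0.0.0/16": "local", "0.0.0.0/0": "nat-0example"}

print(lookup_route(private_subnet, "8.8.8.8"))  # no default route: traffic is dropped
print(lookup_route(with_nat, "8.8.8.8"))
print(lookup_route(with_nat, "10.0.3.4"))
```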

 6. Real Sample Scenario where EC2 can’t connect to RDS in another subnet (Flow Logs and Meaning)

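| Flow Log                | Meaning                                          |
|-------------------------|--------------------------------------------------|
| REJECT for dstPort=3306 | Connection to MySQL denied                       |
| SG Check                | Ingress on RDS SG missing port 3306              |
| Fix                     | Add inbound rule: MySQL (TCP 3306) from EC2’s SG |
| Result                  | Flow Log now shows ACCEPT                        |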
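The fix in this scenario references the EC2 instance's security group rather than a CIDR. A minimal sketch with boto3 could look like the following; the security group IDs are hypothetical, and `apply_fix` assumes boto3 is installed and credentials with permission to modify the RDS security group are configured.

```python
def mysql_ingress_from_sg(source_sg_id, port=3306):
    """Build an IpPermissions entry allowing TCP `port` from another SG."""
    return [{
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        # Reference the source security group instead of an IP range.
        "UserIdGroupPairs": [{"GroupId": source_sg_id}],
    }]


def apply_fix(rds_sg_id, ec2_sg_id):
    """Add the ingress rule to the RDS security group."""
    import boto3  # imported lazily so the builder works without boto3

    ec2 = boto3.client("ec2")
    return ec2.authorize_security_group_ingress(
        GroupId=rds_sg_id,
        IpPermissions=mysql_ingress_from_sg(ec2_sg_id),
    )


# Hypothetical ID, for illustration only.
print(mysql_ingress_from_sg("sg-0ec2example"))
```

Referencing the source security group keeps the rule valid when the EC2 instance's private IP changes.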

 7. Key Diagnostic Patterns (Flow log Symptoms, Root Causes & Resolutions)

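| Flow Log Symptom                 | Root Cause                                                                               | Resolution                                              |
|----------------------------------|------------------------------------------------------------------------------------------|---------------------------------------------------------|
| REJECT ingress, dstPort 22       | SSH blocked by SG/NACL                                                                   | Add inbound TCP 22 rule                                 |
| REJECT egress, dstPort ephemeral | NACL missing outbound 1024–65535                                                         | Allow ephemeral port range                              |
| NODATA                           | No traffic flow / route missing                                                          | Verify routes and connectivity                          |
| SKIPDATA                         | Records skipped during the aggregation interval (internal capacity constraint or error) | Usually transient; recreate the flow log if it persists |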
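When many records are involved, counting REJECTs per destination makes these patterns stand out quickly. The sketch below is one simple way to do that over already-parsed records; the dictionary keys and the function name are illustrative.

```python
from collections import Counter


def top_rejected(records, n=5):
    """Return the n most frequently rejected (dstaddr, dstport) pairs."""
    counts = Counter(
        (r["dstaddr"], r["dstport"])
        for r in records
        if r["action"] == "REJECT"
    )
    return counts.most_common(n)


sample = [
    {"dstaddr": "10.0.2.20", "dstport": 3306, "action": "REJECT"},
    {"dstaddr": "10.0.2.20", "dstport": 3306, "action": "REJECT"},
    {"dstaddr": "10.0.1.10", "dstport": 22, "action": "REJECT"},
    {"dstaddr": "10.0.1.10", "dstport": 443, "action": "ACCEPT"},
]
for (addr, port), count in top_rejected(sample):
    print(f"{addr}:{port} rejected {count} time(s)")
```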

 8. Tools to Automate Analysis & Purposes

| Tool                      | Purpose                   |
|---------------------------|---------------------------|
| CloudWatch Logs Insights  | Query logs interactively  |
| Athena on S3 Flow Logs    | Historical analysis       |
| AWS Detective / GuardDuty | Correlate security events |
| VPC Reachability Analyzer | Rule-by-rule path testing |

 9. Visualization Flow diagram for SG/NACL Decision

[Figure: flow diagram of the SG/NACL decision path]