Troubleshooting Security Group (SG) & Network Access Control List (NACL) with VPC Flow Logs - Overview.
Scope:
- The Context of Where VPC Flow Logs Fit in,
- VPC Packet Flow Review (Order of Evaluation),
- What VPC Flow Logs Show (sample flow log line, Fields & Meaning),
- Interpreting ACCEPT vs REJECT (Log Action, Meaning, Common Causes),
- Step-by-Step Troubleshooting Flow,
- Step A: Identify the Traffic Pattern (Troubleshooting),
- Step B: Check the Flow Log Entry (Troubleshooting),
- Step C: Determine Direction (Troubleshooting),
- Step D: Check Security Group Rules (Troubleshooting),
- Step E: Check Network ACLs (NACLs) (Troubleshooting),
- Sample Fix for Inbound rule to allow return TCP traffic,
- Step F: Validate Routing, even with correct SG/NACL (route tables can silently drop packets) (Troubleshooting),
- Flow Logs Symptom,
- Real Sample Scenario where EC2 can’t connect to RDS in another subnet (Flow logs and Meaning),
- Key Diagnostic Patterns (Flow log Symptoms, Root Causes & Resolutions),
- Tools to Automate Analysis & Purposes,
- Visualization Flow diagram for SG/NACL Decision.
1. The Context of Where VPC Flow Logs Fit in
- When traffic fails (connectivity drops, timeouts, or access denials), it’s often due to SG or NACL rules.
- VPC Flow Logs give network-level visibility of traffic allowed or denied.
- Network-level visibility is critical for diagnosing the following:
- SG misconfigurations (wrong CIDR, port, or direction)
- NACL rule conflicts
- Missing routes or private endpoint setups.
2. VPC Packet Flow Review (Order of Evaluation)
- Before looking at logs, recall the order in which AWS evaluates network access:
| Step | Component | Direction | Key Rule Type | Stateful? |
| 1 | Security Group | Inbound / Outbound | Allow only | ✅ Yes |
| 2 | Network ACL | Inbound / Outbound | Allow or Deny | ❌ No |
| 3 | Route Table | Outbound only | Routes traffic | N/A |
| 4 | VPC Flow Logs | Both directions | Capture result (ACCEPT/REJECT) | N/A |
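NB:
- The table lists the outbound order: Security Groups first (instance-level), then NACLs (subnet-level). Inbound traffic is evaluated in the reverse order (NACL first, then Security Group).
- If either denies traffic, twtech will see a REJECT in Flow Logs.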
3. What VPC Flow Logs Show (sample flow log line, Fields & Meaning):
2 123456789012 eni-abc123def456 10.0.1.10 172.31.5.20 52000 443 6 3 1800 1609459200 1609459260 REJECT OK
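| Field | Sample Value | Meaning |
| version | 2 | Flow log format version |
| account-id | 123456789012 | AWS account ID that owns the network interface |
| interface-id | eni-abc123def456 | Network interface (ENI) the traffic was recorded on |
| srcaddr | 10.0.1.10 | Source IP address |
| dstaddr | 172.31.5.20 | Destination IP address |
| srcport | 52000 | Source port |
| dstport | 443 | Destination port |
| protocol | 6 | IANA protocol number (6 = TCP) |
| packets | 3 | Packets transferred during the flow |
| bytes | 1800 | Bytes transferred during the flow |
| start | 1609459200 | Start of the capture window (Unix seconds) |
| end | 1609459260 | End of the capture window (Unix seconds) |
| action | REJECT | ACCEPT or REJECT |
| log-status | OK | Logging status (OK, NODATA, or SKIPDATA) |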
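The sample line above follows the default (version 2) flow log format, so it can be split into named fields with a few lines of Python. This is a minimal sketch, not an official parser: the function name `parse_flow_log_line` and the dictionary keys are illustrative choices, and custom-format flow logs would need a different field list.

```python
# Field order of the default (version 2) VPC flow log format.
DEFAULT_FIELDS = (
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
)

NUMERIC_FIELDS = ("srcport", "dstport", "protocol", "packets", "bytes", "start", "end")


def parse_flow_log_line(line: str) -> dict:
    """Split one default-format flow log line into a dict of named fields."""
    parts = line.split()
    if len(parts) != len(DEFAULT_FIELDS):
        raise ValueError(f"expected {len(DEFAULT_FIELDS)} fields, got {len(parts)}")
    record = dict(zip(DEFAULT_FIELDS, parts))
    # NODATA / SKIPDATA records carry "-" in these positions, so only
    # convert values that are actually digits.
    for key in NUMERIC_FIELDS:
        if record[key].isdigit():
            record[key] = int(record[key])
    return record


sample = ("2 123456789012 eni-abc123def456 10.0.1.10 172.31.5.20 "
          "52000 443 6 3 1800 1609459200 1609459260 REJECT OK")
record = parse_flow_log_line(sample)
print(record["srcaddr"], "->", record["dstaddr"], "port", record["dstport"], record["action"])
```

Working with a dict rather than positional indexes makes the later steps (filtering on REJECT, comparing addresses) easier to read.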
4. Interpreting ACCEPT vs REJECT (Log Action, Meaning, Common Causes)
| Log Action | Meaning | Common Cause |
| ACCEPT | Packet allowed through both SG + NACL | Correct rules |
| REJECT | Packet denied at SG or NACL | Misconfigured rule or direction |
NB:
- Flow Logs don’t directly say which layer (SG/NACL) caused the denial; twtech infers it using packet direction and rules.
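One way to make that inference concrete is to compare the action logged for the initiating packets with the action logged for the replies. Because Security Groups are stateful, a reply to an accepted request is not blocked by the SG, so an ACCEPT followed by a REJECT on the return path usually points at the stateless NACL. The sketch below is a heuristic only; the function name and the returned strings are illustrative, and pairing the two records is left to the caller.

```python
def likely_blocking_layer(request_action: str, response_action: str = "") -> str:
    """Guess which layer dropped a flow from the actions of its two directions.

    request_action  -- action logged for the initiating packets
    response_action -- action logged for the reply packets, or "" if none was found
    """
    if request_action == "REJECT":
        # The first packet was already dropped: both layers are candidates.
        return "SG or NACL (check rules for the initiating direction)"
    if request_action == "ACCEPT" and response_action == "REJECT":
        # A stateful SG would let the reply through, so suspect the NACL.
        return "NACL (return traffic not allowed)"
    if request_action == "ACCEPT" and response_action == "ACCEPT":
        return "neither (both directions allowed)"
    return "unknown (no matching reply record)"


print(likely_blocking_layer("ACCEPT", "REJECT"))
print(likely_blocking_layer("REJECT"))
```

The result is a starting point for Steps D and E, not a verdict; the rules themselves still have to be checked.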
5. Step-by-Step Troubleshooting Flow
Step A: Identify the Traffic Pattern (Troubleshooting)
- Source IP, destination IP, and port.
- Direction: inbound (initiated by a remote host toward the instance) or outbound (initiated from the instance).
Step B: Check the Flow Log Entry (Troubleshooting)
- Look for entries with action=REJECT for those IPs/ports.
- Example query (CloudWatch Logs Insights):
fields @timestamp, interfaceId, srcAddr, dstAddr, srcPort, dstPort, action
| filter action = "REJECT"
| sort @timestamp desc
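When the same query has to be narrowed to the IPs and port identified in Step A, it can help to generate the query text rather than edit it by hand. The helper below is a hypothetical sketch (the name `build_reject_query` and its parameters are not part of any AWS SDK); it only builds a string that can be pasted into Logs Insights, using the field names from the query above.

```python
def build_reject_query(src_addr=None, dst_addr=None, dst_port=None, limit=50) -> str:
    """Build a Logs Insights query for REJECT records, optionally narrowed by IP/port."""
    conditions = ['action = "REJECT"']
    if src_addr:
        conditions.append(f'srcAddr = "{src_addr}"')
    if dst_addr:
        conditions.append(f'dstAddr = "{dst_addr}"')
    if dst_port is not None:
        conditions.append(f"dstPort = {int(dst_port)}")
    return "\n".join([
        "fields @timestamp, interfaceId, srcAddr, dstAddr, srcPort, dstPort, action",
        "| filter " + " and ".join(conditions),
        "| sort @timestamp desc",
        f"| limit {int(limit)}",
    ])


query = build_reject_query(src_addr="10.0.1.10", dst_port=443)
print(query)
```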
Step C: Determine Direction (Troubleshooting)
- If source is your instance IP, and destination is outside → Outbound.
- If destination is your instance IP → Inbound.
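The two rules above translate directly into a small check. This is a sketch under the assumption that the instance's private IP is known; `classify_direction` is an illustrative name. Note that it describes the direction of the logged packets, which may be replies rather than the start of a connection.

```python
def classify_direction(srcaddr: str, dstaddr: str, instance_ip: str) -> str:
    """Classify a flow log record as inbound or outbound relative to one instance."""
    if srcaddr == instance_ip and dstaddr != instance_ip:
        return "outbound"
    if dstaddr == instance_ip and srcaddr != instance_ip:
        return "inbound"
    return "unknown"  # neither address (or both) belongs to the instance


# Using the addresses from the sample flow log line, with 10.0.1.10 as the instance.
print(classify_direction("10.0.1.10", "172.31.5.20", instance_ip="10.0.1.10"))
```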
Step D: Check Security Group Rules (Troubleshooting)
SGs are stateful:
- If outbound is allowed, return inbound traffic is automatically allowed.
| Scenario | Check |
| Outbound to Internet blocked | Verify egress SG rule (0.0.0.0/0, TCP 443/80) |
| Inbound connection fails | Verify ingress SG allows source IP/CIDR and port |
| Peer EC2 access fails | Both SGs must reference each other or open required ports |
Sample Fix with AWS CLI:
aws ec2 authorize-security-group-ingress \
  --group-id sg-1234567890 \
  --protocol tcp --port 443 --cidr 10.0.0.0/16
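Before changing a rule, it can be useful to check offline whether the existing ingress rules would already admit the source IP and port seen in the REJECT record. The sketch below assumes a simplified, hypothetical rule shape (`protocol`, `from_port`, `to_port`, `cidr`), not the exact structure returned by the AWS API, and it ignores rules that reference other security groups.

```python
import ipaddress


def ingress_allows(rules, source_ip: str, port: int, protocol: str = "tcp") -> bool:
    """Return True if any allow rule matches the source IP, port, and protocol."""
    src = ipaddress.ip_address(source_ip)
    for rule in rules:
        # "-1" stands for "all protocols" in this simplified model.
        if rule["protocol"] not in (protocol, "-1"):
            continue
        if rule["protocol"] != "-1" and not (rule["from_port"] <= port <= rule["to_port"]):
            continue
        if src in ipaddress.ip_network(rule["cidr"]):
            return True
    # Security groups contain allow rules only, so no match means no access.
    return False


# The rule added by the CLI command above: TCP 443 from 10.0.0.0/16.
rules = [{"protocol": "tcp", "from_port": 443, "to_port": 443, "cidr": "10.0.0.0/16"}]
print(ingress_allows(rules, "10.0.1.10", 443))    # source inside the CIDR
print(ingress_allows(rules, "172.31.5.20", 443))  # source outside the CIDR
```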
Step E: Check Network ACLs (NACLs) (Troubleshooting)
NACLs are stateless:
- twtech must explicitly allow both inbound and outbound directions for return traffic.
| Issue | Common Misconfiguration |
| REJECT in Flow Logs | One direction missing in NACL |
| Timeouts | Return port range (1024–65535) blocked |
| Multi-tier app breaks | Inbound ephemeral ports not open |
Sample Fix for Inbound rule to allow return TCP traffic:
| Rule# | Type | Protocol | Port Range | Source | Allow/Deny |
| 100 | Custom TCP | 6 | 1024-65535 | 10.0.0.0/16 | ALLOW |
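NACL rules are evaluated in ascending rule-number order, the first match wins, and anything unmatched falls through to the implicit deny. A short simulation shows why the rule above lets return traffic in while a new inbound connection to a low port is still denied. This is a sketch with a simplified, hypothetical rule shape; the peer address 10.0.2.20 is made up for the example.

```python
import ipaddress


def nacl_decision(rules, peer_ip: str, port: int, protocol: int = 6) -> str:
    """Evaluate NACL rules lowest number first; the first matching rule decides."""
    addr = ipaddress.ip_address(peer_ip)
    for rule in sorted(rules, key=lambda r: r["number"]):
        proto_ok = rule["protocol"] in (protocol, -1)  # -1 means all protocols here
        low, high = rule["ports"]
        if proto_ok and low <= port <= high and addr in ipaddress.ip_network(rule["cidr"]):
            return rule["action"]
    return "DENY"  # implicit catch-all rule at the end of every NACL


# Inbound rule 100 from the table above.
inbound = [
    {"number": 100, "protocol": 6, "ports": (1024, 65535),
     "cidr": "10.0.0.0/16", "action": "ALLOW"},
]

# Reply from a hypothetical peer, arriving on the instance's ephemeral port.
print(nacl_decision(inbound, "10.0.2.20", 52000))
# New inbound connection to port 443, which rule 100 does not cover.
print(nacl_decision(inbound, "10.0.2.20", 443))
```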
Step F: Validate Routing (Troubleshooting)
Even with correct SG/NACL rules, route tables can silently drop packets if:
- Wrong route to IGW, NATGW, TGW, or endpoint.
- Private subnet trying to reach public IP without NAT.
- Flow Logs symptom: no logs generated (NODATA), meaning no traffic was recorded on the interface during the capture window.
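Route tables pick the most specific (longest-prefix) route that contains the destination, and a destination with no matching route has nowhere to go. The sketch below models a route table as a plain dict of CIDR to target; the NAT gateway ID and the public address 203.0.113.10 are hypothetical placeholders.

```python
import ipaddress


def lookup_route(route_table: dict, destination_ip: str):
    """Return the target of the most specific matching route, or None if none matches."""
    dest = ipaddress.ip_address(destination_ip)
    matches = []
    for cidr, target in route_table.items():
        network = ipaddress.ip_network(cidr)
        if dest in network:
            matches.append((network.prefixlen, target))
    if not matches:
        return None
    # Longest prefix wins.
    return max(matches, key=lambda m: m[0])[1]


private_subnet = {"10.0.0.0/16": "local"}
private_subnet_with_nat = {"10.0.0.0/16": "local", "0.0.0.0/0": "nat-0123456789abcdef0"}

print(lookup_route(private_subnet, "203.0.113.10"))           # no route to the public IP
print(lookup_route(private_subnet_with_nat, "203.0.113.10"))  # default route via NAT
print(lookup_route(private_subnet_with_nat, "10.0.2.20"))     # more specific local route
```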
6. Real Sample Scenario where EC2 can’t connect to RDS in another subnet (Flow logs and Meaning)
| Flow Log | Meaning |
| | Connection to MySQL denied |
| SG Check | Ingress on RDS SG missing port 3306 |
| Fix | Add inbound rule: MySQL (TCP 3306) from EC2’s SG |
| Result | Flow Log now shows ACCEPT |
7. Key Diagnostic Patterns (Flow log Symptoms, Root Causes & Resolutions)
| Flow Log Symptom | Root Cause | Resolution |
| | SSH blocked by SG/NACL | Add inbound TCP 22 rule |
| | NACL missing outbound 1024–65535 | Allow ephemeral port range |
| | No traffic flow / route missing | Verify routes and connectivity |
| | Log delivery failure | Recreate flow log or increase buffer size |
8. Tools to Automate Analysis & Purposes
| Tool | Purpose |
| CloudWatch Logs Insights | Query logs interactively |
| Athena on S3 Flow Logs | Historical analysis |
| AWS Detective / GuardDuty | Correlate security events |
| VPC Reachability Analyzer | Rule-by-rule path testing |