A deep dive into troubleshooting Security Group (SG) and Network ACL (NACL) issues using Amazon VPC Flow Logs.
Focus:
- Step-by-step, from packet flow analysis to interpreting log entries for root-cause detection.
Breakdown:
- The Context: Where VPC Flow Logs Fit in,
- VPC Packet Flow Review (Order of Evaluation),
- What VPC Flow Logs Show,
- Interpreting ACCEPT vs REJECT,
- Step-by-Step Troubleshooting Flow,
- Real Sample,
- Key Diagnostic Patterns,
- Tools to Automate Analysis,
- Visualization: SG/NACL Decision Flow.
1. The Context: Where VPC Flow Logs Fit in
- When traffic fails (connectivity drops, timeouts, or access denials), it’s often due to SG or NACL rules.
- VPC Flow Logs give network-level visibility of traffic allowed or denied, which is critical for diagnosing:
  - SG misconfigurations (wrong CIDR, port, or direction)
  - NACL rule conflicts
  - Missing routes or private endpoint setups
2. VPC Packet Flow Review (Order of Evaluation)
- Before looking at logs, recall the order in which AWS evaluates network access:
| Step | Component | Direction | Key Rule Type | Stateful? |
|---|---|---|---|---|
| 1 | Security Group | Inbound / Outbound | Allow only | ✅ Yes |
| 2 | Network ACL | Inbound / Outbound | Allow or Deny | ❌ No |
| 3 | Route Table | Outbound only | Routes traffic | N/A |
| 4 | VPC Flow Logs | Both directions | Capture result (ACCEPT/REJECT) | N/A |
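NB:
- The order above applies to outbound traffic: Security Groups first (instance-level), then NACLs (subnet-level). For inbound traffic the order is reversed: the subnet’s NACL is evaluated before the instance’s Security Group.
- If either denies traffic, twtech will see a REJECT in Flow Logs.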
3. What VPC Flow Logs Show
A sample flow log line:
2 123456789012 eni-abc123def456 10.0.1.10 172.31.5.20 52000 443 6 3 1800 1609459200 1609459260 REJECT OK
| Field | Meaning |
|---|---|
| 10.0.1.10 | Source IP |
| 172.31.5.20 | Destination IP |
| 52000 | Source Port |
| 443 | Destination Port |
| 6 | Protocol (6 = TCP) |
| REJECT | Denied by SG or NACL |
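To make the field positions concrete, here is a minimal Python sketch that splits a record like the sample above into named fields. It assumes the default (version 2) flow log format; the `DEFAULT_FIELDS` tuple and the `parse_flow_log_line` helper are illustrative names, not part of any AWS SDK, and a custom log format would need a different field list.

```python
# Field order of the default (version 2) flow log format.
DEFAULT_FIELDS = (
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
)


def parse_flow_log_line(line):
    """Split one space-separated record into a dict keyed by field name.

    Values stay as strings because some records (for example NODATA)
    carry "-" instead of numbers.
    """
    parts = line.split()
    if len(parts) != len(DEFAULT_FIELDS):
        raise ValueError(
            f"expected {len(DEFAULT_FIELDS)} fields, got {len(parts)}"
        )
    return dict(zip(DEFAULT_FIELDS, parts))


sample = (
    "2 123456789012 eni-abc123def456 10.0.1.10 172.31.5.20 "
    "52000 443 6 3 1800 1609459200 1609459260 REJECT OK"
)
record = parse_flow_log_line(sample)
print(record["srcaddr"], record["dstaddr"], record["dstport"], record["action"])
```

Keeping every value as a string is a deliberate simplification; convert ports and byte counts to integers only after checking the log status.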
4. Interpreting ACCEPT vs REJECT
| Log Action | Meaning | Common Cause |
|---|---|---|
| ACCEPT | Packet allowed through both SG + NACL | Correct rules |
| REJECT | Packet denied at SG or NACL | Misconfigured rule or direction |
NB:
- Flow Logs don’t directly say which layer (SG/NACL) caused the denial; twtech infers it using packet direction and rules.
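One way to apply that inference is sketched below in Python. It is a heuristic, not an AWS feature: it assumes you have already found the flow log actions for both the request and its response on the same interface, and it relies on the fact that SGs are stateful while NACLs are not. The function name and return strings are hypothetical.

```python
def infer_blocking_layer(request_action, response_action=None):
    """Guess which layer dropped a connection from two flow log actions.

    request_action:  action logged for the initiating packets.
    response_action: action logged for the reply packets, if any were seen.
    """
    if request_action == "REJECT":
        # Either layer can drop the first packet; the rules must be inspected.
        return "SG or NACL (request direction)"
    if request_action == "ACCEPT" and response_action == "REJECT":
        # A stateful SG would let the reply through, so a stateless NACL
        # rule in the reply direction is the likely culprit.
        return "NACL (response direction)"
    if request_action == "ACCEPT" and response_action == "ACCEPT":
        return "not blocked by SG/NACL"
    return "undetermined"


print(infer_blocking_layer("ACCEPT", "REJECT"))
```

When the request itself is rejected, the logs alone cannot separate the two layers, which is why the later steps check SG rules and NACL rules in turn.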
5. Step-by-Step Troubleshooting Flow
Step 1: Identify the Traffic Pattern
- Source IP, destination IP, and port.
- Direction: inbound (connection initiated by a remote host toward the instance) or outbound (initiated from the instance).
Step 2: Check the Flow Log Entry
- Look for entries with action=REJECT for those IPs/ports.
- Example query (CloudWatch Logs Insights):
    fields @timestamp, interfaceId, srcAddr, dstAddr, srcPort, dstPort, action
    | filter action="REJECT"
    | sort @timestamp desc
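If you run this query often with different addresses or ports, a small helper can assemble the query text. The Python sketch below only builds a string; it does not call AWS. The `build_reject_query` name and its parameters are assumptions for illustration, and the extra filters assume the discovered `dstPort` field compares as a number in Logs Insights.

```python
def build_reject_query(addr=None, port=None, limit=50):
    """Return a Logs Insights query for REJECT records, optionally narrowed."""
    lines = [
        "fields @timestamp, interfaceId, srcAddr, dstAddr, srcPort, dstPort, action",
        '| filter action = "REJECT"',
    ]
    if addr is not None:
        # Match the address on either side of the flow.
        lines.append(f'| filter srcAddr = "{addr}" or dstAddr = "{addr}"')
    if port is not None:
        lines.append(f"| filter dstPort = {int(port)}")
    lines.append("| sort @timestamp desc")
    lines.append(f"| limit {int(limit)}")
    return "\n".join(lines)


print(build_reject_query(addr="10.0.1.10", port=443))
```

The resulting text can be pasted into the Logs Insights console for the flow log's log group.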
Step 3: Determine Direction
- If source is your instance IP,
and destination is outside → Outbound.
- If destination is your instance IP → Inbound.
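The direction rule above can be sketched in Python as follows. The record is assumed to be a dict with `srcaddr` and `dstaddr` keys, and `classify_direction` is a hypothetical helper name. Note that this classifies each record relative to the interface: reply packets of an inbound connection also have the instance as their source, so compare ports (well-known vs ephemeral) before concluding who initiated the connection.

```python
def classify_direction(record, instance_ip):
    """Classify one flow log record relative to the instance's private IP."""
    src, dst = record["srcaddr"], record["dstaddr"]
    if src == instance_ip and dst != instance_ip:
        return "outbound"
    if dst == instance_ip and src != instance_ip:
        return "inbound"
    # Neither side (or both sides) matches the instance IP.
    return "unknown"


example = {"srcaddr": "10.0.1.10", "dstaddr": "172.31.5.20"}
print(classify_direction(example, "10.0.1.10"))
```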
Step 4: Check Security Group Rules
SGs are stateful:
- If outbound is allowed, return inbound traffic is automatically allowed.
| Scenario | Check |
|---|---|
| Outbound to Internet blocked | Verify egress SG rule (0.0.0.0/0, TCP 443/80) |
| Inbound connection fails | Verify ingress SG allows source IP/CIDR and port |
| Peer EC2 access fails | Both SGs must reference each other or open required ports |
Sample Fix:
# bash
aws ec2 authorize-security-group-ingress \
  --group-id sg-1234567890 \
  --protocol tcp --port 443 --cidr 10.0.0.0/16
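Before or after applying such a fix, it can help to reason offline about whether a set of ingress rules covers the rejected flow. The Python sketch below is a simplified model, not the AWS API: rules are plain dicts with hypothetical keys (`protocol`, `from_port`, `to_port`, `cidr`), only CIDR sources are handled (not SG references or prefix lists), and anything without a matching allow rule is denied.

```python
import ipaddress


def ingress_allows(rules, src_ip, port, protocol="tcp"):
    """Return True if any allow rule matches the source IP, port and protocol."""
    src = ipaddress.ip_address(src_ip)
    for rule in rules:
        if rule["protocol"] == "-1":
            port_ok = True  # "-1" means all protocols and all ports
        elif rule["protocol"] == protocol:
            port_ok = rule["from_port"] <= port <= rule["to_port"]
        else:
            continue
        if port_ok and src in ipaddress.ip_network(rule["cidr"]):
            return True
    return False  # SGs have allow rules only, so no match means denied


# Models the rule added by the CLI command above.
rules = [{"protocol": "tcp", "from_port": 443, "to_port": 443, "cidr": "10.0.0.0/16"}]
print(ingress_allows(rules, "10.0.1.10", 443))
print(ingress_allows(rules, "172.31.5.20", 443))
```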
Step 5: Check Network ACLs (NACLs)
NACLs are stateless:
- twtech must explicitly allow both inbound and outbound directions for return traffic.
| Issue | Common Misconfiguration |
|---|---|
| REJECT in Flow Logs | One direction missing in NACL |
| Timeouts | Return port range (1024–65535) blocked |
| Multi-tier app breaks | Inbound ephemeral ports not open |
Sample Fix:
Inbound rule to allow return TCP traffic:
| Rule# | Type | Protocol | Port Range | Source | Allow/Deny |
|---|---|---|---|---|---|
| 100 | Custom TCP | 6 | 1024-65535 | 10.0.0.0/16 | ALLOW |
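To see why the ephemeral range matters, the Python sketch below models how a NACL reaches a decision: rules are checked in ascending rule-number order, the first match wins, and anything unmatched falls through to the implicit deny. The dict keys and the `nacl_decision` name are assumptions for illustration; real NACL entries carry more attributes (for example ICMP type and code).

```python
import ipaddress


def nacl_decision(rules, peer_ip, port, protocol="6"):
    """Return "ALLOW" or "DENY" for one packet against one direction's rules."""
    peer = ipaddress.ip_address(peer_ip)
    for rule in sorted(rules, key=lambda r: r["number"]):
        if rule["protocol"] != "-1":  # "-1" matches every protocol and port
            if rule["protocol"] != protocol:
                continue
            low, high = rule["ports"]
            if not low <= port <= high:
                continue
        if peer in ipaddress.ip_network(rule["cidr"]):
            return rule["action"]  # first matching rule wins
    return "DENY"  # the implicit "*" rule


# Inbound rule 100 from the table above.
inbound = [
    {"number": 100, "protocol": "6", "ports": (1024, 65535),
     "cidr": "10.0.0.0/16", "action": "ALLOW"},
]
# Return traffic arrives on the client's ephemeral port (52000 here).
print(nacl_decision(inbound, "10.0.2.20", 52000))
print(nacl_decision([], "10.0.2.20", 52000))
```

Because the NACL keeps no connection state, the same check has to pass separately for the outbound rule list.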
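Step 6: Validate Routing
Even with SGs and NACLs correct, packets can be silently dropped when:
- The route to the IGW, NATGW, TGW, or endpoint is wrong or missing.
- A private subnet tries to reach a public IP without NAT.
Flow Logs Symptom:
- No matching records on the destination interface, or only NODATA records, which mean no traffic to or from that interface was captured during the aggregation interval. The source interface may still log ACCEPT, since Flow Logs record only the SG/NACL decision.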
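Route tables pick the most specific matching route (longest prefix match), so a quick offline check of which target a destination would resolve to can confirm a routing gap. The Python sketch below models a route table as a dict of CIDR to target label; the labels (`local`, `nat-gateway`) and the `lookup_route` name are hypothetical, and real route tables also include prefix lists and propagated routes that this ignores.

```python
import ipaddress


def lookup_route(routes, dest_ip):
    """Return the target of the most specific route covering dest_ip, or None."""
    dest = ipaddress.ip_address(dest_ip)
    best = None  # (network, target) of the longest matching prefix so far
    for cidr, target in routes.items():
        network = ipaddress.ip_network(cidr)
        if dest in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, target)
    return best[1] if best else None


private_with_nat = {"10.0.0.0/16": "local", "0.0.0.0/0": "nat-gateway"}
private_no_nat = {"10.0.0.0/16": "local"}

print(lookup_route(private_with_nat, "8.8.8.8"))
print(lookup_route(private_no_nat, "8.8.8.8"))
```

A `None` result corresponds to the "private subnet without NAT" case: no route covers the public destination.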
6. Real Sample
Scenario: EC2 can’t connect to RDS in another subnet.
| Flow Log | Meaning |
|---|---|
| REJECT record for destination port 3306 | Connection to MySQL denied |
| SG Check | Ingress on RDS SG missing port 3306 |
| Fix | Add inbound rule: MySQL (TCP 3306) from EC2’s SG |
| Result | Flow Log now shows ACCEPT |
7. Key Diagnostic Patterns
| Flow Log Symptom | Root Cause | Resolution |
|---|---|---|
| REJECT on port 22 | SSH blocked by SG/NACL | Add inbound TCP 22 rule |
| ACCEPT inbound, REJECT on return traffic | NACL missing outbound 1024–65535 | Allow ephemeral port range |
| NODATA | No traffic flow / route missing | Verify routes and connectivity |
| SKIPDATA | Log delivery failure | Recreate flow log or increase buffer size |
8. Tools to Automate Analysis
| Tool | Purpose |
|---|---|
| CloudWatch Logs Insights | Query logs interactively |
| Athena on S3 Flow Logs | Historical analysis |
| AWS Detective / GuardDuty | Correlate security events |
| VPC Reachability Analyzer | Rule-by-rule path testing |