Amazon VPC Flow Logs - Overview & Hands-On.
Scope:
- The concept of VPC Flow Logs,
- Three levels at which flow logs can be created,
- How VPC Flow Logs Work (Mechanics),
- Flow Log Record Structure (Sample flow log record syntax with default format, version 2),
- Breakdown of Flow Log Record Structure (Fields & Meanings),
- Flow Log Status Values (Status & Meaning),
- Custom Log Formats (v3+) AWS allows custom fields & custom formats,
- Field to specify when creating or modifying a flow log (CLI),
- Security & Compliance Use Cases,
- Troubleshooting Use Cases,
- Cost & Performance Optimization,
- Advanced Integrations & Purposes,
- Flow Log vs. Other Network Monitoring Tools (CloudTrail & Traffic Mirroring),
- Best Practices,
- Sample Architecture Overview (Traffic Flow & Monitoring Path),
- Project: Hands-On.
1. The concept of VPC Flow Logs
- VPC Flow Logs capture information about the IP traffic going to and from network interfaces within the VPC (twtechvpc).
Three levels at which flow logs can be created:
- VPC level:
- captures traffic for all ENIs (Elastic Network Interfaces) in the VPC.
- Subnet level:
- captures traffic for all ENIs in a subnet.
- Network Interface level:
- captures traffic for a specific ENI (e.g., EC2, ALB, Lambda in VPC, RDS, etc.).
2. How VPC Flow Logs Work (Mechanics)
- Traffic Observation Point:
- AWS intercepts metadata about IP packets at the ENI level, before traffic leaves or enters an ENI.
- Log Generation:
- Flow logs don’t capture packet payloads, only metadata like source/destination IP, port, protocol, bytes, and packets.
- Publishing Destinations:
- twtech can publish logs to:
- Amazon CloudWatch Logs (for near real-time monitoring)
- Amazon S3 (for long-term storage, Athena queries, cost analysis)
- Amazon Kinesis Data Firehose (for streaming to analytics tools like Splunk or Elastic)
- Delivery Model:
- Logs are batched and delivered every 1–10 minutes.
- Flow records aggregate multiple packets between the same source/destination pair during a defined time window.
3. Flow Log Record Structure (Sample flow log record syntax with default format, version 2)
2 123456789012 eni-abc123def456 10.0.1.5 54.240.196.186 49152 443 6 10 8400 1609459200 1609459260 ACCEPT OK
Breakdown of Flow Log Record Structure (Fields & Meanings):

| Field | Meaning |
| --- | --- |
| 2 | Log format version |
| 123456789012 | AWS account ID |
| eni-abc123def456 | Network Interface ID |
| 10.0.1.5 | Source IP |
| 54.240.196.186 | Destination IP |
| 49152 | Source port |
| 443 | Destination port |
| 6 | Protocol (6 = TCP, 17 = UDP, etc.) |
| 10 | Packets |
| 8400 | Bytes transferred |
| 1609459200 | Start time (epoch seconds) |
| 1609459260 | End time (epoch seconds) |
| ACCEPT | Action (ACCEPT or REJECT) |
| OK | Log status (OK, NODATA, SKIPDATA) |
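Since each record is a single space-delimited line with fields in a fixed order, the table above can be turned into a small parser. A minimal sketch in Python, assuming the default version 2 format (field names as documented above):

```python
from datetime import datetime, timezone

# Field names of the default (version 2) flow log format, in order.
DEFAULT_FIELDS = [
    "version", "account-id", "interface-id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log-status",
]

def parse_record(line: str) -> dict:
    """Split a space-delimited flow log record into a field->value dict."""
    record = dict(zip(DEFAULT_FIELDS, line.split()))
    # Convert the epoch-second window bounds into readable UTC timestamps.
    for key in ("start", "end"):
        record[key] = datetime.fromtimestamp(int(record[key]), tz=timezone.utc)
    return record

sample = ("2 123456789012 eni-abc123def456 10.0.1.5 54.240.196.186 "
          "49152 443 6 10 8400 1609459200 1609459260 ACCEPT OK")
rec = parse_record(sample)
print(rec["dstport"], rec["action"], rec["start"].isoformat())
```

Note how the start/end epoch values from the sample record decode to a one-minute aggregation window, matching the delivery model described above.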
4. Flow Log Status Values (Status & Meaning)
| Status | Meaning |
| --- | --- |
| OK | Data logging successfully completed |
| NODATA | No network traffic matched the filter |
| SKIPDATA | Data could not be captured (e.g., due to an internal error) |
5. Custom Log Formats (v3+): AWS allows custom fields & custom formats
- Subnet ID, Instance ID, AZ, VPC ID
- TCP flags, type, pkt-src-aws-service, pkt-dst-aws-service
- Flow direction (ingress/egress)
- Interface type (e.g., Lambda, Transit Gateway)
# Field to specify when creating or modifying a flow log (CLI)
# bash
aws ec2 create-flow-logs \
--resource-type VPC \
--resource-ids vpc-12345678 \
--traffic-type ALL \
--log-destination-type s3 \
--log-destination arn:aws:s3:::twtechvpc-flow-logs \
--log-format '${version} ${vpc-id} ${interface-id} ${srcaddr} ${dstaddr} ${pkt-dst-aws-service}'
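The --log-format value is just a space-separated list of ${field} placeholders, so it can be generated from a field list. A small hypothetical helper (build_log_format is not an AWS API; the field names must match AWS's documented flow log fields):

```python
def build_log_format(fields):
    """Build a flow-log --log-format string from a list of field names.

    Hypothetical helper: it only formats the string; AWS validates the
    field names when the flow log is created.
    """
    return " ".join("${%s}" % f for f in fields)

fmt = build_log_format(
    ["version", "vpc-id", "interface-id", "srcaddr", "dstaddr",
     "pkt-dst-aws-service"])
print(fmt)
```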
6. Security & Compliance Use Cases
- Detect anomalous traffic (e.g., data exfiltration attempts, scanning patterns)
- Validate security group/NACL behavior
- Correlate with GuardDuty findings
- Investigate breaches (who connected to what, when, and from where)
- Monitor cross-region or cross-account access
7. Troubleshooting Use Cases
- EC2 instance can’t reach S3 or RDS → inspect flow logs for REJECT entries.
- Connectivity delays or timeouts → look for packet drops or asymmetric routing.
- Validate NAT Gateway and endpoint access.
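The REJECT triage above can be sketched in a few lines of Python over raw record lines. This assumes the default v2 format, where action is the 13th field (index 12):

```python
def rejected(lines):
    """Yield (srcaddr, dstaddr, dstport) for records whose action is REJECT."""
    for line in lines:
        parts = line.split()
        # Default v2 format: 14 fields; index 12 is the action field.
        if len(parts) >= 14 and parts[12] == "REJECT":
            yield parts[3], parts[4], parts[6]

records = [
    "2 123456789012 eni-abc123def456 10.0.1.5 54.240.196.186 49152 443 6 10 8400 1609459200 1609459260 ACCEPT OK",
    "2 123456789012 eni-abc123def456 10.0.1.7 10.0.2.20 50514 3306 6 4 240 1609459200 1609459260 REJECT OK",
]
print(list(rejected(records)))  # → [('10.0.1.7', '10.0.2.20', '3306')]
```

A REJECT toward port 3306, as in the second sample record, is a typical signature of an EC2 instance blocked from reaching RDS by a security group or NACL.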
8. Cost & Performance Optimization
- Analyze inter-AZ or inter-VPC traffic to detect cost spikes.
- Identify unused endpoints or noisy workloads generating excessive traffic.
- Detect internet-bound traffic that could use private endpoints.
Sample Athena query on Flow Logs in S3
# sql
SELECT srcaddr, dstaddr, sum(bytes) as total_bytes
FROM vpc_flow_logs
WHERE action = 'ACCEPT'
GROUP BY srcaddr, dstaddr
ORDER BY total_bytes DESC
LIMIT 10;
9. Advanced Integrations & Purposes
| Integration | Purpose |
| --- | --- |
| Amazon Athena | Query S3 logs directly |
| AWS Glue | Catalog schema for flow logs |
| CloudWatch Logs Insights | Real-time search |
| Kinesis Firehose + Elasticsearch/Splunk | Advanced visualization |
| Amazon GuardDuty | Uses flow logs to detect threats |
10. Flow Log vs. Other Network Monitoring Tools (CloudTrail & Traffic Mirroring)

| Feature | VPC Flow Logs | CloudTrail | Traffic Mirroring |
| --- | --- | --- | --- |
| Data Type | Metadata (no payloads) | API activity logs | Full packet capture |
| Use Case | Network monitoring, compliance | Auditing AWS API calls | Deep packet inspection |
| Overhead | Low | Low | High |
| Storage | S3 / CloudWatch | S3 / CloudWatch | Custom (PCAP tools) |
11. Best Practices
✅ Always enable Flow Logs (at least) at the VPC level for auditing.
✅ Use a custom log format (v3+ fields) for maximum field flexibility.
✅ Send logs to S3 for long-term retention + Athena querying.
✅ Integrate with CloudWatch Logs Insights for quick analysis:
# CloudWatch Logs Insights
fields @timestamp, srcAddr, dstAddr, action
| filter action="REJECT"
| sort @timestamp desc
NB:
- Automate analysis using AWS Lambda or Glue ETL for periodic summaries.
Sample Architecture Overview (Traffic Flow & Monitoring Path):
Project: Hands-On
- How twtech uses Amazon VPC Flow logs to capture information about the IP traffic going to and from network interfaces within the VPC (twtechvpc).
Search for AWS service: VPC
Step-1:
- Select the VPC (twtechVPC) and navigate to: Flow logs tab
Step-2:
- Create a flow log
Create flow log:
- Flow logs can capture IP traffic flow information for the network interfaces associated with twtech resources.
- twtech can create multiple flow logs to send traffic to different destinations.
- Destination: Send to an Amazon S3 bucket
- twtech must create an S3 bucket for VPC Flow logs (S3-Destination) in the same Region where the VPC (twtechVPC) is located.
- Bucket names must not contain upper-case letters.
- Create bucket: twtechvpc-flow-logs
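The lowercase rule above is part of S3's bucket-naming constraints. A simplified pre-check in Python (an assumption-laden sketch: it covers length, allowed characters, and start/end characters, but not every S3 rule, e.g. it does not reject IP-address-like names):

```python
import re

# Simplified S3 bucket-name check: 3-63 chars, lowercase letters, digits,
# dots and hyphens, starting and ending with a letter or digit.
# (Not exhaustive: S3 applies additional naming rules.)
BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")

def is_valid_bucket_name(name: str) -> bool:
    return bool(BUCKET_RE.match(name))

print(is_valid_bucket_name("twtechvpc-flow-logs"))   # True
print(is_valid_bucket_name("TwTechVPC-Flow-Logs"))   # False: upper-case letters
```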
Step-3:
- twtech needs the bucket ARN to complete creating the VPC Flow Log: twtech-flow-logs
- Select the S3 bucket and click to open it.
- Navigate to the Properties tab
Step-4:
- Return to the VPC console and insert the destination bucket ARN (Amazon Resource Name) copied: arn:aws:s3:::twtechvpc-flow-logs
NB:
- Default AWS Flow logs format:
${version} ${account-id} ${interface-id} ${srcaddr} ${dstaddr} ${srcport} ${dstport} ${protocol} ${packets} ${bytes} ${start} ${end} ${action} ${log-status}
Create flow logs:
Step-5:
- How twtech verifies the details of its VPC flow logs created
Step-6:
- twtech Creates another VPC Flow Logs for CloudWatch logs (CloudWatch-Destination): twtechVPC-CloudWatch-Logs
Steps in creating a Destination log group for: CloudWatch
NB:
- twtech does not use the name of an existing log group.
- A new log group with the specified name is created when twtech creates the flow log.
- A new log stream is created for each monitored network interface.
Search for AWS service: CloudWatch
Create log groups from: Logs menu / Log groups
Create log group:
Step-7:
- twtech returns to the flow logs settings console (UI):
- Refresh the destination list using the refresh icon,
- Then select the log group name created: twtechVPC-CloudWatch-FlowLogs
- Create flow logs:
Step-8:
- twtech accesses VPC flow logs details in a specific VPC: twtechVPC
NB:
- Explanation of the destinations for the VPC Flow logs created:
- As shown in the image above, twtech has configured logs to flow to two destinations:
- CloudWatch,
- S3 bucket.
Step-9:
- How twtech accesses its timestamp log from S3 bucket: twtechvpc-flow-logs
- twtech Returns to S3 console (UI),
- refresh
- Then, verifies that directories (folders) are being created in a nested structure (one inside the other):
Yes:
NB:
- Clicking deep into the folders, twtech can at this point access the timestamp logs for its VPC: twtechVPC flow logs in S3 bucket
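The nested folders follow a predictable key layout, AWSLogs/<account-id>/vpcflowlogs/<region>/<year>/<month>/<day>/, as seen later in the S3 URI. A small helper can build the prefix for a given day (a sketch; the account ID and region below are placeholders):

```python
from datetime import date

def flow_log_prefix(account_id: str, region: str, day: date) -> str:
    """Build the S3 key prefix under which VPC Flow Logs land for one day."""
    return (f"AWSLogs/{account_id}/vpcflowlogs/{region}/"
            f"{day.year:04d}/{day.month:02d}/{day.day:02d}/")

print(flow_log_prefix("123456789012", "us-east-2", date(2025, 11, 11)))
```

This is handy when scripting downloads or pointing Athena at a single day's worth of logs.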
Step-10:
- How twtech accesses VPC Flow logs from: CloudWatch Logs
Search for AWS service: CloudWatch
Navigate to menus: Logs / Logs groups
NB:
- Clicking deep into the folders, twtech can now access the timestamp logs for its VPC: twtechVPC flow logs in CloudWatch
Logs --> Log groups --> Log stream --> Timestamp Log Events
NB:
- From Log Events, twtech can block any IP address that is repeatedly trying to access its resources at the NACL level.
- From Log Events, twtech can perform fine-grained filtering for errors, exceptions, and more:
- No errors found yet in the VPC (twtechVPC).
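Deciding which IP to block at the NACL level usually starts by counting repeated REJECTs per source address. A sketch over raw v2 records (field positions as in the default format; the sample IPs are illustrative):

```python
from collections import Counter

def top_rejected_sources(lines, n=3):
    """Count REJECT records per source IP and return the n most frequent."""
    counts = Counter(
        parts[3]                      # srcaddr is the 4th field (v2 format)
        for parts in (line.split() for line in lines)
        if len(parts) >= 14 and parts[12] == "REJECT"
    )
    return counts.most_common(n)

records = [
    "2 123456789012 eni-abc123def456 203.0.113.9 10.0.1.5 55001 22 6 1 60 1609459200 1609459260 REJECT OK",
    "2 123456789012 eni-abc123def456 203.0.113.9 10.0.1.5 55002 22 6 1 60 1609459200 1609459260 REJECT OK",
    "2 123456789012 eni-abc123def456 10.0.1.7 10.0.1.5 50514 443 6 4 240 1609459200 1609459260 ACCEPT OK",
]
print(top_rejected_sources(records))  # → [('203.0.113.9', 2)]
```

A source that repeatedly hits REJECT on port 22, like 203.0.113.9 here, is the kind of address twtech would consider denying in a NACL rule.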
Traffic flow explanation:
- Each CloudWatch log stream corresponds to one Elastic Network Interface (ENI) in the twtech account.
NB:
- twtech continuously monitors (monitoring & observability) that the resource ENI ID is the same as the one found in the Log Events.
NB:
- The VPC Flow logs going to both S3 and CloudWatch look the same.
Step-11:
- How twtech uses AWS Athena (for big-data analysis) to query VPC Flow Logs data from the S3 bucket:
Search for AWS Service: Athena
- twtech's step-by-step guide to set up AWS Athena (to query logs from destinations for real-time analytics) is found on the twtech Blog at the following link:
https://think-with-tech.blogspot.com/2025/08/amazon-athena-deep-dive.html
NB:
- Amazon Athena is a serverless, interactive query service that lets twtech run SQL queries directly on data stored in Amazon S3.
Set up a query result location in Amazon S3 by editing the settings.
Create a location for the S3 bucket to store query results, from the S3 console: twtech-athena-bucket
Create S3 bucket: twtech-athena-bucket
Access the bucket properties and copy the ARN: select the bucket and click to open it.
Navigate to the Properties tab and copy the ARN: arn:aws:s3:::twtech-athena-bucket
Return to the Athena console to: Manage settings
- Sample query location: s3://twtech-athena-bucket/athena
Save Athena settings:
Step-12:
- twtech needs to create a database on Athena:
- Google: "aws vpc flow logs athena"
- This is a documentation guide to create a table that queries the flow log data in S3: twtechvpc-flow-logs
Link:
https://docs.aws.amazon.com/athena/latest/ug/vpc-flow-logs-create-table-statement.html
- twtech needs to enter a DDL (Data Definition Language) statement like the following into the Athena console query editor, following the guidelines in the Considerations and limitations section.
- The sample statement creates a table that has the columns for Amazon VPC flow logs versions 2 through 5 as documented in Flow log records.
- If twtech uses a different set of columns or order of columns, then, there is need to modify the statement accordingly.
# Copy the syntax and use it to create the Athena VPC flow logs table:
# syntax (statement) - create-table-in-athena
CREATE EXTERNAL TABLE IF NOT EXISTS `vpc_flow_logs` (
version int,
account_id string,
interface_id string,
srcaddr string,
dstaddr string,
srcport int,
dstport int,
protocol bigint,
packets bigint,
bytes bigint,
start bigint,
`end` bigint,
action string,
log_status string,
vpc_id string,
subnet_id string,
instance_id string,
tcp_flags int,
type string,
pkt_srcaddr string,
pkt_dstaddr string,
region string,
az_id string,
sublocation_type string,
sublocation_id string,
pkt_src_aws_service string,
pkt_dst_aws_service string,
flow_direction string,
traffic_path int
)
PARTITIONED BY (`date` date)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ' '
LOCATION 's3://twtechvpc-flow-logs/AWSLogs/accountID/vpcflowlogs/us-east-2/2025/11/11/'
TBLPROPERTIES ("skip.header.line.count"="1");
Return to the Athena UI: paste the syntax to create the table.
- twtech needs to specify in the syntax where the data is to be queried from:
From:
- To specify the location of the VPC Flow logs bucket (S3 bucket), twtech needs to go to the bucket console and copy the: VPC flow-logs S3 URI
- twtech needs to continue clicking into the bucket to get to the bucket properties (VPC-flow-logs): S3 URI
- Copy the S3 URI and insert it in the syntax to create the table:
Sample S3 URI:
s3://twtechvpc-flow-logs/AWSLogs/accountID/vpcflowlogs/us-east-2/2025/11/11/
Next: run the syntax (statement)
Query: Successful
twtech Explanation:
- The query completed successfully and created a partitioned table: vpc_flow_logs
How twtech accesses the table partitions:
Again, twtech needs to run another syntax (statement):
- To create partitions to be able to read the data, as in the following sample query.
- This query creates a single partition for a specified date. Replace the placeholders for date and location as needed.
NB:
- The syntax can still be obtained from the AWS official documentation link:
- Copy the syntax (statement) and insert the vpc-flow-log bucket URI again: S3 URI
NB:
The syntax has adjustments for:
From:
ALTER TABLE vpc_flow_logs
ADD PARTITION (`date`='YYYY-MM-dd')
LOCATION 's3://amzn-s3-demo-bucket/prefix/AWSLogs/{account_id}/vpcflowlogs/{region_code}/YYYY/MM/dd';
- twtech needs to continue clicking into the bucket to get to the bucket properties again (VPC-flow-logs): S3 URI
To: Configured
# syntax (statement) - create-partition-in-table
ALTER TABLE vpc_flow_logs
ADD PARTITION (`date`='2025-11-11')
LOCATION 's3://twtechvpc-flow-logs/AWSLogs/accountID/vpcflowlogs/us-east-2/2025/11/11/';
Run the syntax (statement) as well:
Query: Successfully added one partition to the table
From the same documentation link:
https://docs.aws.amazon.com/athena/latest/ug/vpc-flow-logs-create-table-statement.html
- twtech can query with the following syntax (statement): it lists all of the rejected TCP connections (traffic) and uses the newly created date partition column, date, to extract the day of the week on which these events occurred.
# syntax (statement)- lists all of the rejected TCP connections (traffic)
SELECT
day_of_week(date) AS day,
date,
interface_id,
srcaddr,
action,
protocol
FROM vpc_flow_logs
WHERE action = 'REJECT' AND protocol = 6
LIMIT 100;
Copy the syntax (statement) and run it in Athena:
Query: Successful
twtech Explanation:
The query to list all rejected traffic is a security strategy to monitor the VPC for the following:
- who is trying to access twtechVPC the most,
- the table includes:
o Day,
o Date,
o Interface_id,
o Srcaddr (source address),
o Action (taken by twtechVPC),
o Protocol
Mitigation Strategy from Athena (real-time data Querying):
- twtech can block the IP addresses attacking its VPC (twtechVPC).