Friday, November 7, 2025

Amazon VPC Flow Logs | Overview & Hands-On.


Scope:

  • The concept of VPC Flow Logs,
  • Three levels of creation,
  • How VPC Flow Logs Work (Mechanics),
  • Flow Log Record Structure (Sample flow log record syntax with default format, version 2),
  • Breakdown of Flow Log Record Structure (Fields & Meanings),
  • Flow Log Status Values (Status & Meaning),
  • Custom Log Formats (v3+): AWS allows custom fields & custom formats,
  • Fields to specify when creating or modifying a flow log (CLI),
  • Security & Compliance Use Cases,
  • Troubleshooting Use Cases,
  • Cost & Performance Optimization,
  • Advanced Integrations & Purposes,
  • Flow Log vs. Other Network Monitoring Tools (CloudTrail & Traffic Mirroring),
  • Best Practices,
  • Sample Architecture Overview (Traffic Flow & Monitoring Path),
  • Project: Hands-On.

1. The concept of VPC Flow Logs

    • VPC Flow Logs capture information about the IP traffic going to and from network interfaces within the VPC (twtechvpc).

Flow logs can be created at three levels:

    • VPC level:
      • captures traffic for all ENIs (Elastic Network Interfaces) in the VPC.
    • Subnet level:
      • captures traffic for all ENIs in a subnet.
    • Network Interface level:
      •  captures traffic for a specific ENI (e.g., EC2, ALB, Lambda in VPC, RDS, etc.).

 2. How VPC Flow Logs Work (Mechanics)

  1. Traffic Observation Point:
    AWS intercepts metadata about IP packets at the ENI level—before traffic leaves or enters an ENI.
  2. Log Generation:
    Flow logs don’t capture packet payloads—only metadata like source/destination IP, port, protocol, bytes, and packets.
  3. Publishing Destinations:
    twtech can publish logs to:
    • Amazon CloudWatch Logs (for near real-time monitoring)
    • Amazon S3 (for long-term storage, Athena queries, cost analysis)
    • Amazon Kinesis Data Firehose (for streaming to analytics tools like Splunk or Elastic)
  4. Delivery Model:

    • Logs are batched and delivered every 1–10 minutes.
    • Flow records aggregate multiple packets between the same source/destination pair during a defined time window.
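The aggregation behavior above can be sketched in a few lines of Python. This is purely illustrative (not how AWS implements it): packet observations sharing the same 5-tuple within the capture window collapse into a single flow record with summed packet and byte counts.

```python
# Illustrative sketch (not AWS's implementation): collapse per-packet
# observations that share the same 5-tuple into one flow record,
# the way a flow log aggregates packets over its capture window.
from collections import defaultdict

packets = [
    # (srcaddr, dstaddr, srcport, dstport, protocol, bytes)
    ("10.0.1.5", "54.240.196.186", 49152, 443, 6, 4200),
    ("10.0.1.5", "54.240.196.186", 49152, 443, 6, 4200),
    ("10.0.1.7", "10.0.2.9", 53122, 5432, 6, 900),
]

flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
for src, dst, sport, dport, proto, size in packets:
    key = (src, dst, sport, dport, proto)
    flows[key]["packets"] += 1
    flows[key]["bytes"] += size

print(flows[("10.0.1.5", "54.240.196.186", 49152, 443, 6)])
# {'packets': 2, 'bytes': 8400}
```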

 3. Flow Log Record Structure (Sample flow log record syntax with default format, version 2)

2 123456789012 eni-abc123def456 10.0.1.5 54.240.196.186 49152 443 6 10 8400 1609459200 1609459260 ACCEPT OK

Breakdown of Flow Log Record Structure (Fields & Meanings):

Field               Meaning
2                   Log format version
123456789012        AWS account ID
eni-abc123def456    Network interface ID
10.0.1.5            Source IP
54.240.196.186      Destination IP
49152               Source port
443                 Destination port
6                   Protocol (6 = TCP, 17 = UDP, etc.)
10                  Packets
8400                Bytes transferred
1609459200          Start time (epoch seconds)
1609459260          End time (epoch seconds)
ACCEPT              Action (ACCEPT or REJECT)
OK                  Log status (OK, NODATA, SKIPDATA)
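Because the default format is a fixed, space-separated field order, a record like the sample above can be parsed mechanically. A minimal sketch (the field names come from the table above; this is not an official AWS parser):

```python
# Minimal sketch: map a space-separated default-format (v2) record
# onto its documented field names. Not an official AWS parser.
V2_FIELDS = [
    "version", "account-id", "interface-id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log-status",
]

def parse_v2_record(line: str) -> dict:
    """Split one flow log record and pair each token with its field name."""
    return dict(zip(V2_FIELDS, line.split()))

sample = ("2 123456789012 eni-abc123def456 10.0.1.5 54.240.196.186 "
          "49152 443 6 10 8400 1609459200 1609459260 ACCEPT OK")
record = parse_v2_record(sample)
print(record["action"], record["dstport"])  # ACCEPT 443
```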

4. Flow Log Status Values (Status & Meaning)

Status      Meaning
OK          Data logging successfully completed
NODATA      No network traffic matched the filter
SKIPDATA    Data could not be captured (e.g., due to an internal error)

 5. Custom Log Formats (v3+): AWS allows custom fields & custom formats

    • Subnet ID, Instance ID, AZ, VPC ID
    • TCP flags, type, pkt-src-aws-service, pkt-dst-aws-service
    • Flow direction (ingress/egress)
    • Interface type (e.g., Lambda, Transit Gateway)

# Fields to specify when creating or modifying a flow log (CLI)

# bash
aws ec2 create-flow-logs \
  --resource-type VPC \
  --resource-ids vpc-12345678 \
  --traffic-type ALL \
  --log-destination-type s3 \
  --log-destination arn:aws:s3:::twtechvpc-flow-logs \
  --log-format '${version} ${vpc-id} ${interface-id} ${srcaddr} ${dstaddr} ${pkt-dst-aws-service}'

 6. Security & Compliance Use Cases

    • Detect anomalous traffic (e.g., data exfiltration attempts, scanning patterns)
    • Validate security group/NACL behavior
    • Correlate with GuardDuty findings
    • Investigate breaches (who connected to what, when, and from where)
    • Monitor cross-region or cross-account access

 7. Troubleshooting Use Cases

    • EC2 instance can’t reach S3 or RDS → inspect flow logs for REJECT entries.
    • Connectivity delays or timeouts → look for packet drops or asymmetric routing.
    • Validate NAT Gateway and endpoint access.
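As a toy illustration of the first bullet, REJECT entries can be pulled out of raw records with a simple filter. This sketch assumes the default v2 format, with field positions as in section 3:

```python
# Toy filter (assumes default v2 format): keep only REJECT records and
# report which source was blocked talking to which destination and port.
def find_rejects(records):
    hits = []
    for line in records:
        fields = line.split()
        if len(fields) >= 14 and fields[12] == "REJECT":
            # (srcaddr, dstaddr, dstport)
            hits.append((fields[3], fields[4], fields[6]))
    return hits

sample = [
    "2 123456789012 eni-abc123def456 10.0.1.5 10.0.2.20 49152 5432 6 3 180 1609459200 1609459260 REJECT OK",
    "2 123456789012 eni-abc123def456 10.0.1.5 54.240.196.186 49152 443 6 10 8400 1609459200 1609459260 ACCEPT OK",
]
print(find_rejects(sample))  # [('10.0.1.5', '10.0.2.20', '5432')]
```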

 8. Cost & Performance Optimization

    • Analyze inter-AZ or inter-VPC traffic to detect cost spikes.
    • Identify unused endpoints or noisy workloads generating excessive traffic.
    • Detect internet-bound traffic that could use private endpoints.

Sample Athena query on Flow Logs in S3 

# sql
SELECT srcaddr, dstaddr, sum(bytes) AS total_bytes
FROM vpc_flow_logs
WHERE action = 'ACCEPT'
GROUP BY srcaddr, dstaddr
ORDER BY total_bytes DESC
LIMIT 10;

 9. Advanced Integrations & Purposes

Integration                              Purpose
Amazon Athena                            Query S3 logs directly
AWS Glue                                 Catalog schema for flow logs
CloudWatch Logs Insights                 Real-time search
Kinesis Firehose + Elasticsearch/Splunk  Advanced visualization
Amazon GuardDuty                         Uses flow logs to detect threats

 10. Flow Log vs. Other Network Monitoring Tools (CloudTrail & Traffic Mirroring)

Feature     VPC Flow Logs                    CloudTrail               Traffic Mirroring
Data Type   Metadata (no payloads)           API activity logs        Full packet capture
Use Case    Network monitoring, compliance   Auditing AWS API calls   Deep packet inspection
Overhead    Low                              Low                      High
Storage     S3 / CloudWatch                  S3 / CloudWatch          Custom (PCAP tools)

 11. Best Practices

    • Always enable Flow Logs (at least) at the VPC level for auditing.
    • Use a custom format (v3+) for maximum field flexibility.
    • Send logs to S3 for long-term retention + Athena querying.
    • Integrate with CloudWatch Logs Insights for quick analysis:

# CloudWatch Logs Insights query
fields @timestamp, srcAddr, dstAddr, action
| filter action = "REJECT"
| sort @timestamp desc

NB:

    • Automate analysis using AWS Lambda or Glue ETL for periodic summaries.

 Sample Architecture Overview (Traffic Flow & Monitoring Path):



Project: Hands-On

  • How twtech uses Amazon VPC Flow logs to capture information about the IP traffic going to and from network interfaces within the VPC (twtechvpc).

Search for AWS service: VPC

Step-1:

  • Select the VPC (twtechVPC) and navigate to: Flow logs tab

Step-2:

  • Create a flow Log

Create flow log:

NB:

    • Flow logs can capture IP traffic flow information for the network interfaces associated with twtech resources.
    • twtech can create multiple flow logs to send traffic to different destinations.

  • Destination: Send to an Amazon S3 bucket

    • twtech must create an S3 bucket for the VPC Flow logs (S3-Destination), in the same Region where the VPC (twtechVPC) is located.
NB:
  • Bucket names must not contain upper-case letters.


  • Create bucket: twtechvpc-flow-logs

Step-3:

    • twtech needs the bucket ARN to finish creating the VPC flow log: twtech-flow-logs

Select the S3 bucket and click-open.

  • Navigate to: Properties tab

Step-4:

    • Return to the VPC console and insert the destination bucket ARN (Amazon Resource Name) copied: arn:aws:s3:::twtechvpc-flow-logs

NB:

  • Default AWS Flow logs format:

${version} ${account-id} ${interface-id} ${srcaddr} ${dstaddr} ${srcport} ${dstport} ${protocol} ${packets} ${bytes} ${start} ${end} ${action} ${log-status}
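The ${start} and ${end} fields in this format are epoch seconds; converting them makes the capture window human-readable. For the sample record in section 3:

```python
# Convert the epoch-second start/end fields of the sample record in
# section 3 into readable UTC timestamps to see the capture window.
from datetime import datetime, timezone

def to_utc(epoch: int) -> str:
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

start, end = 1609459200, 1609459260
print(to_utc(start), "->", to_utc(end))
# 2021-01-01 00:00:00 -> 2021-01-01 00:01:00
```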

Create flow logs:

Step-5

  • How twtech verifies the details of its newly created VPC flow log

Step-6: 

  • twtech creates another VPC flow log for CloudWatch Logs (CloudWatch-Destination): twtechVPC-CloudWatch-Logs


Steps in creating a Destination log group for: CloudWatch

NB:

    • twtech does not use the name of an existing log group.
    • A new log group with the given name will be created when twtech creates the flow log.
    • A new log stream is created for each monitored network interface.

Search for AWS service: CloudWatch

Create a log group from the Logs menu: Log groups

Create log group:

Step-7:

  • twtech returns to the flow logs settings console (UI):
    • Refresh from the Destination icon,
      • Then select the log group name created: twtechVPC-CloudWatch-FlowLogs

  • Create flow logs:

Step-8:

    • twtech accesses VPC flow logs details in a specific VPC: twtechVPC

NB:

  • Explanation of the Destinations for the VPC Flow logs created:
    • As shown on the image above, twtech has configured logs to flow to two destinations:

      • CloudWatch,
      • S3 bucket.

Step-9:

    • How twtech accesses its timestamped logs from the S3 bucket: twtechvpc-flow-logs
    • twtech returns to the S3 console (UI),
      • refreshes,
        • then verifies that directories (folders) are being created recursively (one inside the other):

Yes:

NB:

    • Clicking deep into the folders, twtech can at this point access the timestamped flow logs for its VPC (twtechVPC) in the S3 bucket.

Step-10:

  • How twtech accesses VPC Flow logs from: CloudWatch Logs

 Search for AWS service: CloudWatch

Navigate to the menus: Logs / Log groups

NB:

  • Clicking deep into the folders, twtech can now access the timestamped flow logs for its VPC (twtechVPC) in CloudWatch

Logs --> Log groups --> Log stream --> Timestamped log events

NB:

    • From a log event, twtech can block any IP address that repeatedly tries to access its resources, at the NACL level.
    • From a log event, twtech can perform fine-grained filtering for errors, exceptions, and more:

  • No errors found yet in the VPC (twtechVPC).

Traffic flow explanation:

  • The Elastic Network Interface (ENI) shown in the CloudWatch log streams:

NB:

    • The ENI shown in the CloudWatch log streams corresponds to the ENI in the twtech account.
    • twtech continuously monitors (monitoring & observability) that the resource's ENI ID is the same as the one found in the log events.

NB:

The VPC Flow logs going to both S3 and CloudWatch look the same.

Step-11:

    • How twtech uses AWS Athena (for big data analysis) to query VPC flow log data from the S3 bucket:

Search for AWS Service: Athena (for big data analysis)

  • twtech's step-by-step guide to set up AWS Athena (to query logs from destinations for near real-time analytics) is found on the twtech Blog at the following link:

https://think-with-tech.blogspot.com/2025/08/amazon-athena-deep-dive.html

NB:

    • Amazon Athena is a serverless, interactive query service that lets twtech run SQL queries directly on data stored in Amazon S3

Set up a query result location in Amazon S3 by editing the settings.

Create a location (S3 bucket) to store query results, from the S3 console: twtech-athena-bucket

Create S3 bucket: twtech-athena-bucket

Access the bucket properties and copy the ARN: select the bucket and click open.

Navigate to the Properties tab and copy the ARN: arn:aws:s3:::twtech-athena-bucket

Return to the Athena console to: Manage settings

    • Sample query result location: s3://twtech-athena-bucket/athena

Save Athena settings:

Step-12:

    • twtech needs to create a database and table in Athena.
    • Google: aws vpc flow logs athena

    • The following documentation guide shows how to create a table to query the flow log data from S3 (twtechvpc-flow-logs):

Link:

https://docs.aws.amazon.com/athena/latest/ug/vpc-flow-logs-create-table-statement.html

    • twtech needs to enter a DDL (Data Definition Language) statement like the following into the Athena console query editor, following the guidelines in the Considerations and limitations section.
    • The sample statement creates a table that has the columns for Amazon VPC flow log versions 2 through 5, as documented in Flow log records.
    • If twtech uses a different set of columns or order of columns, the statement must be modified accordingly.

  • Copy the syntax and use it to create the Athena VPC flow logs table:

# syntax (statement) – create-table-in-athena
CREATE EXTERNAL TABLE IF NOT EXISTS `vpc_flow_logs` (
  version int,
  account_id string,
  interface_id string,
  srcaddr string,
  dstaddr string,
  srcport int,
  dstport int,
  protocol bigint,
  packets bigint,
  bytes bigint,
  start bigint,
  `end` bigint,
  action string,
  log_status string,
  vpc_id string,
  subnet_id string,
  instance_id string,
  tcp_flags int,
  type string,
  pkt_srcaddr string,
  pkt_dstaddr string,
  region string,
  az_id string,
  sublocation_type string,
  sublocation_id string,
  pkt_src_aws_service string,
  pkt_dst_aws_service string,
  flow_direction string,
  traffic_path int
)
PARTITIONED BY (`date` date)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ' '
LOCATION 's3://twtechvpc-flow-logs/AWSLogs/accountID/vpcflowlogs/us-east-2/2025/11/11/'
TBLPROPERTIES ("skip.header.line.count"="1");

# Return to the Athena UI: paste the syntax to create the table

  • twtech needs to specify, in the syntax, the location of the data to be queried:

From:

  • To specify the location of the VPC Flow logs bucket (S3 bucket), twtech needs to go to the bucket console and copy the VPC flow logs S3 URI.

  • twtech needs to continue clicking into the bucket until it gets to the bucket properties (VPC-flow-logs): S3 URI

  • Copy the S3 URI and insert it in the syntax to create the table:

Sample S3 URI:

s3://twtechvpc-flow-logs/AWSLogs/accountID/vpcflowlogs/us-east-2/2025/11/11/

Next: run the syntax (statement)

Query: Successful

twtech Explanation:

    • The query completed successfully and created a partitioned table: vpc_flow_logs

How twtech accesses the table partitions:

Again, twtech needs to run another syntax (statement):

    • To create partitions so the data can be read, as in the following sample query.
    • This query creates a single partition for a specified date. Replace the placeholders for date and location as needed.

NB:

  • The syntax can also be obtained from the AWS official documentation link:
https://docs.aws.amazon.com/athena/latest/ug/vpc-flow-logs-create-table-statement.html

  • Copy the syntax (statement) and insert the vpc-flow-logs bucket URI again: S3 URI

NB:

The syntax is adjusted:

From (documentation template):

ALTER TABLE vpc_flow_logs
ADD PARTITION (`date`='YYYY-MM-dd')
LOCATION 's3://amzn-s3-demo-bucket/prefix/AWSLogs/{account_id}/vpcflowlogs/{region_code}/YYYY/MM/dd';

  • twtech needs to continue clicking into the bucket until it gets to the bucket properties again (VPC-flow-logs): S3 URI

To: Configured

# syntax (statement) – create-partition-in-table
ALTER TABLE vpc_flow_logs
ADD PARTITION (`date`='2025-11-11')
LOCATION 's3://twtechvpc-flow-logs/AWSLogs/accountID/vpcflowlogs/us-east-2/2025/11/11/';
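Since one ADD PARTITION statement is needed per day, a small helper can generate the statement for any date. This is a convenience sketch (not part of the AWS docs); the bucket and region mirror the values used above, and `accountID` remains a placeholder:

```python
# Convenience sketch: generate the per-day ADD PARTITION statement.
# Bucket/region mirror the values used above; account_id is a placeholder.
def partition_ddl(bucket: str, account_id: str, region: str, day: str) -> str:
    """day is an ISO date like '2025-11-11'."""
    y, m, d = day.split("-")
    location = f"s3://{bucket}/AWSLogs/{account_id}/vpcflowlogs/{region}/{y}/{m}/{d}/"
    return (f"ALTER TABLE vpc_flow_logs "
            f"ADD PARTITION (`date`='{day}') "
            f"LOCATION '{location}';")

print(partition_ddl("twtechvpc-flow-logs", "accountID", "us-east-2", "2025-11-11"))
```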

Run the syntax (statement) as well:

Query successful: one partition added to the table

From the same documentation link:

https://docs.aws.amazon.com/athena/latest/ug/vpc-flow-logs-create-table-statement.html

    • twtech can query with the following syntax (statement): it lists all of the rejected TCP connections (traffic) and uses the newly created date partition column, date, to extract the day of the week on which these events occurred.

# syntax (statement) – lists all of the rejected TCP connections (traffic)
SELECT day_of_week(date) AS day,
  date,
  interface_id,
  srcaddr,
  action,
  protocol
FROM vpc_flow_logs
WHERE action = 'REJECT' AND protocol = 6
LIMIT 100;

Copy the syntax (statement) and run in Athena:

Query: Successful

twtech Explanation:

The query to list all rejected traffic is a security strategy to monitor the VPC for the following:

    • who is trying to access twtechVPC most,
    • The table includes:
      • Day,
      • Date,
      • Interface_id,
      • Srcaddr (source address),
      • Action (taken by twtechVPC),
      • Protocol.

Mitigation Strategy from Athena (real-time data Querying):

    • twtech can block the IP addresses attacking its VPC (twtechVPC).
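The same "who is hitting twtechVPC most" question can also be answered outside Athena once the rejected srcaddr values have been exported. A quick tally (the sample addresses below are made up for illustration):

```python
# Quick tally of rejected source addresses (sample data is made up);
# the top talkers are candidates for blocking at the NACL level.
from collections import Counter

rejected_sources = [
    "203.0.113.9", "203.0.113.9", "198.51.100.4",
    "203.0.113.9", "198.51.100.4", "192.0.2.77",
]
top = Counter(rejected_sources).most_common(2)
print(top)  # [('203.0.113.9', 3), ('198.51.100.4', 2)]
```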






