Friday, September 19, 2025

Amazon CloudWatch Unified Agent (for Logs & Metrics) | Overview

Scope:

  • Intro
  • Key features and benefits
  • Link to official documentation
  • Overview of the CloudWatch Unified Agent
  • The Concept: Metrics Collected by the Unified Agent
  • Metric Dimensions & Granularity
  • Unified Agent vs Legacy Agents
  • Sample Metrics JSON Config (snippet to enable memory & disk metrics)
  • Advanced Use Cases
  • Things to Watch Out For

Intro:

    • The Amazon CloudWatch agent is a single, unified solution for collecting both system metrics and application logs from EC2 instances, on-premises servers, and containerized applications.
    • The agent is open source and builds on Telegraf input plugins for metrics collection, with a dedicated component for tailing and shipping logs.

Key features and benefits include:

    • Unified Configuration: The agent uses a single JSON configuration file that dictates which metrics to capture (e.g., CPU utilization, memory, disk usage) and which log files to tail and send to CloudWatch Logs.
    • Reduced Overhead: Consolidating two functions into one agent simplifies installation and management and lowers resource overhead compared to running separate legacy agents (the traditional CloudWatch Logs agent and the EC2 monitoring scripts).
    • Custom Metrics: Beyond standard system metrics, it collects configurable metrics such as detailed memory and disk space utilization, which the basic EC2 host metrics do not provide.
    • Cross-Platform Support: The agent is compatible with various operating systems, including Amazon Linux, RHEL, Ubuntu, and Windows Server.

NB:
  • twtech can install and configure this agent using the AWS Management Console, the command line interface (CLI), or services such as AWS Systems Manager.

Link to official documentation:
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/create-cloudwatch-agent-configuration-file-wizard.html

Overview: CloudWatch Unified Agent

The CloudWatch Agent (often called the Unified Agent) is the next-generation replacement for both:

    • The CloudWatch Logs agent (for log collection)
    • The CloudWatch monitoring scripts (for basic system metrics such as memory and disk)

Hence the term Unified Agent.

It unifies both into a single agent that can:

    1. Collect system-level metrics (CPU, memory, disk, network, processes)
    2. Collect custom metrics (application-specific, via StatsD/collectd integration)
    3. Collect & ship logs to CloudWatch Logs
    4. Support advanced dimensions & structured metrics beyond the legacy defaults
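
To make the "single agent, single file" idea concrete, here is a minimal sketch of how one configuration document can carry both a metrics section and a logs section. It is illustrative only: the log file path and log group name are hypothetical placeholders, and the measurement selection is just an example.

```python
import json

# Sketch of a unified agent configuration: one JSON document with an
# "agent" section (global settings), a "metrics" section, and a "logs" section.
# ASSUMPTION: the file path and log group name are hypothetical placeholders.
unified_config = {
    "agent": {"metrics_collection_interval": 60},
    "metrics": {
        "metrics_collected": {
            "cpu": {"measurement": ["cpu_usage_active"], "totalcpu": True},
            "mem": {"measurement": ["mem_used_percent"]},
        }
    },
    "logs": {
        "logs_collected": {
            "files": {
                "collect_list": [
                    {
                        "file_path": "/var/log/myapp/app.log",
                        "log_group_name": "myapp-logs",
                        "log_stream_name": "{instance_id}",
                    }
                ]
            }
        }
    },
}

# Serialize to the JSON text the agent would read from disk.
config_json = json.dumps(unified_config, indent=2)
print(config_json)
```

Building the document as a Python dict and serializing it is only a convenience for this sketch; the agent itself reads plain JSON.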

 The Concept: Metrics Collected by the Unified Agent

1. System-Level Metrics

When a metrics section is enabled in the configuration, the agent collects OS-level metrics at a default interval of 60 seconds:

  • CPU
    • cpu_time_active
    • cpu_time_guest
    • cpu_time_idle
    • cpu_time_nice
    • cpu_time_softirq
    • cpu_time_steal
    • cpu_time_system
    • cpu_time_user
    • cpu_usage_active, etc.
  • Memory
    • mem_used_percent
    • mem_free
    • mem_total
    • mem_cached
    • mem_available, etc.
  • Disk
    • disk_used_percent
    • disk_free
    • disk_total
    • disk_inodes_free, etc.
  • Disk I/O
    • diskio_reads
    • diskio_writes
    • diskio_read_bytes
    • diskio_write_bytes
    • diskio_read_time
    • diskio_write_time
  • Network
    • net_bytes_recv
    • net_bytes_sent
    • net_packets_recv
    • net_packets_sent
    • net_err_in
    • net_err_out, etc.
  • Processes (selected by executable name, command-line pattern, or PID file)
    • procstat_cpu_usage
    • procstat_memory_rss
    • procstat_num_threads, etc.
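
As a rough illustration of the per-process metrics above, the following sketch builds a procstat section that selects a process and lists the measurements to collect. The helper function is hypothetical (not part of any AWS library), and the "java" pattern is only an example selector.

```python
import json

# The procstat section selects processes by one of these keys.
VALID_SELECTORS = ("pid_file", "exe", "pattern")

def procstat_entry(selector_key, selector_value, measurements):
    """Build one procstat entry (hypothetical helper for this sketch)."""
    if selector_key not in VALID_SELECTORS:
        raise ValueError(f"selector must be one of {VALID_SELECTORS}")
    return {selector_key: selector_value, "measurement": list(measurements)}

# ASSUMPTION: "java" is an example command-line pattern, not a recommendation.
procstat_config = {
    "metrics": {
        "metrics_collected": {
            "procstat": [
                procstat_entry(
                    "pattern", "java", ["cpu_usage", "memory_rss", "num_threads"]
                )
            ]
        }
    }
}

print(json.dumps(procstat_config, indent=2))
```

In the configuration the measurements are written without a prefix; the agent publishes them with the procstat_ prefix (for example, procstat_memory_rss).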

2. Custom Metrics

The Unified Agent integrates with:

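    • StatsD: applications send counters, gauges, and timers over UDP (port 8125 by default) to the agent's local listener
    • collectd: the agent receives metrics from collectd's network plugin, so existing collectd plugins can feed CloudWatch for deeper system and app monitoring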
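
The StatsD path can be sketched with nothing but the standard library. This example assumes the agent's statsd section is enabled and listening on the default address (127.0.0.1, UDP port 8125); the metric name is a hypothetical example.

```python
import socket

def format_statsd(name, value, metric_type):
    """Render one metric in the StatsD line format: name:value|type."""
    return f"{name}:{value}|{metric_type}"

def send_statsd(line, host="127.0.0.1", port=8125):
    """Fire-and-forget one StatsD line over UDP to the agent's listener."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))

# ASSUMPTION: "myapp.queue_depth" is a made-up metric name; "g" marks a gauge.
line = format_statsd("myapp.queue_depth", 42, "g")

if __name__ == "__main__":
    send_statsd(line)
    print("sent:", line)
```

UDP is connectionless, so the send does not fail if no agent is listening; the datagram is simply dropped.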

3. Logs

    • Collect application/system logs and send them to CloudWatch Logs
    • Support for multi-line parsing, filtering, and structured JSON logs
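
A minimal sketch of a logs section that uses multi-line handling and filtering might look like the following. The file path, log group name, timestamp layout, and the DEBUG filter expression are all assumptions made for illustration.

```python
import json

# ASSUMPTION: path, group name, and timestamp layout are hypothetical.
log_entry = {
    "file_path": "/var/log/myapp/app.log",
    "log_group_name": "myapp-logs",
    "log_stream_name": "{instance_id}",
    # How the agent should parse the timestamp at the start of each event.
    "timestamp_format": "%Y-%m-%d %H:%M:%S",
    # A new event starts wherever a line begins with that timestamp, so
    # stack traces stay attached to the line that produced them.
    "multi_line_start_pattern": "{timestamp_format}",
    # Drop events matching the regular expression before they are shipped.
    "filters": [{"type": "exclude", "expression": "DEBUG"}],
}

logs_config = {
    "logs": {"logs_collected": {"files": {"collect_list": [log_entry]}}}
}

print(json.dumps(logs_config, indent=2))
```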

 Metric Dimensions & Granularity

    • Granularity: 60 seconds by default (standard resolution); collection intervals below 60 seconds, down to 1 second, are stored as high-resolution metrics
    • Dimensions (tags on metrics):
      • InstanceId
      • ImageId
      • InstanceType
      • AutoScalingGroupName
      • Custom dimensions (e.g., Environment=Prod, Application=MyApp)
    • twtech can aggregate or filter by dimensions for dashboards and alarms.
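
The following sketch shows how granularity and dimensions might be expressed together in one metrics section. The Environment and Application values come from the example above; the helper function and the choice of the cpu plugin are assumptions for illustration.

```python
import json

STANDARD_INTERVAL_SECONDS = 60

def is_high_resolution(interval_seconds):
    """Intervals below 60 seconds produce high-resolution metrics."""
    return interval_seconds < STANDARD_INTERVAL_SECONDS

dimension_config = {
    "metrics": {
        # Instance-level dimensions resolved by the agent at runtime.
        "append_dimensions": {
            "InstanceId": "${aws:InstanceId}",
            "InstanceType": "${aws:InstanceType}",
        },
        # Also publish a roll-up aggregated across instances of one type.
        "aggregation_dimensions": [["InstanceType"]],
        "metrics_collected": {
            "cpu": {
                "measurement": ["cpu_usage_active"],
                "totalcpu": True,
                # 1-second collection: high resolution, and higher cost.
                "metrics_collection_interval": 1,
                # Custom dimensions applied only to the cpu metrics.
                "append_dimensions": {
                    "Environment": "Prod",
                    "Application": "MyApp",
                },
            }
        },
    }
}

cpu = dimension_config["metrics"]["metrics_collected"]["cpu"]
print(json.dumps(dimension_config, indent=2))
print("high resolution:", is_high_resolution(cpu["metrics_collection_interval"]))
```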

 Unified Agent vs Legacy Agents

Feature                   | Legacy Agents                         | Unified CloudWatch Agent
--------------------------+---------------------------------------+-------------------------------------------
CPU, disk, memory metrics | Limited (basic CPU, disk)             | Full OS metrics, high granularity
Log collection            | Yes, but separate agent               | Built-in, unified with metrics
Custom metrics            | Manual push via SDK/CLI               | Native support via StatsD, collectd
Config management         | Manual config file edits on each host | JSON config with SSM Parameter Store integration
Dimensions                | Limited                               | Rich dimensions + custom tags
Platform support          | Linux only (legacy logs agent)        | Linux & Windows

Sample Metrics JSON Config (snippet to enable memory & disk metrics):

{
  "metrics": {
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}"
    },
    "metrics_collected": {
      "mem": {
        "measurement": [
          "mem_used_percent",
          "mem_available"
        ],
        "metrics_collection_interval": 60
      },
      "disk": {
        "measurement": [
          "disk_used_percent",
          "disk_free"
        ],
        "resources": [
          "/"
        ],
        "metrics_collection_interval": 60
      }
    }
  }
}
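
One way to put a config like this to work is to write it to a file and point the agent's control script at it. The sketch below only writes the file and prints the command rather than running it; the temporary file location is an assumption, and the control script path is the usual Linux install location.

```python
import json
import tempfile
from pathlib import Path

# Usual install location of the agent control script on Linux.
CTL = "/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl"

def fetch_config_command(config_path, mode="ec2"):
    """Build the command that loads a local config file and restarts the agent."""
    return ["sudo", CTL, "-a", "fetch-config", "-m", mode, "-s",
            "-c", f"file:{config_path}"]

sample_config = {
    "metrics": {
        "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
        "metrics_collected": {
            "mem": {
                "measurement": ["mem_used_percent", "mem_available"],
                "metrics_collection_interval": 60,
            },
            "disk": {
                "measurement": ["disk_used_percent", "disk_free"],
                "resources": ["/"],
                "metrics_collection_interval": 60,
            },
        },
    }
}

# ASSUMPTION: a temp-dir file name chosen for this sketch only.
config_path = Path(tempfile.gettempdir()) / "cwagent-config.json"
config_path.write_text(json.dumps(sample_config, indent=2))

# Re-parse the file to confirm it is valid JSON before handing it to the agent.
json.loads(config_path.read_text())
print(" ".join(fetch_config_command(config_path)))
```

For on-premises servers the mode would be onPremise instead of ec2.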

 Advanced Use Cases

    • High-Resolution Metrics (1-second granularity) for critical workloads
    • Per-Process Monitoring (e.g., track memory leaks in a specific Java process)
    • Custom App Metrics via StatsD (e.g., queue depth, request latency)
    • Centralized Config with SSM Parameter Store for agent configs across fleets
    • Log Enrichment: Use placeholders such as {instance_id} or {hostname} in log stream names so each log line can be traced back to its source instance
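
The centralized-config use case can be sketched as a small upload helper. It assumes a boto3 SSM client is passed in; the stub class below is a hypothetical stand-in so the example runs without AWS credentials, and the parameter name is an example that follows the common AmazonCloudWatch- prefix.

```python
import json

def store_agent_config(ssm_client, name, config):
    """Upload an agent config as a String parameter in SSM Parameter Store."""
    return ssm_client.put_parameter(
        Name=name,
        Value=json.dumps(config),
        Type="String",
        Overwrite=True,
    )

class RecordingSSM:
    """Hypothetical stand-in for boto3.client('ssm'); records calls only."""
    def __init__(self):
        self.calls = []

    def put_parameter(self, **kwargs):
        self.calls.append(kwargs)
        return {"Version": len(self.calls)}

fleet_config = {
    "metrics": {"metrics_collected": {"mem": {"measurement": ["mem_used_percent"]}}}
}

client = RecordingSSM()
# ASSUMPTION: "AmazonCloudWatch-linux" is an example parameter name.
response = store_agent_config(client, "AmazonCloudWatch-linux", fleet_config)
print(response, client.calls[0]["Name"])
```

With a real client (boto3.client("ssm")) in place of the stub, each instance could then load the shared config by passing -c ssm:AmazonCloudWatch-linux to the agent's fetch-config action.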

 Things to Watch Out For

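    • Cost: Every metric the agent publishes is billed as a custom metric, so more metrics and higher resolution mean a higher CloudWatch bill.
    • Agent Overhead: Too many process checks or plugins increase the CPU usage of the agent itself.
    • Permissions: Requires an IAM role that allows cloudwatch:PutMetricData and logs:PutLogEvents, plus the log group and log stream creation actions; the AWS managed policy CloudWatchAgentServerPolicy covers these.
    • Data Gaps: If the agent stops or is misconfigured, metrics and logs stop flowing.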
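
For the permissions point, a hand-written policy might look roughly like the sketch below. This is an illustrative minimum, not the contents of any AWS managed policy; in practice the managed policy CloudWatchAgentServerPolicy is the usual route, and the wildcard resource here is an assumption that should be tightened for real use.

```python
import json

# Actions the agent needs to publish metrics and ship logs.
AGENT_ACTIONS = [
    "cloudwatch:PutMetricData",
    "logs:CreateLogGroup",
    "logs:CreateLogStream",
    "logs:PutLogEvents",
    "logs:DescribeLogStreams",
]

# ASSUMPTION: "Resource": "*" keeps the sketch short; scope it down in practice.
agent_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": AGENT_ACTIONS, "Resource": "*"}
    ],
}

policy_json = json.dumps(agent_policy, indent=2)
print(policy_json)
```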


