Amazon CloudWatch Unified Agent (for Logs & Metrics) - Overview.
Scope:
- Intro,
- Key features and benefits include,
- Link to official documentation,
- Overview of CloudWatch Unified Agent,
- The Concept: Metrics Collected by the Unified Agent,
- Metric Dimensions & Granularity,
- Unified Agent vs Legacy Agents,
- Sample Metrics JSON Config (snippet to enable memory & disk metrics),
- Advanced Use Cases,
- Things to Watch Out For.
Intro:
- The Amazon CloudWatch agent is a single, unified solution used to collect both system metrics and application logs from EC2 instances, on-premises servers, and containerized applications.
- The Amazon CloudWatch agent leverages the open-source Telegraf agent for metrics collection and a dedicated log agent component for logs.
- Unified Configuration: The agent uses a single configuration file (a JSON file) that dictates which metrics to capture (e.g., CPU utilization, memory, disk usage) and which log files to tail and send to CloudWatch Logs.
- Reduced Overhead: By consolidating two functions into one agent, it simplifies installation and management and lowers resource overhead compared to running separate legacy agents (the traditional CloudWatch Logs agent and the EC2 monitoring scripts).
- Custom Metrics: It allows users to collect standard system metrics as well as highly customizable metrics, including detailed memory and disk space utilization, which are not available by default in the basic EC2 host metrics.
- Cross-Platform Support: The agent is compatible with various operating systems, including Amazon Linux, RHEL, Ubuntu, and Windows Server.
- twtech can install and configure this agent using the AWS Management Console, command line interface (CLI), or through services like AWS Systems Manager.
Link to official documentation:
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/create-cloudwatch-agent-configuration-file-wizard.html
Overview:
CloudWatch Unified Agent
The CloudWatch Agent (often called the Unified Agent) is the next-generation replacement for both:
- CloudWatch Logs Agent (for log collection)
- CloudWatch monitoring scripts (for basic system metrics)
Hence the term Unified Agent.
It combines both into a single agent that can:
- Collect system-level metrics (CPU, memory, disk, network, processes)
- Collect custom metrics (application-specific, via StatsD/collectd integration)
- Collect & ship logs to CloudWatch Logs
- Support advanced dimensions & structured metrics beyond the legacy defaults
The Concept: Metrics Collected by the Unified Agent
1. System-Level Metrics
By default (when enabled), the agent collects OS-level metrics with 1-minute granularity:
- CPU
  - cpu_time_active
  - cpu_time_guest
  - cpu_time_idle
  - cpu_time_nice
  - cpu_time_softirq
  - cpu_time_steal
  - cpu_time_system
  - cpu_time_user
  - cpu_usage_active
  - etc.
- Memory
  - mem_used_percent
  - mem_free
  - mem_total
  - mem_cached
  - mem_available
  - etc.
- Disk
  - disk_used_percent
  - disk_free
  - disk_total
  - disk_inodes_free
  - etc.
- Disk I/O
  - diskio_reads
  - diskio_writes
  - diskio_read_bytes
  - diskio_write_bytes
  - diskio_read_time
  - diskio_write_time
- Network
  - net_bytes_recv
  - net_bytes_sent
  - net_packets_recv
  - net_packets_sent
  - net_err_in
  - net_err_out
  - etc.
- Processes
  - procstat_cpu_usage
  - procstat_memory_rss
  - procstat_num_threads
  - etc. (configurable by process name, command-line pattern, or PID file)
2. Custom Metrics
The Unified Agent integrates with:
- StatsD – send metrics via UDP/TCP for real-time apps
- collectd – supports 100+ collectd plugins for deep system and app monitoring
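As a rough illustration of the StatsD path, the sketch below formats a gauge in the plain StatsD line format and sends it over UDP. It assumes the agent's StatsD listener is enabled in the config and listening on localhost port 8125 (the agent's default as far as we know); the helper names format_statsd and send_statsd and the metric name queue_depth are made up for this example.

```python
import socket


def format_statsd(name: str, value, metric_type: str = "g") -> str:
    """Render one metric in the plain StatsD line format: name:value|type."""
    return f"{name}:{value}|{metric_type}"


def send_statsd(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send a single StatsD line over UDP (fire-and-forget, no reply expected).

    host/port are assumptions: they must match the StatsD listener
    configured for the agent on this machine.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))
```

An application would call something like send_statsd(format_statsd("queue_depth", 42)) whenever the value changes. Because UDP is connectionless, the call does not fail if the agent is down, so the app is not blocked by monitoring problems.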
3. Logs
- Collect application/system logs and send them to CloudWatch Logs
- Support for multi-line parsing, filtering, and structured JSON logs
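To make the log side concrete, here is a minimal sketch that builds the "logs" section of the agent config as a Python dict and prints it as JSON. The file path, log group name, and the helper name build_logs_config are placeholders chosen for illustration; the multi-line pattern is only an example of lines that start with a date.

```python
import json


def build_logs_config(file_path, log_group, multiline_start_pattern=None):
    """Return a CloudWatch agent "logs" section for one tailed file."""
    entry = {
        "file_path": file_path,
        "log_group_name": log_group,
        # {instance_id} is a placeholder the agent resolves at runtime.
        "log_stream_name": "{instance_id}",
    }
    if multiline_start_pattern:
        # Lines that do NOT match this regex are appended to the previous event.
        entry["multi_line_start_pattern"] = multiline_start_pattern
    return {"logs": {"logs_collected": {"files": {"collect_list": [entry]}}}}


# Hypothetical application log whose entries start with a YYYY-MM-DD date.
logs_config = build_logs_config(
    "/var/log/myapp/app.log", "myapp-logs", r"^\d{4}-\d{2}-\d{2}"
)
print(json.dumps(logs_config, indent=2))
```

The printed JSON can be merged with a "metrics" section into the single agent configuration file.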
Metric Dimensions & Granularity
- Granularity: 1-second (for high-res metrics) or 1-minute (standard)
- Dimensions (tags on metrics):
  - InstanceId
  - ImageId
  - InstanceType
  - Custom dimensions (e.g., Environment=Prod, Application=MyApp)
- twtech can aggregate or filter by dimensions for dashboards and alarms.
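The sketch below shows one way granularity and dimensions might be expressed in the agent config. It assumes that the EC2 dimensions go in the top-level append_dimensions block and that custom dimensions such as Environment=Prod are attached per metric section; the helper name build_metrics_config is hypothetical.

```python
import json


def build_metrics_config(interval_seconds=60, custom_dimensions=None):
    """Return a CloudWatch agent "metrics" section collecting memory usage."""
    mem = {
        "measurement": ["mem_used_percent"],
        # Intervals below 60 seconds produce high-resolution metrics.
        "metrics_collection_interval": interval_seconds,
    }
    if custom_dimensions:
        # Assumption: custom key/value pairs are set per metric section.
        mem["append_dimensions"] = dict(custom_dimensions)
    return {
        "metrics": {
            "append_dimensions": {
                "InstanceId": "${aws:InstanceId}",
                "ImageId": "${aws:ImageId}",
                "InstanceType": "${aws:InstanceType}",
            },
            "metrics_collected": {"mem": mem},
        }
    }


# 1-second memory metric tagged with a custom Environment dimension.
metrics_config = build_metrics_config(1, {"Environment": "Prod"})
print(json.dumps(metrics_config, indent=2))
```

Keeping the interval as a parameter makes it easy to use 1-second collection only where it is needed, since high-resolution metrics cost more.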
Unified Agent vs Legacy Agents
| Feature | Legacy CloudWatch Agent | Unified (New) CloudWatch Agent |
|---|---|---|
| CPU, disk, memory metrics | Limited (basic CPU, disk) | Full OS metrics, high granularity |
| Log collection | Yes, but separate agent | Built-in, unified with metrics |
| Custom metrics | Manual push via SDK/CLI | Native support via StatsD, collectd |
| Config management | Manual JSON edits | SSM Parameter Store integration |
| Dimensions | Limited | Rich dimensions + custom tags |
| Platform support | Linux only (legacy logs agent) | Linux & Windows |
Sample Metrics JSON Config (snippet to enable memory & disk metrics):
{
  "metrics": {
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}"
    },
    "metrics_collected": {
      "mem": {
        "measurement": [
          "mem_used_percent",
          "mem_available"
        ],
        "metrics_collection_interval": 60
      },
      "disk": {
        "measurement": [
          "disk_used_percent",
          "disk_free"
        ],
        "resources": [
          "/"
        ],
        "metrics_collection_interval": 60
      }
    }
  }
}
Advanced Use Cases
- High-Resolution Metrics (1-second granularity) for critical workloads
- Per-Process Monitoring (e.g., track memory leaks in a specific Java process)
- Custom App Metrics via StatsD (e.g., queue depth, request latency)
- Centralized Config with SSM Parameter Store for agent configs across fleets
- Log Enrichment: Include instance identifiers (e.g., the {instance_id} placeholder) in log stream names so logs can be traced back to their source
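For the per-process monitoring case, a minimal sketch of a procstat section could look like the following. The process pattern "java", the chosen measurements, and the helper name build_procstat_config are illustrative assumptions; which measurements are available differs between Linux and Windows.

```python
import json


def build_procstat_config(exe_pattern, measurements, interval_seconds=60):
    """Return a "metrics" section that tracks processes whose name matches exe_pattern."""
    return {
        "metrics": {
            "metrics_collected": {
                "procstat": [
                    {
                        "exe": exe_pattern,
                        "measurement": list(measurements),
                        "metrics_collection_interval": interval_seconds,
                    }
                ]
            }
        }
    }


# Track CPU and resident memory of Java processes, e.g. to spot a memory leak.
procstat_config = build_procstat_config("java", ["cpu_usage", "memory_rss"])
print(json.dumps(procstat_config, indent=2))
```

The resulting metrics would show up with the procstat_ prefix (for example procstat_memory_rss), which can then back a CloudWatch alarm on steadily growing memory.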
Things to Watch Out For
- Cost: More metrics at high-res = higher CloudWatch bill.
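- Agent Overhead: Too many process checks or plugins increase the CPU used by the agent itself.
- Permissions: Requires an IAM role with at least cloudwatch:PutMetricData and logs:PutLogEvents; the AWS managed policy CloudWatchAgentServerPolicy covers these along with the permissions needed to create log groups and log streams.
- Data Gaps: If the agent stops or is misconfigured, metrics/logs stop flowing.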