Amazon CloudWatch Composite Alarms - Overview.
Scope:
- Intro,
- Key Features and Benefits,
- Creating a Composite Alarm,
- Link to official documentation,
- The Concept: Composite Alarms,
- Key Benefits,
- Architecture & Flow,
- Composite Alarm States,
- Alarm Actions with Composite Alarms,
- Advanced Use Cases,
- Best Practices,
- Final Thoughts.
Intro:
- Amazon CloudWatch composite alarms aggregate the states of multiple individual alarms into a single, high-level alarm, which helps reduce notification noise.
- A composite alarm provides a single indicator for the overall health of an application or group of resources.
Key Features and Benefits:
- Noise Reduction: Instead of receiving multiple notifications for related individual resource issues (e.g., high CPU, low memory, and high I/O for the same instance), twtech receives one alert from the composite alarm, which only triggers if the combination indicates a true problem.
- Logical Expressions: Composite alarms use a rule expression with Boolean operators (AND, OR, NOT) to combine the states of underlying metric alarms or other composite alarms.
- For example, an alarm could trigger only if (CPUUtilizationTooHigh AND RAMUtilizationTooHigh).
- Application Health at a Glance: They allow IT operations teams to monitor the health of an entire application or system through a single, top-level alarm on a dashboard, improving monitoring efficiency.
- Flexible Alerting: They support a variety of actions when the alarm state changes, including publishing to Amazon SNS topics, invoking AWS Lambda functions, and creating OpsItems in Systems Manager OpsCenter.
- Threshold Functions: Advanced functions like AT_LEAST allow for alerting when a specific number or percentage of resources are impacted, rather than requiring individual alarms for every resource.
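The rule expression described above is plain text, so it can be assembled programmatically. The following is a minimal sketch, assuming the `ALARM("name")` state function from the CloudWatch rule syntax; the helper names (`alarm`, `all_of`, `any_of`, `negate`) are hypothetical and not part of any AWS SDK.

```python
def alarm(name: str) -> str:
    """Rule term that is true while the named child alarm is in ALARM state."""
    return f'ALARM("{name}")'


def all_of(*terms: str) -> str:
    """Combine terms with AND, wrapped in parentheses to keep grouping explicit."""
    return "(" + " AND ".join(terms) + ")"


def any_of(*terms: str) -> str:
    """Combine terms with OR, wrapped in parentheses to keep grouping explicit."""
    return "(" + " OR ".join(terms) + ")"


def negate(term: str) -> str:
    """Invert a term with NOT."""
    return f"NOT {term}"


# The example from the text: trigger only if both child alarms are firing.
rule = all_of(alarm("CPUUtilizationTooHigh"), alarm("RAMUtilizationTooHigh"))
print(rule)
```

Keeping the parentheses explicit in every helper avoids relying on operator precedence when expressions are nested.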
Creating a Composite Alarm:
- twtech can create a composite alarm using the AWS Management Console, AWS CLI, AWS SDKs, or AWS CloudFormation.
- Create Underlying Alarms: First, create the individual metric alarms that twtech wants to combine. These alarms typically do not have actions configured, as the composite alarm manages the notification or action.
- Define the Composite Alarm Rule: In the CloudWatch console, twtech defines the logical expression that determines when the composite alarm transitions to an ALARM state.
- Configure Actions: Specify the actions twtech wants the composite alarm to take (e.g., sending a notification to an Amazon SNS topic).
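For the SDK route, the same three steps map onto a single request. The sketch below only builds the request parameters for the boto3 CloudWatch `put_composite_alarm` call and does not contact AWS; the alarm name, the SNS topic ARN, and the helper `composite_alarm_request` are hypothetical placeholders, and the child alarms are assumed to exist already.

```python
def composite_alarm_request(name, rule, sns_topic_arn, description=""):
    """Build keyword arguments for CloudWatch's PutCompositeAlarm operation."""
    return {
        "AlarmName": name,
        "AlarmDescription": description,
        "AlarmRule": rule,                 # Boolean expression over child alarms
        "ActionsEnabled": True,
        "AlarmActions": [sns_topic_arn],   # notified on transition to ALARM
    }


params = composite_alarm_request(
    name="AppX-HealthCritical",  # hypothetical alarm name
    rule='ALARM("CPUUtilizationTooHigh") AND ALARM("RAMUtilizationTooHigh")',
    # Hypothetical topic ARN; replace with a real one.
    sns_topic_arn="arn:aws:sns:us-east-1:123456789012:CompositeCriticalAlerts",
    description="CPU and RAM are both too high",
)

# With boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("cloudwatch").put_composite_alarm(**params)
print(params["AlarmRule"])
```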
Link to official documentation:
- https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Create_Composite_Alarm.html
1. The Concept: Composite Alarms
- Composite Alarms let twtech combine multiple CloudWatch alarms into a single alarm using Boolean logic (AND, OR, NOT).
- Composite Alarms are built to reduce alert fatigue by ensuring twtech is only notified when a set of conditions is truly meaningful.
- CPU utilization > 80% (Alarm A)
- AND
- Disk read latency > 50 ms (Alarm B)
NB:
- Composite Alarm (A AND B) triggers only when both conditions are met.
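The A AND B example can be checked with a small local model. This is a simplified sketch: it compares single readings against the two thresholds from the text, whereas real CloudWatch alarms evaluate datapoints over configured periods. The function name `composite_in_alarm` is hypothetical.

```python
CPU_THRESHOLD_PERCENT = 80.0           # Alarm A threshold from the example
DISK_READ_LATENCY_THRESHOLD_MS = 50.0  # Alarm B threshold from the example


def composite_in_alarm(cpu_percent: float, disk_read_latency_ms: float) -> bool:
    """Return True only when both child conditions are breached (A AND B)."""
    alarm_a = cpu_percent > CPU_THRESHOLD_PERCENT
    alarm_b = disk_read_latency_ms > DISK_READ_LATENCY_THRESHOLD_MS
    return alarm_a and alarm_b


# Only the last reading breaches both thresholds at once.
for cpu, latency in [(95.0, 10.0), (40.0, 75.0), (95.0, 75.0)]:
    print(cpu, latency, composite_in_alarm(cpu, latency))
```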
2. Key Benefits
- Noise Reduction → Avoid “flapping” alerts from individual alarms.
- Contextual Alerts → Trigger only when multiple related symptoms occur.
- Simplified Monitoring → Group alarms into logical incidents.
- Cost Efficient → Fewer false positives → fewer automated recovery/scaling triggers.
3. Architecture & Flow
- Metrics collected (EC2 CPU, RDS Latency, API Error Rates, etc.).
- Standard alarms created (Alarm A, Alarm B, etc.).
- Composite alarm built using a Boolean expression:
- (AlarmA AND AlarmB)
- (AlarmA OR AlarmC)
- NOT AlarmB
- Composite alarm state change triggers an action:
- Notify team (SNS, Slack, PagerDuty).
- Initiate Systems Manager runbook.
- Trigger Auto Scaling or recovery.
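The flow above can be illustrated with a local simulation: child alarm states change over time, the composite rule is re-evaluated, and an action runs only when the composite state changes. This is a sketch of the idea, not CloudWatch's actual evaluator; `run_flow`, `rule_a_and_b`, and the timeline data are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

ChildStates = Dict[str, str]


def rule_a_and_b(states: ChildStates) -> bool:
    """Composite rule (AlarmA AND AlarmB)."""
    return states["AlarmA"] == "ALARM" and states["AlarmB"] == "ALARM"


def run_flow(
    timeline: List[ChildStates],
    rule: Callable[[ChildStates], bool],
    notify: Callable[[int, str, str], None],
) -> None:
    """Evaluate the rule at each step and notify only on state changes."""
    previous = "OK"
    for step, child_states in enumerate(timeline):
        current = "ALARM" if rule(child_states) else "OK"
        if current != previous:
            notify(step, previous, current)
        previous = current


timeline = [
    {"AlarmA": "OK", "AlarmB": "OK"},
    {"AlarmA": "ALARM", "AlarmB": "OK"},     # one symptom only: no notification
    {"AlarmA": "ALARM", "AlarmB": "ALARM"},  # both symptoms: composite fires
    {"AlarmA": "ALARM", "AlarmB": "ALARM"},  # unchanged: no repeat notification
    {"AlarmA": "OK", "AlarmB": "ALARM"},     # composite recovers
]

notifications: List[Tuple[int, str, str]] = []
run_flow(timeline, rule_a_and_b, lambda step, old, new: notifications.append((step, old, new)))
print(notifications)
```

Five child-state snapshots produce only two notifications here, which is the noise reduction the composite alarm is meant to deliver.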
4. Composite Alarm States
- ALARM → Boolean condition is satisfied.
- OK → Boolean condition is not satisfied.
- INSUFFICIENT_DATA → One or more child alarms don’t have enough data.
5. Alarm Actions with Composite Alarms
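- Unlike metric alarms, composite alarms support a narrower set of actions: publishing to Amazon SNS topics, invoking AWS Lambda functions, and creating OpsItems in Systems Manager OpsCenter.
- Composite alarms cannot directly stop/terminate/reboot EC2 instances or invoke Auto Scaling.
- Instead, a composite alarm state change is published to SNS or picked up by EventBridge, which then triggers those actions downstream.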
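A downstream handler for the EventBridge route might look like the sketch below. It assumes the "CloudWatch Alarm State Change" event shape, reading only the `alarmName`, `state.value`, and `previousState.value` fields; the function name, the returned decision dictionary, and the sample event values are hypothetical, and the actual remediation (for example, calling EC2 or Auto Scaling APIs) is left out.

```python
def handle_alarm_event(event, context=None):
    """Decide what to do with an alarm state change event from EventBridge."""
    if event.get("detail-type") != "CloudWatch Alarm State Change":
        return {"action": "ignore", "alarm": ""}

    detail = event.get("detail", {})
    name = detail.get("alarmName", "")
    new_state = detail.get("state", {}).get("value")
    old_state = detail.get("previousState", {}).get("value")

    # Act only on a fresh transition into ALARM.
    if new_state == "ALARM" and old_state != "ALARM":
        return {"action": "remediate", "alarm": name}
    return {"action": "ignore", "alarm": name}


# Hypothetical sample event, trimmed to the fields the handler reads.
sample_event = {
    "source": "aws.cloudwatch",
    "detail-type": "CloudWatch Alarm State Change",
    "detail": {
        "alarmName": "AppX-HealthCritical",
        "state": {"value": "ALARM"},
        "previousState": {"value": "OK"},
    },
}
print(handle_alarm_event(sample_event))
```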
6. Advanced Use Cases
- Multi-metric Health Checks
- Example: Trigger alarm only if both CPU > 90% AND Memory > 85% to avoid false positives.
- Service-Level Incidents
- Example: Alarm when 5xx error rate > 5% OR Latency > 3s across multiple APIs.
- Cross-Resource Correlation
- Example: Alarm if RDS Latency > 100ms AND EC2 CPU > 80% → indicates a bottleneck.
- Anomaly Detection + Static Thresholds
- Combine anomaly detection alarms with static thresholds for smarter monitoring.
- Critical Escalations
- Use Composite Alarms to trigger incident management workflows only if multiple failure signals are detected.
7. Best Practices
- Use Composite Alarms for alert correlation, not as a replacement for standard alarms.
- Apply naming conventions (e.g., Composite/Prod/AppX/HealthCritical).
- Group alarms into dashboards for clear visibility.
- Keep SNS topics dedicated (e.g., CompositeCriticalAlerts vs StandardAlerts).
- Always test Boolean logic thoroughly (e.g., A AND (B OR C) scenarios).
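One way to test Boolean logic before deploying a rule is to enumerate its truth table locally. The sketch below does this for A AND (B OR C) and contrasts it with a mis-grouped variant, (A AND B) OR C; the function names are hypothetical, and the check covers only the logic, not CloudWatch's own evaluation.

```python
from itertools import product


def intended_rule(a: bool, b: bool, c: bool) -> bool:
    """A AND (B OR C)"""
    return a and (b or c)


def misgrouped_rule(a: bool, b: bool, c: bool) -> bool:
    """(A AND B) OR C: same operators, different grouping."""
    return (a and b) or c


# Every combination of child alarm states (True means the child is in ALARM).
combos = list(product([False, True], repeat=3))

intended_firing = [combo for combo in combos if intended_rule(*combo)]
misgrouped_firing = [combo for combo in combos if misgrouped_rule(*combo)]

print(len(intended_firing), "of", len(combos), "combinations fire the intended rule")
print(len(misgrouped_firing), "of", len(combos), "combinations fire the mis-grouped rule")
```

The mis-grouped variant fires whenever C alone is in ALARM, which is the kind of error a truth table makes visible before anyone gets paged.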
Final Thoughts:
- Composite Alarms are twtech's alert noise filter.
- Composite Alarms help ensure that twtech teams only respond when conditions are truly meaningful, making monitoring smarter and reducing unnecessary wake-up calls.