Monday, September 22, 2025

CloudWatch Alarms | Overview & Hands-On.

Amazon CloudWatch Alarms - Overview & Hands-On.

Scope:

Intro,
Key Features and Types,
Configuration and Management,
Link to official documentation,
The Concept: CloudWatch Alarms,
Alarm States,
Architecture & Flow,
Key Features,
Alarm Actions,
Advanced Use Cases,
Best Practices,
Final thoughts,
Hands-On.

Intro:

Amazon CloudWatch alarms are a feature of the Amazon CloudWatch monitoring service that watches a single metric, a metric math expression, or a composite of other alarms.
When the monitored value breaches a specified threshold for a specified number of evaluation periods, the alarm changes state (e.g., from OK to ALARM) and can perform automated actions, such as sending notifications or stopping an EC2 instance.

Key Features and Types

Standard Alarms: These are based on a single metric, like CPU utilization or the number of Lambda function errors.
Metric Math Expressions: twtech can create alarms based on the results of mathematical expressions involving multiple metrics.
Composite Alarms: These evaluate a rule over the states of other alarms (the child alarms that the composite alarm, as the parent, references). This allows more complex alerting scenarios, such as alarming only when both CPU usage and disk usage are high.
Anomaly Detection: CloudWatch can use machine learning to set dynamic, data-driven thresholds, alerting twtech when a metric's behavior is unusual rather than just exceeding a static value.
Log Alarming: twtech can define metric filters from log data and create alarms based on those filtered metrics.
Recommended Alarms: CloudWatch provides out-of-the-box alarm recommendations for various AWS services, helping twtech quickly set up essential monitoring.
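The metric-math alarm type above can be sketched in Python. The code builds the keyword arguments that boto3's `put_metric_alarm` accepts for an error-rate alarm (Errors / Invocations as a percentage) on a Lambda function. The function name `twtech-demo-fn`, the 5% threshold, and the 5-minute period are hypothetical choices. The code only builds and prints the request, so it runs without AWS credentials.

```python
import json

def lambda_error_rate_alarm(function_name: str, threshold_pct: float) -> dict:
    """Build put_metric_alarm kwargs for a Lambda error-rate (%) alarm."""
    dims = [{"Name": "FunctionName", "Value": function_name}]

    def stat(metric_id: str, metric_name: str) -> dict:
        # Input series: fetched with Sum, but not itself the value the alarm evaluates.
        return {
            "Id": metric_id,
            "MetricStat": {
                "Metric": {"Namespace": "AWS/Lambda", "MetricName": metric_name, "Dimensions": dims},
                "Period": 300,
                "Stat": "Sum",
            },
            "ReturnData": False,
        }

    return {
        "AlarmName": f"{function_name}-error-rate",
        "Metrics": [
            stat("errors", "Errors"),
            stat("invocations", "Invocations"),
            # The expression result is the single series the alarm compares to the threshold.
            {"Id": "error_rate", "Expression": "100 * errors / invocations",
             "Label": "Error rate (%)", "ReturnData": True},
        ],
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": threshold_pct,
        "EvaluationPeriods": 1,
        "TreatMissingData": "notBreaching",
    }

params = lambda_error_rate_alarm("twtech-demo-fn", 5.0)  # hypothetical function name
print(json.dumps(params, indent=2))
# With boto3 and credentials: boto3.client("cloudwatch").put_metric_alarm(**params)
```

Exactly one entry sets `ReturnData` to true. That entry is the expression, so the alarm evaluates the computed percentage rather than either raw metric.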

Configuration and Management

States: Alarms can be in one of three states: OK, ALARM, or INSUFFICIENT_DATA.
Actions: When an alarm state changes, it can trigger actions such as:

Sending notifications to an Amazon SNS topic.
Initiating Auto Scaling actions.
Stopping, terminating, rebooting, or recovering an Amazon EC2 instance.
Invoking an AWS Lambda function.
Sending events to Amazon EventBridge for more complex automations.

Missing Data Treatment: twtech can configure how the alarm treats missing data points: as good (notBreaching), as bad (breaching), as ignore (maintain the current state), or as missing. This prevents false alarms or missed alerts on sparse metrics.
Management: Alarms can be managed through the CloudWatch console, the AWS CLI, the AWS SDKs, and AWS CloudFormation templates.
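CloudFormation is one of the management options listed above. The Python below is a minimal sketch that emits an `AWS::CloudWatch::Alarm` resource as a JSON template. The logical ID `CpuHighAlarm` and the template parameters `InstanceId` and `AlarmTopicArn` are hypothetical names chosen for illustration.

```python
import json

def cpu_alarm_template(threshold: float = 90.0) -> dict:
    """Return a CloudFormation template with one CPU alarm (hypothetical names)."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            "InstanceId": {"Type": "AWS::EC2::Instance::Id"},
            "AlarmTopicArn": {"Type": "String"},
        },
        "Resources": {
            "CpuHighAlarm": {
                "Type": "AWS::CloudWatch::Alarm",
                "Properties": {
                    "AlarmDescription": "CPU above threshold",
                    "Namespace": "AWS/EC2",
                    "MetricName": "CPUUtilization",
                    "Dimensions": [{"Name": "InstanceId", "Value": {"Ref": "InstanceId"}}],
                    "Statistic": "Average",
                    "Period": 300,
                    "EvaluationPeriods": 1,
                    "Threshold": threshold,
                    "ComparisonOperator": "GreaterThanThreshold",
                    "TreatMissingData": "missing",
                    # Ref resolves to the SNS topic ARN passed in at deploy time.
                    "AlarmActions": [{"Ref": "AlarmTopicArn"}],
                },
            }
        },
    }

template = cpu_alarm_template()
print(json.dumps(template, indent=2))
```

Saved to a file, the output could be deployed with `aws cloudformation deploy --template-file alarm.json --stack-name <name> --parameter-overrides InstanceId=<id> AlarmTopicArn=<arn>`.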

Link to official documentation:

https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html

1. The Concept: CloudWatch Alarms

CloudWatch Alarms monitor a single CloudWatch metric (or a math expression involving metrics) over a period of time and take action when the value breaches a defined threshold.
They are reactive automation triggers tied to metrics.

Metric sources: EC2, RDS, Lambda, S3, DynamoDB, Load Balancers, Custom Metrics, etc.
Actions on state change: SNS notifications, Auto Scaling, EC2 stop/terminate/reboot, Systems Manager OpsItems, EventBridge integration.

2. Alarm States

OK → Metric is within the defined threshold.
ALARM → Metric is outside the threshold.
INSUFFICIENT_DATA → Not enough data points yet (new metric, missing data, or insufficient samples).

NB:

twtech can configure how missing data should be treated (as good, bad, ignore, or missing).

3. Architecture & Flow

Metrics generated (e.g., CPUUtilization, Lambda Errors, API Gateway Latency).
Metrics stored in the CloudWatch metrics backend (log data in CloudWatch Logs becomes metrics only through metric filters).
Alarms evaluate metrics periodically (based on period & evaluation windows).
Alarm transitions state → ALARM, OK, INSUFFICIENT_DATA.
Action executed (via SNS, Auto Scaling, Systems Manager, EventBridge, etc.).
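The flow can be illustrated with a toy model in which each evaluation has already produced a state. The model shows that actions run when the state changes, not on every evaluation. One documented exception is not modeled: for Auto Scaling actions, CloudWatch keeps invoking the action while the alarm stays in the new state. The action labels (`notify:sns`, `ec2:stop`) are hypothetical strings, not real ARNs.

```python
from typing import Dict, List

# Hypothetical action labels per target state.
ACTIONS: Dict[str, List[str]] = {
    "ALARM": ["notify:sns", "ec2:stop"],
    "OK": ["notify:sns"],
    "INSUFFICIENT_DATA": [],
}

def run(states: List[str], initial: str = "INSUFFICIENT_DATA") -> List[str]:
    """Return the actions fired as the alarm walks through evaluated states."""
    fired: List[str] = []
    current = initial
    for new_state in states:
        if new_state != current:  # only a transition triggers actions
            fired.extend(f"{current}->{new_state}:{a}" for a in ACTIONS[new_state])
            current = new_state
    return fired

result = run(["OK", "OK", "ALARM", "ALARM", "OK"])
print(result)
```

The repeated OK and ALARM evaluations add nothing to `result`. Only the three transitions produce entries.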

4. Key Features

Standard & Composite Alarms

Standard Alarm: Evaluates a single metric or math expression.
Composite Alarm: Combines multiple alarms with Boolean logic (AND, OR) to reduce alert noise.
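A composite alarm is defined by an `AlarmRule` string built from `ALARM("name")` terms joined with AND / OR. The sketch below builds that rule for boto3's `put_composite_alarm`. It also adds a small local evaluator (a stand-in, not CloudWatch's engine) to show when the composite would be in ALARM. The child alarm names are hypothetical.

```python
from typing import Dict, List

def composite_rule(children: List[str], op: str = "AND") -> str:
    """Join child alarms into an AlarmRule such as ALARM("a") AND ALARM("b")."""
    if op not in ("AND", "OR"):
        raise ValueError("op must be AND or OR")
    return f" {op} ".join(f'ALARM("{name}")' for name in children)

def composite_in_alarm(children: List[str], states: Dict[str, str], op: str = "AND") -> bool:
    """Local stand-in for evaluating the rule against current child states."""
    hits = [states.get(name) == "ALARM" for name in children]
    return all(hits) if op == "AND" else any(hits)

children = ["Prod/App/CPUHighAlarm", "Prod/App/DiskHighAlarm"]  # hypothetical names
params = {
    "AlarmName": "Prod/App/CpuAndDiskHigh",
    "AlarmRule": composite_rule(children, "AND"),
    "ActionsEnabled": True,
}
states = {"Prod/App/CPUHighAlarm": "ALARM", "Prod/App/DiskHighAlarm": "OK"}
print(params["AlarmRule"])
print(composite_in_alarm(children, states, "AND"), composite_in_alarm(children, states, "OR"))
```

With only the CPU child in ALARM, the AND rule stays quiet while an OR rule would fire. This is how composites cut noise.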

Anomaly Detection Alarms

Uses ML to automatically set dynamic thresholds instead of static ones.
Detects deviations from expected metric patterns.
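An anomaly detection alarm compares a metric to an `ANOMALY_DETECTION_BAND` expression instead of a fixed number. `ThresholdMetricId` points at the band, and the comparison operator checks whether the metric falls outside it. The sketch builds such a request for boto3's `put_metric_alarm`. The instance ID, band width of 2, and evaluation settings are hypothetical.

```python
import json

def anomaly_alarm(instance_id: str, band_width: float = 2.0) -> dict:
    """Build put_metric_alarm kwargs for a CPU anomaly detection alarm."""
    return {
        "AlarmName": f"{instance_id}-cpu-anomaly",
        "ComparisonOperator": "LessThanLowerOrGreaterThanUpperThreshold",
        "EvaluationPeriods": 3,
        "DatapointsToAlarm": 2,
        "ThresholdMetricId": "band",  # the band acts as the dynamic threshold
        "Metrics": [
            {"Id": "m1", "ReturnData": True,
             "MetricStat": {
                 "Metric": {"Namespace": "AWS/EC2", "MetricName": "CPUUtilization",
                            "Dimensions": [{"Name": "InstanceId", "Value": instance_id}]},
                 "Period": 300, "Stat": "Average"}},
            {"Id": "band", "ReturnData": True, "Label": "CPU expected band",
             "Expression": f"ANOMALY_DETECTION_BAND(m1, {band_width:g})"},
        ],
    }

params = anomaly_alarm("i-0123456789abcdef0")  # hypothetical instance ID
print(json.dumps(params, indent=2))
```

A larger band width widens the expected range and makes the alarm less sensitive.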

Evaluation Periods & Datapoints

Example: “Trigger ALARM if CPUUtilization > 80% in 3 of the last 5 one-minute periods” (Period = 60 seconds, EvaluationPeriods = 5, DatapointsToAlarm = 3).
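The "M out of N" rule can be checked with a short function. It is a simplified sketch: it assumes every period in the window has a datapoint. Note that one-minute EC2 datapoints require detailed monitoring, since basic monitoring publishes every 5 minutes. The CPU values are made up for illustration.

```python
from typing import Sequence

def m_of_n_breached(values: Sequence[float], threshold: float,
                    evaluation_periods: int = 5, datapoints_to_alarm: int = 3) -> bool:
    """True if at least `datapoints_to_alarm` of the last `evaluation_periods` values exceed threshold."""
    window = list(values)[-evaluation_periods:]
    breaching = sum(1 for v in window if v > threshold)
    return breaching >= datapoints_to_alarm

cpu = [42.0, 85.0, 91.5, 60.0, 88.0]  # hypothetical one-minute averages
print(m_of_n_breached(cpu, 80.0))
```

Three of the five values (85.0, 91.5, 88.0) exceed 80, so the window breaches. A single spike would not.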

Treating Missing Data

Prevents false positives/negatives by defining how missing metrics behave.
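The four missing-data treatments can be compared with a deliberately simplified model, where `None` marks a missing datapoint. It matches the documented behavior when the whole window is missing: breaching goes to ALARM, notBreaching goes to OK, ignore keeps the current state, and missing goes to INSUFFICIENT_DATA. It does not model how CloudWatch looks further back for datapoints when only some of the window is missing.

```python
from typing import List, Optional

def evaluate(window: List[Optional[float]], threshold: float, needed: int,
             treat_missing: str, current: str) -> str:
    """Simplified alarm evaluation; None = missing datapoint."""
    if treat_missing not in ("breaching", "notBreaching", "ignore", "missing"):
        raise ValueError(treat_missing)
    present = [v for v in window if v is not None]
    if treat_missing == "breaching":
        # Missing points count as "bad".
        breaches = sum(1 for v in window if v is None or v > threshold)
    elif treat_missing == "notBreaching":
        # Missing points count as "good".
        breaches = sum(1 for v in present if v > threshold)
    else:
        if not present:
            # 'ignore' keeps the current state; 'missing' reports INSUFFICIENT_DATA.
            return current if treat_missing == "ignore" else "INSUFFICIENT_DATA"
        breaches = sum(1 for v in present if v > threshold)
    return "ALARM" if breaches >= needed else "OK"

gap = [None, None, None]  # three missing datapoints
for mode in ("breaching", "notBreaching", "ignore", "missing"):
    print(mode, evaluate(gap, 80.0, 2, mode, current="ALARM"))
```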

5. Alarm Actions

SNS Topic → Notifications

Email, SMS, Lambda, PagerDuty, Slack, etc.

EC2 Actions

Stop, terminate, reboot, or recover instances.
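EC2 alarm actions are passed as special ARNs of the form `arn:aws:automate:<region>:ec2:<action>`, alongside ordinary SNS topic ARNs. The helper below builds both. The region and account ID are hypothetical. The topic name is the default one that appears later in this post.

```python
EC2_ALARM_ACTIONS = ("stop", "terminate", "reboot", "recover")

def ec2_action_arn(region: str, action: str) -> str:
    """ARN used in AlarmActions to stop/terminate/reboot/recover the alarmed instance."""
    if action not in EC2_ALARM_ACTIONS:
        raise ValueError(f"unsupported EC2 alarm action: {action}")
    return f"arn:aws:automate:{region}:ec2:{action}"

def sns_topic_arn(region: str, account_id: str, topic: str) -> str:
    return f"arn:aws:sns:{region}:{account_id}:{topic}"

# Hypothetical region and account ID.
alarm_actions = [
    sns_topic_arn("us-east-1", "123456789012", "Default_CloudWatch_Alarms_Topic"),
    ec2_action_arn("us-east-1", "stop"),
]
print(alarm_actions)
```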

Auto Scaling Policies

Scale out/in based on metrics.

EventBridge

Trigger workflows, automation runbooks, ticketing systems.
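CloudWatch publishes alarm state changes to EventBridge with source `aws.cloudwatch` and detail-type `CloudWatch Alarm State Change`. The sketch builds an event pattern that matches only transitions into ALARM. It then tests that pattern against a trimmed, illustrative event using a simplified matcher that handles exact-value lists and nested objects only. EventBridge's real matcher supports far more operators.

```python
import json
from typing import Any, Dict

PATTERN: Dict[str, Any] = {
    "source": ["aws.cloudwatch"],
    "detail-type": ["CloudWatch Alarm State Change"],
    "detail": {"state": {"value": ["ALARM"]}},
}

def matches(pattern: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Simplified matching: list = any of these exact values; dict = recurse."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True

# Trimmed, illustrative event (real events carry more fields).
event = {
    "source": "aws.cloudwatch",
    "detail-type": "CloudWatch Alarm State Change",
    "detail": {"alarmName": "twtechDockerTrivyAlarm",
               "state": {"value": "ALARM"}, "previousState": {"value": "OK"}},
}
print(json.dumps(PATTERN))  # e.g. for: aws events put-rule --event-pattern '<json>'
print(matches(PATTERN, event))
```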

Systems Manager OpsCenter

Create OpsItems for incident response.

6. Advanced Use Cases

Dynamic Scaling

Scale ECS/Fargate or EC2 fleets automatically using CPU, Memory, or custom app metrics.

Proactive Cost Control

Alarm on billing metrics (e.g., AWS/Billing > EstimatedCharges).
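For cost control, the billing metric `EstimatedCharges` lives in the `AWS/Billing` namespace with a `Currency` dimension. Two requirements apply: the metric is only available in US East (N. Virginia), us-east-1, and billing alerts must be enabled in the account's billing preferences. The sketch builds a request for that region. The $50 limit and topic ARN are hypothetical.

```python
def billing_alarm(limit_usd: float, topic_arn: str) -> dict:
    """put_metric_alarm kwargs; call with a us-east-1 CloudWatch client."""
    return {
        "AlarmName": f"billing-over-{limit_usd:g}-usd",
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        "Dimensions": [{"Name": "Currency", "Value": "USD"}],
        "Statistic": "Maximum",
        "Period": 21600,  # 6 hours; estimated charges are only updated several times a day
        "EvaluationPeriods": 1,
        "Threshold": limit_usd,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],
    }

params = billing_alarm(50.0, "arn:aws:sns:us-east-1:123456789012:billing-alerts")  # hypothetical
print(params["AlarmName"])
```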

SLA (Service-Level Agreement) Monitoring

Alarm on API latency or error rate.
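Latency SLAs are usually written against percentiles rather than averages. For that, `put_metric_alarm` takes `ExtendedStatistic` (for example `p99`) instead of `Statistic`; the two are mutually exclusive. The sketch targets the API Gateway `Latency` metric (milliseconds) for a REST API. The API name and the 1,500 ms threshold are hypothetical.

```python
def api_latency_alarm(api_name: str, p99_ms: float) -> dict:
    """put_metric_alarm kwargs for a p99 latency alarm on an API Gateway REST API."""
    return {
        "AlarmName": f"{api_name}-p99-latency",
        "Namespace": "AWS/ApiGateway",
        "MetricName": "Latency",
        "Dimensions": [{"Name": "ApiName", "Value": api_name}],
        "ExtendedStatistic": "p99",  # percentile instead of Statistic
        "Period": 60,
        "EvaluationPeriods": 5,
        "DatapointsToAlarm": 3,
        "Threshold": p99_ms,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no traffic should not page anyone
    }

params = api_latency_alarm("twtech-orders-api", 1500.0)  # hypothetical API name
print(params)
```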

Security Monitoring

Alarm on unusual IAM activity (via CloudTrail metrics in CloudWatch).
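Security alarms on CloudTrail typically work in two steps. First, a metric filter on the CloudTrail log group turns matching log events into a count metric. Second, an alarm watches that metric. The sketch builds both requests: one for boto3's `logs.put_metric_filter` and one for `cloudwatch.put_metric_alarm`. It uses a commonly used pattern for unauthorized API calls. The log group name and the `CloudTrailMetrics` namespace are hypothetical.

```python
def unauthorized_calls_filter(log_group: str) -> dict:
    """logs.put_metric_filter kwargs: count unauthorized / access-denied API calls."""
    return {
        "logGroupName": log_group,
        "filterName": "UnauthorizedAPICalls",
        "filterPattern": '{ ($.errorCode = "*UnauthorizedOperation") || ($.errorCode = "AccessDenied*") }',
        "metricTransformations": [{
            "metricName": "UnauthorizedAPICalls",
            "metricNamespace": "CloudTrailMetrics",
            "metricValue": "1",  # each matching log event adds 1
        }],
    }

f = unauthorized_calls_filter("twtech-cloudtrail-logs")  # hypothetical log group
t = f["metricTransformations"][0]
alarm = {
    "AlarmName": "UnauthorizedAPICallsAlarm",
    "Namespace": t["metricNamespace"],
    "MetricName": t["metricName"],
    "Statistic": "Sum",
    "Period": 300,
    "EvaluationPeriods": 1,
    "Threshold": 1,
    "ComparisonOperator": "GreaterThanOrEqualToThreshold",
    "TreatMissingData": "notBreaching",
}
print(f["filterPattern"])
```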

Health & Recovery

Auto-recover critical EC2 instances if system status check fails.
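Auto-recovery is usually wired to the `StatusCheckFailed_System` metric, which is 0 when the check passes and 1 when it fails, plus the `recover` EC2 action ARN. Recovery is only supported for certain instance types and configurations. The instance ID and region below are hypothetical.

```python
def ec2_recover_alarm(instance_id: str, region: str) -> dict:
    """put_metric_alarm kwargs that recover the instance after a system status check failure."""
    return {
        "AlarmName": f"{instance_id}-auto-recover",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 2,
        "Threshold": 0,  # metric is 0 (pass) or 1 (fail)
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }

params = ec2_recover_alarm("i-0123456789abcdef0", "us-east-1")  # hypothetical values
print(params["AlarmActions"])
```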

7. Best Practices

Use composite alarms to reduce noise.
Leverage anomaly detection instead of static thresholds where workloads are variable.
Group alarms into dashboards for observability.
Apply naming conventions (Prod/AppName/CPUHighAlarm) for clarity.
Use Tags to organize alarms by environment, team, or application.
Ensure alarm action permissions (SNS topics, EC2 actions) are properly configured.
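The naming and tagging practices above can be enforced in code. The helpers below build names in the `Env/App/Signal` form shown above. They also produce tags in the `[{"Key": ..., "Value": ...}]` shape that `put_metric_alarm` accepts through its `Tags` parameter. The tag keys and values are hypothetical examples.

```python
from typing import Dict, List

def alarm_name(env: str, app: str, signal: str) -> str:
    """Build names like Prod/AppName/CPUHighAlarm; parts must be non-empty and slash-free."""
    parts = [env, app, signal]
    if any(not p or "/" in p for p in parts):
        raise ValueError("each part must be non-empty and slash-free")
    return "/".join(parts)

def alarm_tags(**tags: str) -> List[Dict[str, str]]:
    """Convert keyword tags to the Key/Value list used by the CloudWatch API."""
    return [{"Key": k, "Value": v} for k, v in sorted(tags.items())]

name = alarm_name("Prod", "AppName", "CPUHighAlarm")
tags = alarm_tags(Environment="prod", Team="platform", Application="AppName")  # hypothetical
print(name, tags)
```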

Final thoughts:

CloudWatch Alarms are the automation backbone of monitoring in AWS.
CloudWatch Alarms transform raw metrics into actionable triggers, enabling proactive scaling, recovery, security, and incident response.

Project: Hands-On

How twtech creates CloudWatch Alarms and uses them to monitor its resources (logs + metrics), turning raw metrics into actionable triggers that enable proactive scaling, recovery, security, and incident response.

Search for AWS service: CloudWatch

Go to the EC2 instance UI to: create or restart an instance

Keep the instance in a running state.

Create an alarm to monitor an EC2 instance (docker-trivy): the idea is to stop the instance if CPUUtilization is greater than the 90% threshold.

Specify metric and conditions

Select metric: search by instance ID (copy it with Ctrl+C)

Paste the instance ID and press Enter to: search

Click on: EC2 > Per-Instance Metrics (17)

Navigate through the 17 metrics to locate and select: CPUUtilization

Configure actions

Select Action: EC2 action

Add alarm details: Name and description

Alarm name: twtechDockerTrivyAlarm

Preview and create Alarm: twtechDockerTrivyAlarm

Conditions: CPUUtilization > 90

Configure actions: SNS topic & alarm details

Create alarm: twtechDockerTrivyAlarm
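The console steps above roughly correspond to a single `put_metric_alarm` request. The sketch assumes the console defaults of a 5-minute Average and 1 of 1 datapoints, plus the two actions listed on the alarm's Actions tab (SNS notification and EC2 stop). The region, account ID, and instance ID are hypothetical placeholders, not twtech's real values.

```python
import json

REGION = "us-east-1"                 # hypothetical
ACCOUNT_ID = "123456789012"          # hypothetical
INSTANCE_ID = "i-0123456789abcdef0"  # hypothetical; use the docker-trivy instance ID

params = {
    "AlarmName": "twtechDockerTrivyAlarm",
    "AlarmDescription": "Stop docker-trivy when CPUUtilization > 90%",
    "Namespace": "AWS/EC2",
    "MetricName": "CPUUtilization",
    "Dimensions": [{"Name": "InstanceId", "Value": INSTANCE_ID}],
    "Statistic": "Average",
    "Period": 300,
    "EvaluationPeriods": 1,
    "DatapointsToAlarm": 1,
    "Threshold": 90.0,
    "ComparisonOperator": "GreaterThanThreshold",
    "ActionsEnabled": True,
    "AlarmActions": [
        f"arn:aws:sns:{REGION}:{ACCOUNT_ID}:Default_CloudWatch_Alarms_Topic",  # notify
        f"arn:aws:automate:{REGION}:ec2:stop",                                 # stop instance
    ],
}
print(json.dumps(params, indent=2))
# boto3.client("cloudwatch", region_name=REGION).put_metric_alarm(**params)
```

Saved as `alarm.json`, the same JSON could also be passed to the CLI with `aws cloudwatch put-metric-alarm --cli-input-json file://alarm.json`.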

To see the created alarm: CloudWatch > Alarms > All alarms

Details of the Alarm: twtechDockerTrivyAlarm

How twtech uses the CLI command: “set-alarm-state” to test the configuration of the alarm created.

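The idea is to test whether the alarm would trigger if the conditions set for the EC2 instance (docker-trivy) prevailed (CPUUtilization above the 90% threshold). A state set with set-alarm-state is temporary: the alarm returns to its actual state at the next evaluation, but the actions configured for the new state still run.
Open CloudShell to: make the API call against the alarm that monitors the docker-trivy instance's CPUUtilization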

Google search for the command to use: aws CloudWatch set alarm state

Link to official documentation:

https://docs.aws.amazon.com/cli/latest/reference/cloudwatch/set-alarm-state.html

Navigate to the section with the command: Copy, configure and run

aws cloudwatch set-alarm-state --alarm-name "myalarm" --state-value ALARM --state-reason "testing purposes"

# Sample twtech command (the service name must be lowercase, and a multi-word reason must be quoted):

aws cloudwatch set-alarm-state --alarm-name twtechDockerTrivyAlarm --state-value ALARM --state-reason "twtech alarm testing purposes"
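A multi-word `--state-reason` must reach the CLI as one argument. Python's `shlex.join` can produce a correctly quoted command line from an argument list. The helper name `set_alarm_state_argv` is hypothetical. Running the command for real still requires the AWS CLI and credentials, for example through `subprocess.run(argv, check=True)`. The boto3 equivalent is `set_alarm_state(AlarmName=..., StateValue=..., StateReason=...)`.

```python
import shlex
from typing import List

def set_alarm_state_argv(alarm_name: str, state: str, reason: str) -> List[str]:
    """Build the argv for `aws cloudwatch set-alarm-state` (hypothetical helper)."""
    if state not in ("OK", "ALARM", "INSUFFICIENT_DATA"):
        raise ValueError(f"invalid state: {state}")
    return ["aws", "cloudwatch", "set-alarm-state",
            "--alarm-name", alarm_name,
            "--state-value", state,
            "--state-reason", reason]

argv = set_alarm_state_argv("twtechDockerTrivyAlarm", "ALARM", "twtech alarm testing purposes")
print(shlex.join(argv))  # reason is single-quoted so the shell keeps it as one argument
```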

Go back and refresh the twtechDockerTrivyAlarm page in CloudWatch, then navigate to: Actions


Actions: Actions enabled

Type	Description	Config
Notification	When alarm transitions to in alarm, send message to topic "Default_CloudWatch_Alarms_Topic".	-
EC2	When alarm transitions to in alarm, stop the instance with id "i-02609a928xxx".	-

How twtech checks the alarm's history: History

Finally, twtech goes to the EC2 instance UI to verify whether the alarm action (stop instance) was applied to docker-trivy when the alarm was triggered.

Yes:

An SNS notification was sent through twtech's default topic to: twtech671@gmail.com
twtech also successfully created an alarm for its instance (docker-trivy) and configured it to trigger and automatically stop the instance when CPUUtilization exceeds the set threshold.

Think - with -Tech
