Thursday, December 25, 2025

AWS Trusted Advisor (TA) | Deep Dive & Hands-On.

An Overview of AWS Trusted Advisor (TA).

Focus:

Framed for DevOps / Cloud / SRE
Aligned with Well-Architected thinking.

Breakdown:

Intro,
Key Features and Functionality,
Integration with Other Services,
Accessing Trusted Advisor,
The concept: AWS Trusted Advisor,
Trusted Advisor vs Well-Architected Tool (Quick Context)
The 5 Trusted Advisor Check Categories (Deep Dive)
Cost Optimization,
Security,
Fault Tolerance (Reliability),
Performance,
Service Limits (Quotas),
Support Plan Impact (Very Important),
Automation & Integrations (Where TA Shines),
Sample DevOps Automation Flow,
Trusted Advisor vs Third-Party Tools,
When Should twtech Rely on Trusted Advisor,
When Should twtech NOT Rely on Trusted Advisor,
twtech Recommendation (DevOps/SRE Playbook),
Insights.

Intro:

AWS Trusted Advisor is a web service that inspects twtech's AWS environment and provides real-time recommendations based on best practices across six categories:

cost optimization,
performance,
resilience,
security,
operational excellence,
and service limits.

Trusted Advisor draws upon best practices learned from serving hundreds of thousands of AWS customers to identify opportunities to save money, improve availability and performance, and help close security gaps.

Key Features and Functionality

Best Practice Checks:

Trusted Advisor continuously evaluates your AWS environment using a set of automated checks and then recommends actions to remediate any deviations from best practices.

Support Plan Integration:

The number of checks available depends on your AWS Support plan.
Basic and Developer Support plans have access to all service limits checks and selected security and fault tolerance checks.
Business, Enterprise On-Ramp, and Enterprise Support plans have access to the full suite of checks.

Organizational View:

For organizations using AWS Organizations, the organizational view provides a consolidated view of Trusted Advisor recommendations across all accounts.

Integration with Other Services:

AWS Support API:

Allows programmatic access to check results.

Amazon CloudWatch and EventBridge:

twtech can create alarms and rules to monitor Trusted Advisor metrics and check status changes.

AWS Well-Architected Tool:

Integrates with the tool to evaluate workloads and provide data-driven insights.

AWS Config:

Many new checks are powered by AWS Config managed rules, enhancing the monitoring of operational excellence.

Prioritized Recommendations:

Available to Enterprise Support customers, Trusted Advisor Priority highlights the most critical recommendations, often including context-driven insights from your AWS account team.

Accessing Trusted Advisor

Access the Trusted Advisor console from the AWS Management Console at https://console.aws.amazon.com/trustedadvisor/home.
Permissions are managed through AWS Identity and Access Management (IAM) policies.

The concept: AWS Trusted Advisor

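AWS Trusted Advisor is a real-time advisory service that continuously evaluates twtech's AWS environment against AWS best practices and surfaces actionable recommendations. The deep dive below covers five of the six categories listed in the intro (operational excellence is not covered separately):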

1.     Cost Optimization
2.     Security
3.     Fault Tolerance
4.     Performance
5.     Service Limits

Think of AWS Trusted Advisor (TA) as:

A continuously running, automated cloud review engine, not a one-time audit.

Trusted Advisor vs Well-Architected Tool (Quick Context)

Trusted Advisor	Well-Architected Tool
Continuous	Point-in-time review
Automated checks	Architect-led assessment
Resource-level findings	Design-level questions
Ops-focused	Architecture-focused

In practice:

Trusted Advisor → day-to-day hygiene
Well-Architected → quarterly / major-change reviews

The 5 Trusted Advisor Check Categories (Deep Dive)

1. Cost Optimization

Identifies waste, overprovisioning, and idle resources.

Common Checks

Idle EC2 instances
Underutilized EBS volumes
Idle Load Balancers
Low-utilization RDS
Unassociated Elastic IPs
Reserved Instance & Savings Plan optimization

Sample Finding

“12 EBS volumes unattached for over 30 days; estimated monthly savings: $480”

DevOps Best Practice

Integrate TA cost checks into FinOps dashboards
Use tagging enforcement + TA findings to assign ownership
Auto-remediate using Lambda where safe

2. Security

Maps closely to CIS benchmarks and AWS security best practices.

Common Checks

S3 buckets with public access
Security groups allowing 0.0.0.0/0 on sensitive ports
IAM users with:

No MFA
Unused access keys
Passwords older than policy

Root account without MFA
Exposed RDS snapshots

Sample Finding

“Security Group sg-xxxx allows SSH from 0.0.0.0/0”

DevSecOps Tie-in

Treat TA findings as security debt
Send findings to:

Security Hub
Jira / ServiceNow
SIEM (via EventBridge)

3. Fault Tolerance (Reliability)

Focuses on resilience and availability.

Common Checks

EC2 instances without EBS-backed volumes
Single-AZ RDS databases
ELBs without multiple targets
Auto Scaling groups without health checks
Missing backups

Sample Finding

“RDS instance is running in a single Availability Zone”

SRE Angle

TA highlights fragile infrastructure
Use it to prioritize:

Multi-AZ
Auto Scaling
Backup policies

4. Performance

Ensures services are appropriately sized and configured.

Common Checks

EC2 instances with high CPU or memory pressure
Classic Load Balancer usage (legacy)
CloudFront configuration inefficiencies
Suboptimal EBS volume types

Sample Finding

“Instance t3.micro experiencing sustained CPU throttling”

Platform Engineering Use

Feed TA signals into:

Capacity planning
Instance family modernization
Graviton adoption programs

5. Service Limits (Quotas)

Prevents scaling failures caused by quota exhaustion.

Common Checks

EC2 instance limits
VPC limits
EIP limits
Load balancer limits
Lambda concurrency limits

Sample Finding

“EC2 On-Demand instance usage at 85% of quota”

Ops Impact

One of the highest-value checks
Prevents:

Failed deployments
Incident escalations

Should be monitored like alerts
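
To make "monitored like alerts" concrete, here is a minimal sketch that classifies quota usage into alert levels. The 80% warning threshold mirrors the sample finding above; the function name, the field names, and the sample numbers are hypothetical and are not part of any AWS API.

```python
def quota_status(usage: float, limit: float, warn_ratio: float = 0.80) -> str:
    """Classify quota usage as 'ok', 'warning', or 'error'."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    ratio = usage / limit
    if ratio >= 1.0:
        return "error"      # quota exhausted: new launches would fail
    if ratio >= warn_ratio:
        return "warning"    # approaching the quota: request an increase
    return "ok"


# Hypothetical usage snapshot; real values would come from Trusted Advisor
# or Service Quotas.
quotas = [
    {"name": "EC2 On-Demand vCPUs", "usage": 85, "limit": 100},
    {"name": "Elastic IPs", "usage": 2, "limit": 5},
]

# Names of quotas that should raise an alert.
alerts = [q["name"] for q in quotas if quota_status(q["usage"], q["limit"]) != "ok"]
print(alerts)
```

Wiring the "warning" level into the same paging or chat channel as other alerts is one way to treat quota pressure as an operational signal rather than a dashboard detail.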

Support Plan Impact (Very Important)

Support Plan	Checks Available
Basic / Developer	Limited checks only
Business	Full Trusted Advisor
Enterprise	Full + prioritized support

NB:

Full value requires Business or Enterprise support

Automation & Integrations (Where TA Shines)

Event-Driven Ops

TA publishes findings to Amazon EventBridge
Enables:

Auto-ticket creation
Slack notifications
Auto-remediation
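
A minimal sketch of the event-driven path, assuming a Lambda function sits behind an EventBridge rule. The event pattern uses the `aws.trustedadvisor` source and the "Trusted Advisor Check Item Refresh Notification" detail type; the `detail` field names follow the event shape as I understand it from the AWS documentation and should be verified against a real event. Trusted Advisor events are, to my knowledge, delivered in us-east-1, so the rule is assumed to live there. The handler name and message format are illustrative only.

```python
import json

# Event pattern for the EventBridge rule: only non-OK item refreshes.
TA_EVENT_PATTERN = {
    "source": ["aws.trustedadvisor"],
    "detail-type": ["Trusted Advisor Check Item Refresh Notification"],
    "detail": {"status": ["ERROR", "WARN"]},
}

# Trimmed, hypothetical example of what the rule would deliver.
SAMPLE_EVENT = {
    "source": "aws.trustedadvisor",
    "detail-type": "Trusted Advisor Check Item Refresh Notification",
    "account": "123456789012",
    "region": "us-east-1",
    "detail": {
        "check-name": "Security Groups - Specific Ports Unrestricted",
        "status": "ERROR",
        "resource_id": "sg-xxxx",
    },
}


def format_notification(event: dict) -> str:
    """Build a one-line message suitable for chat or a ticket title."""
    detail = event.get("detail", {})
    status = detail.get("status", "UNKNOWN")
    check = detail.get("check-name", "unknown check")
    resource = detail.get("resource_id") or "n/a"
    account = event.get("account", "unknown account")
    return f"[{status}] {check} in account {account}: {resource}"


def lambda_handler(event, context):
    """Lambda entry point; delivery to Slack/Jira is left as a stub."""
    message = format_notification(event)
    print(message)  # replace with a webhook call or ticket API request
    return {"message": message}


if __name__ == "__main__":
    print(json.dumps(TA_EVENT_PATTERN, indent=2))
    lambda_handler(SAMPLE_EVENT, None)
```

Keeping the formatting logic in a pure function makes it easy to unit-test without deploying anything.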

Security Hub

TA security checks can flow into AWS Security Hub
Unified security posture view

API & CLI

Query findings programmatically
Build custom dashboards
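
As a rough sketch of programmatic access, the snippet below uses the AWS Support API through boto3 (`describe_trusted_advisor_checks` and `describe_trusted_advisor_check_summaries`) and rolls the results up per category for a dashboard. It assumes boto3 is installed, credentials are configured, and the account has a support plan that allows Support API access; the Support API endpoint is assumed to be us-east-1. The `summarize` helper and the sample category strings are illustrative.

```python
from collections import Counter, defaultdict


def summarize(checks: list, summaries: list) -> dict:
    """Count check statuses per category, e.g. {'security': {'error': 1}}."""
    category_by_id = {c["id"]: c["category"] for c in checks}
    counts = defaultdict(Counter)
    for s in summaries:
        category = category_by_id.get(s["checkId"], "unknown")
        counts[category][s["status"]] += 1
    return {category: dict(c) for category, c in counts.items()}


def fetch_summary() -> dict:
    """Query Trusted Advisor via the Support API (needs AWS credentials)."""
    import boto3  # imported lazily so summarize() works without boto3

    support = boto3.client("support", region_name="us-east-1")
    checks = support.describe_trusted_advisor_checks(language="en")["checks"]
    check_ids = [c["id"] for c in checks]
    summaries = support.describe_trusted_advisor_check_summaries(
        checkIds=check_ids
    )["summaries"]
    return summarize(checks, summaries)


# Offline example with hypothetical data shaped like the API responses.
example_checks = [
    {"id": "a", "category": "security"},
    {"id": "b", "category": "cost_optimizing"},
]
example_summaries = [
    {"checkId": "a", "status": "error"},
    {"checkId": "b", "status": "warning"},
]
print(summarize(example_checks, example_summaries))
```

The per-category counts can feed whatever dashboard tooling is already in place; calling `fetch_summary()` on a schedule is one option.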

Sample DevOps Automation Flow

Trusted Advisor vs Third-Party Tools

TA is:

Native
Free with support
Low false positives

But:

Not deeply customizable
Doesn’t replace:

CSPM tools
Advanced cost optimization platforms

NB:

Best used as a baseline control plane.

When Should twtech Rely on Trusted Advisor

✔ Daily operational hygiene
✔ Security posture monitoring
✔ Cost waste detection
✔ Pre-incident prevention
✔ Leadership dashboards

❌ When Should twtech NOT Rely on Trusted Advisor

Application logic issues
Custom compliance frameworks
Deep performance profiling

twtech Recommendation (DevOps/SRE Playbook)

1. Enable full TA (Business Support)

2. Export findings via EventBridge

3. Classify findings:

Auto-fix
Ticket
Ignore (with justification)

4. Review trends monthly

5. Map findings to Well-Architected Pillars
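
The classification step in the playbook could be sketched as a small routing function. Everything here is hypothetical: the check names in `AUTO_FIX_CHECKS`, the finding fields, and the suppression list format are assumptions chosen for illustration, not a Trusted Advisor feature.

```python
# Checks considered safe to remediate automatically (hypothetical names).
AUTO_FIX_CHECKS = {"Unattached EBS Volumes", "Unassociated Elastic IP Addresses"}


def classify(finding: dict, suppressions: dict) -> dict:
    """Route a finding to 'auto-fix', 'ticket', or 'ignore'.

    `suppressions` maps a resource ID to the written justification for
    ignoring it, so that every ignored finding stays auditable.
    """
    reason = suppressions.get(finding["resource_id"])
    if reason:
        return {"action": "ignore", "justification": reason}
    if finding["check"] in AUTO_FIX_CHECKS:
        return {"action": "auto-fix", "justification": None}
    return {"action": "ticket", "justification": None}


suppressions = {"sg-alb-public": "Public ALB security group, reviewed by security"}
findings = [
    {"check": "Unattached EBS Volumes", "resource_id": "vol-123"},
    {"check": "Security Groups - Unrestricted Access", "resource_id": "sg-alb-public"},
    {"check": "Security Groups - Unrestricted Access", "resource_id": "sg-nodes"},
]
for f in findings:
    print(f["resource_id"], classify(f, suppressions)["action"])
```

Checking the suppression list first means an explicit, justified exception always wins over an automated fix.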

Insight:

Trusted Advisor Review — EKS-Based SaaS (Production)

A realistic, end-to-end Trusted Advisor (TA) review for a production EKS-based SaaS and serverless infrastructure.
Tailored for DevOps / SRE / Platform leads.

Scenario

Multi-tenant SaaS
Amazon EKS (managed node groups + Fargate)
ALB Ingress Controller
RDS Aurora (Multi-AZ)
S3 + CloudFront
CI/CD via GitHub Actions
Business Support enabled

Step 1: Open Trusted Advisor (What twtech Actually Sees)

In the AWS Console:

Support → Trusted Advisor → Dashboard

twtech sees:

Overall check summary
Counts per category
Red / Yellow / Green indicators

Sample snapshot:

Category	Status
Cost Optimization	🔴 8
Security	🔴 3
Fault Tolerance	🟡 5
Performance	🟢 1
Service Limits	🟡 2

Step 2: Cost Optimization Findings (EKS Reality)

Finding 1: Underutilized EC2 Instances (Worker Nodes)

TA Output

“5 EC2 instances with average CPU utilization below 10% over 14 days”

Why This Happens in EKS

Static node groups
Poor pod bin-packing
No Cluster Autoscaler or misconfigured limits

Action

Enable Cluster Autoscaler
Right-size node groups
Use multiple instance types
Add pod requests/limits

SRE Note

This is a platform problem, not an app problem.

Finding 2: Idle Load Balancer

TA Output

“1 Application Load Balancer with no active targets”

Root Cause

Old Ingress left behind
Blue/green deployment cleanup failure

Action

Validate Ingress ownership via tags
Delete unused ALB
Add CI/CD cleanup checks

Finding 3: Unattached EBS Volumes

TA Output

“12 EBS volumes unattached for 30+ days”

Common EKS Cause

PVC deleted
Volume left behind due to reclaim policy

Action

Audit Retain vs Delete reclaim policies on StorageClasses and PersistentVolumes
Use CSI driver lifecycle policies
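
Before deleting anything, it helps to list the candidates. The sketch below filters EC2 `describe_volumes` output for volumes in the `available` state that are older than 30 days. One caveat: it uses `CreateTime` as a proxy for age, which is an assumption, since the API response does not say when a volume was detached. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def stale_unattached_volumes(volumes: list, now=None, min_age_days: int = 30) -> list:
    """Return IDs of unattached volumes created more than min_age_days ago."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=min_age_days)
    return [
        v["VolumeId"]
        for v in volumes
        if v["State"] == "available"       # not attached to any instance
        and not v.get("Attachments")
        and v["CreateTime"] <= cutoff      # proxy for "unattached for 30+ days"
    ]


def list_available_volumes(region: str) -> list:
    """Fetch unattached volumes with boto3 (needs AWS credentials)."""
    import boto3  # imported lazily so the filter above works offline

    ec2 = boto3.client("ec2", region_name=region)
    paginator = ec2.get_paginator("describe_volumes")
    pages = paginator.paginate(Filters=[{"Name": "status", "Values": ["available"]}])
    return [v for page in pages for v in page["Volumes"]]
```

A reasonable next step is to snapshot or tag the candidates and notify the owner before any deletion, because a Retain policy may have been chosen deliberately.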

Step 3: Security Findings (High Signal)

Finding 4: Security Group Allows 0.0.0.0/0 on Port 443

TA Output

“Security group allows unrestricted access”

Reality Check

ALB SG intentionally public
But backend node SG also exposed ❌

Action

ALB SG → 0.0.0.0/0
Node SG → ALB SG only
Lock down NodePort ranges
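
The intended end state (public ALB security group, node security group reachable only from the ALB) can be checked with a short audit function. This sketch works on the dictionaries returned by EC2 `describe_security_groups`; the function name and the `allowed_ports` parameter are hypothetical.

```python
def open_ingress_rules(security_group: dict, allowed_ports=frozenset()) -> list:
    """List port ranges open to the whole internet, minus allowed single ports."""
    findings = []
    for perm in security_group.get("IpPermissions", []):
        open_v4 = any(r.get("CidrIp") == "0.0.0.0/0" for r in perm.get("IpRanges", []))
        open_v6 = any(r.get("CidrIpv6") == "::/0" for r in perm.get("Ipv6Ranges", []))
        if not (open_v4 or open_v6):
            continue  # rule is scoped to specific CIDRs or security groups
        from_port, to_port = perm.get("FromPort"), perm.get("ToPort")
        if from_port is None:
            findings.append("all ports")  # protocol -1 carries no port range
        elif from_port == to_port and from_port in allowed_ports:
            continue  # intentionally public, e.g. 443 on the ALB
        else:
            findings.append(f"{from_port}-{to_port}")
    return findings


alb_sg = {"IpPermissions": [
    {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
]}
node_sg = {"IpPermissions": [
    {"IpProtocol": "tcp", "FromPort": 30000, "ToPort": 32767,
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
]}
print(open_ingress_rules(alb_sg, allowed_ports={443}))  # ALB: nothing to fix
print(open_ingress_rules(node_sg))                      # nodes: NodePort range exposed
```

Running this per security group in CI is one way to catch a node group that drifts back to public exposure.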

Finding 5: IAM User Without MFA

TA Output

“IAM user has console access without MFA”

Root Cause

Legacy CI user
Someone bypassed IAM roles

Action

Kill static users
Enforce:

IAM roles
OIDC (GitHub Actions / IRSA)

SCP: DenyWithoutMFA

DevSecOps Callout

This is a release-blocking issue in mature orgs.

Finding 6: S3 Bucket Allows Public Access

TA Output

“S3 bucket allows public access”

False Positive? Maybe.

Static assets behind CloudFront
But Block Public Access disabled ❌

Action

Enable Block Public Access
Use Origin Access Control (OAC)
Restrict bucket policy to CloudFront only
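
For the last action, a bucket policy that admits only a specific CloudFront distribution through Origin Access Control typically looks like the one built below. The bucket name, account ID, and distribution ID are placeholders, and the helper function is hypothetical; treat it as a sketch to compare against the policy the CloudFront console generates.

```python
import json


def cloudfront_oac_bucket_policy(bucket: str, account_id: str, distribution_id: str) -> dict:
    """Build a bucket policy allowing reads only from one CloudFront distribution."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCloudFrontServicePrincipalReadOnly",
                "Effect": "Allow",
                "Principal": {"Service": "cloudfront.amazonaws.com"},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                "Condition": {
                    "StringEquals": {
                        # Ties access to a single distribution, not all of CloudFront.
                        "AWS:SourceArn": (
                            f"arn:aws:cloudfront::{account_id}:distribution/{distribution_id}"
                        )
                    }
                },
            }
        ],
    }


# Placeholder values for illustration only.
policy = cloudfront_oac_bucket_policy("example-static-assets", "123456789012", "EXAMPLEDISTID")
print(json.dumps(policy, indent=2))
```

With this policy in place and Block Public Access enabled, the bucket serves content only through the distribution.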

Step 4: Fault Tolerance (Where Incidents Are Born)

Finding 7: Auto Scaling Group in Single AZ

TA Output

“Auto Scaling group spans a single Availability Zone”

Impact

AZ outage = platform outage

Action

Spread node groups across ≥2 AZs
Verify pod anti-affinity rules
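
A quick way to find the affected node groups is to inspect the `AvailabilityZones` list on each group returned by the Auto Scaling `describe_auto_scaling_groups` call. The sketch below assumes that response shape; the function name and sample group names are hypothetical.

```python
def single_az_groups(groups: list, min_azs: int = 2) -> list:
    """Return names of Auto Scaling groups spanning fewer than min_azs zones."""
    return [
        g["AutoScalingGroupName"]
        for g in groups
        if len(set(g.get("AvailabilityZones", []))) < min_azs
    ]


# Hypothetical data shaped like the AutoScalingGroups list in the API response.
groups = [
    {"AutoScalingGroupName": "eks-general", "AvailabilityZones": ["us-east-1a", "us-east-1b"]},
    {"AutoScalingGroupName": "eks-batch", "AvailabilityZones": ["us-east-1a"]},
]
print(single_az_groups(groups))
```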

Finding 8: RDS Backup Retention Low

TA Output

“RDS backup retention less than 7 days”

Reality

Dev/test DB accidentally promoted

Action

Enforce via:

AWS Config
Terraform guardrails
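
Until a guardrail is in place, the same condition can be checked directly. This sketch scans the `DBInstances` list from the RDS `describe_db_instances` call for a `BackupRetentionPeriod` below 7 days; the function name and sample identifiers are hypothetical.

```python
def low_retention_instances(db_instances: list, min_days: int = 7) -> list:
    """Return (identifier, retention_days) for instances below min_days."""
    return [
        (db["DBInstanceIdentifier"], db.get("BackupRetentionPeriod", 0))
        for db in db_instances
        if db.get("BackupRetentionPeriod", 0) < min_days
    ]


# Hypothetical data shaped like the DBInstances list in the API response.
instances = [
    {"DBInstanceIdentifier": "prod-main", "BackupRetentionPeriod": 14},
    {"DBInstanceIdentifier": "promoted-dev", "BackupRetentionPeriod": 1},
]
print(low_retention_instances(instances))
```

A retention of 0 means automated backups are disabled, so those instances deserve the most urgent attention.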

Step 5: Performance Findings

Finding 9: No Major Issues

Typical for EKS because TA doesn’t deeply inspect Kubernetes internals.

NB:

TA won’t see:

Pod CPU throttling
Memory OOMs
API server saturation

Use:

Prometheus
Karpenter metrics
CloudWatch Container Insights

Step 6: Service Limits (Critical but Ignored)

Finding 10: EC2 Instance Limit at 80%

TA Output

“EC2 On-Demand instance usage approaching limit”

Why This Matters

Scaling events will fail
Deployments stall during incidents

Action

Request quota increase
Migrate to:

Spot
Graviton
Fargate where possible

Step 7: Prioritization (How Pros Do It)

Category	Action
Security	Fix immediately
Service Limits	Fix before next deploy
Cost	Schedule within sprint
Fault Tolerance	Roadmap
Performance	Monitor
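
The ordering in the table can be encoded as a sort key so that a mixed list of findings comes out in triage order. The category labels and finding fields below are assumptions made for illustration.

```python
# Lower number means handle sooner; order follows the table above.
PRIORITY = {
    "Security": 0,
    "Service Limits": 1,
    "Cost Optimization": 2,
    "Fault Tolerance": 3,
    "Performance": 4,
}


def prioritize(findings: list) -> list:
    """Sort findings by category priority; unknown categories go last."""
    return sorted(findings, key=lambda f: PRIORITY.get(f["category"], len(PRIORITY)))


findings = [
    {"category": "Cost Optimization", "title": "Unattached EBS volumes"},
    {"category": "Security", "title": "IAM user without MFA"},
    {"category": "Service Limits", "title": "EC2 quota at 80%"},
]
for f in prioritize(findings):
    print(f["category"], "-", f["title"])
```

Because Python's sort is stable, findings within the same category keep their original order.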

Step 8: Automation Pipeline (Real World)

Auto-Fix Candidates

Unattached EBS
Idle ELBs
Unused EIPs
IAM access key rotation reminders

Serverless SaaS Differences (Quick Contrast)

Area	EKS	Serverless
Compute cost	Underutilized EC2	Lambda duration
Security	SG + IAM	IAM + resource policies
Fault tolerance	AZ spread	Mostly managed
Service limits	EC2, ENIs	Lambda concurrency

Common serverless TA findings:

Lambda concurrency limits
Public S3 buckets
Idle API Gateway stages
Underutilized provisioned concurrency

Final takeaway (What to Tell Organization Leadership)

“Trusted Advisor identified 3 security risks, 2 scaling blockers, and ~$1,200/month in waste.
All critical security findings were remediated within 24 hours.”

Links to useful resources:

https://aws.amazon.com/architecture/

https://aws.amazon.com/solutions/

Project: Hands-On

How twtech uses AWS Trusted Advisor in its environment to:

Get real-time recommendations based on best practices across six categories:

cost optimization,
performance,
resilience,
security,
operational excellence,
and service limits.

Draw upon best practices (learned from serving AWS customers) to identify opportunities to save money, improve availability and performance, and help close security gaps.
Log in to the AWS account and use this link to reach the AWS Trusted Advisor (TA) console: https://console.aws.amazon.com/trustedadvisor/home.

Upgrade the twtech AWS Support plan to get all Trusted Advisor checks

NB:

Without an upgrade, the twtech account gets only a limited set of Trusted Advisor checks.

Service limits

twtech chooses a check name to see recommendations for services that use more than 80 percent of a service quota.
The check results are based on a snapshot, so twtech's current usage might differ.
Quota and usage data can take up to 24 hours to reflect any changes.

NB:

twtech needs to pay for a support plan when it upgrades to get the full set of Trusted Advisor checks.

Addendum:

Links to More Architecture Examples

Link to AWS Certified Solutions Architect - Associate (Exam)

https://aws.amazon.com/certification/certified-solutions-architect-associate/
