An Overview of AWS Trusted Advisor (TA).
Focus:
- Framed for DevOps / Cloud / SRE
- Aligned with Well-Architected thinking.
Breakdown:
- Intro,
- Key Features and Functionality,
- Integration with Other Services,
- Accessing Trusted Advisor,
- The concept: AWS Trusted Advisor,
- Trusted Advisor vs Well-Architected Tool (Quick Context)
- The 5 Trusted Advisor Check Categories (Deep Dive)
- Cost Optimization,
- Security,
- Fault Tolerance (Reliability),
- Performance,
- Service Limits (Quotas),
- Support Plan Impact (Very Important),
- Automation & Integrations (Where TA Shines),
- Sample DevOps Automation Flow,
- Trusted Advisor vs Third-Party Tools,
- When Should twtech Rely on Trusted Advisor,
- When Should twtech NOT Rely on Trusted Advisor,
- twtech Recommendation (DevOps/SRE Playbook),
- Insights.
Intro:
- AWS Trusted Advisor is a web service that inspects twtech's AWS environment and provides real-time recommendations based on best practices across six categories:
- cost optimization,
- performance,
- resilience,
- security,
- operational excellence,
- and service limits.
- Trusted Advisor draws upon best practices learned from serving hundreds of thousands of AWS customers to identify opportunities to save money, improve availability and performance, and help close security gaps.
Key Features and Functionality
Best Practice Checks:
- Trusted Advisor continuously evaluates your AWS environment using a set of automated checks and then recommends actions to remediate any deviations from best practices.
Support Plan Integration:
- The number of checks available depends on your AWS Support plan.
- Basic and Developer Support plans have access to all service limits checks and selected security and fault tolerance checks.
- Business, Enterprise On-Ramp, and Enterprise Support plans have access to the full suite of checks.
Organizational View:
- For organizations using AWS Organizations, the organizational view provides a consolidated view of Trusted Advisor recommendations across all accounts.
Integration with Other Services:
AWS Support API:
- Allows programmatic access to check results.
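As a minimal sketch of that programmatic access, the snippet below groups check names by category. It assumes the boto3 `support` client and its `describe_trusted_advisor_checks` call, which in turn needs AWS credentials and a support plan that includes Support API access. The sample data is hand-written to resemble the `checks` list in the response; the IDs are placeholders and the category strings are an assumption to verify against a real response.

```python
from collections import defaultdict


def group_checks_by_category(checks):
    """Group Trusted Advisor check names by their category field."""
    grouped = defaultdict(list)
    for check in checks:
        grouped[check["category"]].append(check["name"])
    # Sort names so the result does not depend on API ordering.
    return {category: sorted(names) for category, names in grouped.items()}


def fetch_checks():
    """Fetch check metadata from the AWS Support API (needs credentials)."""
    # Imported lazily so the helper above works without boto3 installed.
    import boto3

    # The Support API is called through the us-east-1 endpoint.
    support = boto3.client("support", region_name="us-east-1")
    return support.describe_trusted_advisor_checks(language="en")["checks"]


# Hand-written sample shaped like the "checks" list; IDs are placeholders.
sample_checks = [
    {"id": "example-1", "name": "Security Groups - Specific Ports Unrestricted", "category": "security"},
    {"id": "example-2", "name": "MFA on Root Account", "category": "security"},
    {"id": "example-3", "name": "Low Utilization Amazon EC2 Instances", "category": "cost_optimizing"},
]
grouped = group_checks_by_category(sample_checks)
for category, names in grouped.items():
    print(category, names)
```

In a real account, `group_checks_by_category(fetch_checks())` would replace the sample list.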
Amazon CloudWatch and EventBridge:
- twtech can create alarms and rules to monitor Trusted Advisor metrics and check status changes.
AWS Well-Architected Tool:
- Integrates with the tool to evaluate workloads and provide data-driven insights.
AWS Config:
- Many new checks are powered by AWS Config managed rules, enhancing the monitoring of operational excellence.
Prioritized Recommendations:
- Available to Enterprise Support customers, Trusted Advisor Priority highlights the most critical recommendations, often including context-driven insights from your AWS account team.
Accessing Trusted Advisor
The concept: AWS Trusted Advisor
- AWS Trusted Advisor is a real-time advisory service that continuously evaluates twtech's AWS environment against AWS best practices and surfaces actionable recommendations across five core domains (Operational Excellence, the sixth category listed in the intro, is not covered in the deep dive):
1. Cost Optimization
2. Security
3. Fault Tolerance
4. Performance
5. Service Limits
Think of AWS Trusted Advisor (TA) as:
A continuously running automated cloud review engine, not a one-time audit.
Trusted Advisor vs Well-Architected Tool (Quick Context)
| Trusted Advisor | Well-Architected Tool |
|---|---|
| Continuous | Point-in-time review |
| Automated checks | Architect-led assessment |
| Resource-level findings | Design-level questions |
| Ops-focused | Architecture-focused |
In practice:
- Trusted Advisor → day-to-day hygiene
- Well-Architected → quarterly / major-change reviews
The 5 Trusted Advisor Check Categories (Deep Dive)
1. Cost Optimization
- Identifies waste, overprovisioning, and idle resources.
Common Checks
- Idle EC2 instances
- Underutilized EBS volumes
- Idle Load Balancers
- Low-utilization RDS
- Unassociated Elastic IPs
- Reserved Instance & Savings Plan optimization
Sample Finding
- “12 EBS volumes unattached for over 30 days; estimated monthly savings: $480”
DevOps Best Practice
- Integrate TA cost checks into FinOps dashboards
- Use tagging enforcement + TA findings to assign ownership
- Auto-remediate using Lambda where safe
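To show how a savings figure like the one in the sample finding can arise, here is a small worked calculation. The volume sizes (500 GiB each) and the price ($0.08 per GiB-month) are assumptions chosen for illustration; they are not taken from the finding or from a current price list.

```python
def estimate_monthly_savings(volume_sizes_gib, price_per_gib_month):
    """Return the monthly cost of keeping the given unattached volumes."""
    return sum(volume_sizes_gib) * price_per_gib_month


# Assumed inputs: 12 unattached volumes of 500 GiB each,
# billed at a hypothetical $0.08 per GiB-month.
sizes = [500] * 12
savings = estimate_monthly_savings(sizes, 0.08)
print(f"Estimated monthly savings: ${savings:,.2f}")
```

Under these assumptions the total is 6,000 GiB × $0.08 = $480 per month, which matches the sample finding.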
2. Security
- Maps closely to CIS benchmarks and AWS security best practices.
Common Checks
- S3 buckets with public access
- Security groups allowing 0.0.0.0/0 on sensitive ports
- IAM users with:
  - No MFA
  - Unused access keys
  - Passwords older than policy
- Root account without MFA
- Exposed RDS snapshots
Sample Finding
- “Security Group sg-xxxx allows SSH from 0.0.0.0/0”
DevSecOps Tie-in
- Treat TA findings as security debt
- Send findings to:
  - Security Hub
  - Jira / ServiceNow
  - SIEM (via EventBridge)
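The "0.0.0.0/0 on sensitive ports" check above can be approximated locally. This sketch assumes input shaped like one entry of the `SecurityGroups` list returned by EC2 `describe_security_groups`; the list of sensitive ports is an assumed policy, the group ID is a placeholder, and IPv6 ranges are deliberately left out to keep it short.

```python
# Assumed policy: SSH, RDP, MySQL, PostgreSQL. Adjust to local standards.
SENSITIVE_PORTS = frozenset({22, 3389, 3306, 5432})


def open_sensitive_ports(security_group, sensitive_ports=SENSITIVE_PORTS):
    """Return the sensitive ports that the group exposes to 0.0.0.0/0."""
    exposed = set()
    for rule in security_group.get("IpPermissions", []):
        open_to_world = any(
            r.get("CidrIp") == "0.0.0.0/0" for r in rule.get("IpRanges", [])
        )
        if not open_to_world:
            continue
        protocol = rule.get("IpProtocol")
        if protocol == "-1":
            # Protocol -1 means all traffic, so every port is exposed.
            exposed |= set(sensitive_ports)
            continue
        if protocol not in ("tcp", "udp"):
            continue  # e.g. ICMP, where the port fields mean something else
        low, high = rule.get("FromPort"), rule.get("ToPort")
        if low is None or high is None:
            continue
        exposed |= {p for p in sensitive_ports if low <= p <= high}
    return sorted(exposed)


sample_group = {
    "GroupId": "sg-0123456789abcdef0",  # placeholder ID
    "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        {"IpProtocol": "tcp", "FromPort": 5432, "ToPort": 5432,
         "IpRanges": [{"CidrIp": "10.0.0.0/16"}]},
    ],
}
print(open_sensitive_ports(sample_group))
```

Here only port 22 is reported: 443 is open to the world but not on the sensitive list, and 5432 is restricted to a private range.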
3. Fault Tolerance (Reliability)
- Focuses on resilience and availability.
Common Checks
- EC2 instances without EBS-backed volumes
- Single-AZ RDS databases
- ELBs without multiple targets
- Auto Scaling groups without health checks
- Missing backups
Sample Finding
- “RDS instance is running in a single Availability Zone”
SRE Angle
- TA highlights fragile infrastructure
- Use it to prioritize:
  - Multi-AZ
  - Auto Scaling
  - Backup policies
4. Performance
- Ensures services are appropriately sized and configured.
Common Checks
- EC2 instances with high CPU or memory pressure
- Classic Load Balancer usage (legacy)
- CloudFront configuration inefficiencies
- Suboptimal EBS volume types
Sample Finding
- “Instance t3.micro experiencing sustained CPU throttling”
Platform Engineering Use
- Feed TA signals into:
- Capacity planning
- Instance family modernization
- Graviton adoption programs
5. Service Limits (Quotas)
- Prevents scaling failures caused by quota exhaustion.
Common Checks
- EC2 instance limits
- VPC limits
- EIP limits
- Load balancer limits
- Lambda concurrency limits
Sample Finding
- “EC2 On-Demand instance usage at 85% of quota”
Ops Impact
- One of the highest-value checks
- Prevents:
- Failed deployments
- Incident escalations
- Should be monitored like alerts
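Treating quota checks like alerts amounts to a threshold comparison. The sketch below assumes a warning level at 80% of quota and an error level at 100%; both thresholds are parameters, so they can be tuned to whatever the team alerts on.

```python
def quota_status(usage, limit, warn_at=0.80):
    """Classify quota usage as 'ok', 'warning', or 'error'."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    ratio = usage / limit
    if ratio >= 1.0:
        return "error"    # quota exhausted: new launches will fail
    if ratio >= warn_at:
        return "warning"  # approaching the quota: request an increase
    return "ok"


# The sample finding above: usage at 85% of the quota.
print(quota_status(usage=85, limit=100))
```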
Support Plan Impact (Very Important)
| Support Plan | Checks Available |
|---|---|
| Basic / Developer | Limited checks only |
| Business | Full Trusted Advisor |
| Enterprise | Full + prioritized support |
NB:
- Full value requires Business or Enterprise support
Automation & Integrations (Where TA Shines)
Event-Driven Ops
- TA publishes findings to Amazon EventBridge
- Enables:
  - Auto-ticket creation
  - Slack notifications
  - Auto-remediation
Security Hub
- TA security checks can flow into AWS Security Hub
- Unified security posture view
API & CLI
- Query findings programmatically
- Build custom dashboards
Sample DevOps Automation Flow
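One possible shape for such a flow is an EventBridge rule that matches Trusted Advisor events and invokes a Lambda function, which then builds a notification. The sketch below is illustrative only: the `detail-type` string and the keys read from `detail` (`check-name`, `status`, `resource_id`) are assumptions about the event format and should be verified against a real event, and the sample event is hand-written with placeholder values.

```python
import json

# Event pattern for the EventBridge rule. The source and detail-type
# values are assumptions; confirm them against a captured event.
EVENT_PATTERN = {
    "source": ["aws.trustedadvisor"],
    "detail-type": ["Trusted Advisor Check Item Refresh Notification"],
    "detail": {"status": ["ERROR", "WARN"]},
}


def handler(event, context=None):
    """Lambda-style handler: reduce a TA event to a short notification."""
    detail = event.get("detail", {})
    message = {
        "check": detail.get("check-name", "unknown"),
        "status": detail.get("status", "unknown"),
        "resource": detail.get("resource_id", "unknown"),
        "account": event.get("account", "unknown"),
    }
    # A real flow would post this to Slack or a ticketing API here.
    return message


# Hand-written sample event; all values are placeholders.
sample_event = {
    "source": "aws.trustedadvisor",
    "detail-type": "Trusted Advisor Check Item Refresh Notification",
    "account": "111122223333",
    "detail": {
        "check-name": "Security Groups - Specific Ports Unrestricted",
        "status": "ERROR",
        "resource_id": "sg-0123456789abcdef0",
    },
}
print(json.dumps(handler(sample_event), indent=2))
```

Keeping the handler a pure function of the event makes it easy to test locally before wiring it to a rule.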
Trusted Advisor vs Third-Party Tools
TA is:
- Native
- Free with support
- Low false positives
But:
- Not deeply customizable
- Doesn’t replace:
- CSPM tools
- Advanced cost optimization platforms
NB:
- Best used as a baseline control plane.
When Should twtech Rely on Trusted Advisor
✔ Daily operational hygiene
✔ Security posture monitoring
✔ Cost waste detection
✔ Pre-incident prevention
✔ Leadership dashboards
❌ When Should twtech NOT Rely on Trusted Advisor
- Application logic issues
- Custom compliance frameworks
- Deep performance profiling
twtech Recommendation (DevOps/SRE Playbook)
1. Enable full TA (Business Support)
2. Export findings via EventBridge
3. Classify findings:
   - Auto-fix
   - Ticket
   - Ignore (with justification)
4. Review trends monthly
5. Map findings to Well-Architected Pillars
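The classification step in this playbook can be sketched as a small rule function. Everything below is hypothetical: the finding shape, the set of checks treated as safe to auto-fix, and the suppression list are illustrative choices, not a Trusted Advisor feature. The one rule it enforces is that a finding may only be ignored when a non-empty justification is recorded.

```python
# Check names treated as safe to remediate automatically (assumed policy).
AUTO_FIX_CHECKS = {"Unassociated Elastic IP Addresses", "Idle Load Balancers"}


def classify(finding, suppressions):
    """Return (action, justification) for one finding.

    suppressions maps (check, resource) to a written justification.
    """
    key = (finding["check"], finding["resource"])
    justification = suppressions.get(key, "").strip()
    if justification:
        return ("ignore", justification)
    if finding["check"] in AUTO_FIX_CHECKS:
        return ("auto-fix", None)
    return ("ticket", None)


# Hypothetical findings and suppressions; resource IDs are placeholders.
findings = [
    {"check": "Unassociated Elastic IP Addresses", "resource": "eipalloc-EXAMPLE1"},
    {"check": "Security Groups - Specific Ports Unrestricted", "resource": "sg-EXAMPLE-alb"},
    {"check": "Security Groups - Specific Ports Unrestricted", "resource": "sg-EXAMPLE-node"},
]
suppressions = {
    ("Security Groups - Specific Ports Unrestricted", "sg-EXAMPLE-alb"):
        "ALB is intentionally public on 443",
}
for finding in findings:
    print(finding["resource"], classify(finding, suppressions))
```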
Insight:
Trusted Advisor Review: EKS-Based SaaS (Production)
- A realistic, end-to-end Trusted Advisor (TA) review for production EKS-based SaaS and serverless infrastructure.
- Tailored for a DevOps / SRE / Platform lead.
Scenario
- Multi-tenant SaaS
- Amazon EKS (managed node groups + Fargate)
- ALB Ingress Controller
- RDS Aurora (Multi-AZ)
- S3 + CloudFront
- CI/CD via GitHub Actions
- Business Support enabled
Step 1: Open Trusted Advisor (What twtech Actually Sees)
In the AWS Console:
Support → Trusted Advisor → Dashboard
twtech sees:
- Overall check summary
- Counts per category
- Red / Yellow / Green indicators
Sample snapshot:
| Category | Status |
|---|---|
| Cost Optimization | 🔴 8 |
| Security | 🔴 3 |
| Fault Tolerance | 🟡 5 |
| Performance | 🟢 1 |
| Service Limits | 🟡 2 |
Step 2: Cost Optimization Findings (EKS Reality)
Finding 1: Underutilized EC2 Instances (Worker Nodes)
TA Output
- “5 EC2 instances with average CPU utilization below 10% over 14 days”
Why This Happens in EKS
- Static node groups
- Poor pod bin-packing
- No Cluster Autoscaler or misconfigured limits
Action
- Enable Cluster Autoscaler
- Right-size node groups
- Use multiple instance types
- Add pod requests/limits
SRE Note
- This is a platform problem, not an app problem.
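The rule behind Finding 1 (average CPU below 10% over 14 days) is easy to reproduce against your own metrics. This sketch assumes you already have one average-CPU value per day for each node, for example exported from CloudWatch; the instance IDs are placeholders.

```python
from statistics import mean


def underutilized(daily_cpu_by_instance, threshold=10.0, window_days=14):
    """Return instance IDs whose mean CPU over the window is below threshold."""
    flagged = []
    for instance_id, daily_cpu in daily_cpu_by_instance.items():
        if len(daily_cpu) < window_days:
            continue  # not enough history to judge
        if mean(daily_cpu[-window_days:]) < threshold:
            flagged.append(instance_id)
    return sorted(flagged)


# Placeholder data: daily average CPU percentages per worker node.
nodes = {
    "i-EXAMPLEaaa": [5.0] * 14,   # consistently idle
    "i-EXAMPLEbbb": [40.0] * 14,  # healthy utilization
    "i-EXAMPLEccc": [3.0] * 7,    # too new, skipped
}
print(underutilized(nodes))
```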
Finding 2: Idle Load Balancer
TA Output
- “1 Application Load Balancer with no active targets”
Root Cause
- Old Ingress left behind
- Blue/green deployment cleanup failure
Action
- Validate Ingress ownership via tags
- Delete unused ALB
- Add CI/CD cleanup checks
Finding 3: Unattached EBS Volumes
TA Output
- “12 EBS volumes unattached for 30+ days”
Common EKS Cause
- PVC deleted
- Volume left behind due to reclaim policy
Action
- Audit Retain vs Delete reclaim policies
- Use CSI driver lifecycle policies
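To audit reclaim policies, one option is to scan the cluster's PersistentVolumes for those that are `Released` but kept because of a `Retain` policy, since these are the ones likely to leave EBS volumes behind. The sketch assumes the input is the parsed JSON from `kubectl get pv -o json`; the volume names are placeholders.

```python
def orphan_candidates(pv_list):
    """Return names of PersistentVolumes that are Released but retained."""
    names = []
    for pv in pv_list.get("items", []):
        policy = pv.get("spec", {}).get("persistentVolumeReclaimPolicy")
        phase = pv.get("status", {}).get("phase")
        if policy == "Retain" and phase == "Released":
            names.append(pv["metadata"]["name"])
    return names


# Hand-written sample in the shape of `kubectl get pv -o json`.
sample = {
    "items": [
        {"metadata": {"name": "pvc-example-1"},
         "spec": {"persistentVolumeReclaimPolicy": "Retain"},
         "status": {"phase": "Released"}},
        {"metadata": {"name": "pvc-example-2"},
         "spec": {"persistentVolumeReclaimPolicy": "Delete"},
         "status": {"phase": "Bound"}},
        {"metadata": {"name": "pvc-example-3"},
         "spec": {"persistentVolumeReclaimPolicy": "Retain"},
         "status": {"phase": "Bound"}},
    ]
}
print(orphan_candidates(sample))
```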
Step 3: Security Findings (High Signal)
Finding 4: Security Group Allows 0.0.0.0/0 on Port 443
TA Output
- “Security group allows unrestricted access”
Reality Check
- ALB SG intentionally public
- But backend node SG also exposed ❌
Action
- ALB SG → 0.0.0.0/0
- Node SG → ALB SG only
- Lock down NodePort ranges
Finding 5: IAM User Without MFA
TA Output
- “IAM user has console access without MFA”
Root Cause
- Legacy CI user
- Someone bypassed IAM roles
Action
- Kill static users
- Enforce:
  - IAM roles
  - OIDC (GitHub Actions / IRSA)
  - SCP: DenyWithoutMFA
DevSecOps Callout
- This is a release-blocking issue in mature orgs.
Finding 6: S3 Bucket Allows Public Access
TA Output
- “S3 bucket allows public access”
False Positive? Maybe.
- Static assets behind CloudFront
- But Block Public Access disabled ❌
Action
- Enable Block Public Access
- Use Origin Access Control (OAC)
- Restrict bucket policy to CloudFront only
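A bucket policy restricted to CloudFront through OAC typically allows `s3:GetObject` only to the CloudFront service principal, conditioned on the distribution ARN. The helper below builds such a policy as a plain dictionary; the bucket name, account ID, and distribution ID are placeholders, and the result should be reviewed before being applied to a real bucket.

```python
import json


def oac_bucket_policy(bucket, account_id, distribution_id):
    """Build a bucket policy that lets one CloudFront distribution read objects."""
    distribution_arn = (
        f"arn:aws:cloudfront::{account_id}:distribution/{distribution_id}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCloudFrontServicePrincipalReadOnly",
                "Effect": "Allow",
                "Principal": {"Service": "cloudfront.amazonaws.com"},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # Only requests made on behalf of this distribution match.
                "Condition": {
                    "StringEquals": {"AWS:SourceArn": distribution_arn}
                },
            }
        ],
    }


# Placeholder identifiers for illustration only.
policy = oac_bucket_policy("twtech-static-assets", "111122223333", "EDFDVBD6EXAMPLE")
print(json.dumps(policy, indent=2))
```

With Block Public Access enabled, this policy still works because the grant is to a service principal scoped by a condition, not to the public.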
Step 4: Fault Tolerance (Where Incidents Are Born)
Finding 7: Auto Scaling Group in Single AZ
TA Output
- “Auto Scaling group spans a single Availability Zone”
Impact
- AZ outage = platform outage
Action
- Spread node groups across ≥2 AZs
- Verify pod anti-affinity rules
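The same single-AZ condition can be checked from Auto Scaling data before Trusted Advisor flags it. This sketch assumes entries shaped like the `AutoScalingGroups` list returned by `describe_auto_scaling_groups`; the group names and zones are placeholders.

```python
def single_az_groups(groups, min_azs=2):
    """Return names of Auto Scaling groups spanning fewer than min_azs zones."""
    return sorted(
        g["AutoScalingGroupName"]
        for g in groups
        if len(set(g.get("AvailabilityZones", []))) < min_azs
    )


# Placeholder node groups for illustration.
groups = [
    {"AutoScalingGroupName": "eks-general", "AvailabilityZones": ["us-east-1a", "us-east-1b"]},
    {"AutoScalingGroupName": "eks-batch", "AvailabilityZones": ["us-east-1a"]},
]
print(single_az_groups(groups))
```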
Finding 8: RDS Backup Retention Low
TA Output
- “RDS backup retention less than 7 days”
Reality
- Dev/test DB accidentally promoted
Action
- Enforce via:
- AWS Config
- Terraform guardrails
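As a lightweight guardrail alongside those tools, the retention rule can be expressed as a check over instance metadata. The sketch assumes entries shaped like the `DBInstances` list from RDS `describe_db_instances`, with `BackupRetentionPeriod` in days; for Aurora the equivalent setting lives on the cluster, which this sketch does not cover. Identifiers are placeholders.

```python
def low_retention(instances, min_days=7):
    """Map DB identifiers to their retention when it is below min_days."""
    return {
        db["DBInstanceIdentifier"]: db.get("BackupRetentionPeriod", 0)
        for db in instances
        if db.get("BackupRetentionPeriod", 0) < min_days
    }


# Placeholder instances for illustration.
databases = [
    {"DBInstanceIdentifier": "prod-orders", "BackupRetentionPeriod": 14},
    {"DBInstanceIdentifier": "promoted-dev-db", "BackupRetentionPeriod": 1},
    {"DBInstanceIdentifier": "scratch-db"},  # missing field treated as 0
]
print(low_retention(databases))
```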
Step 5: Performance Findings
Finding 9: No Major Issues
- Typical for EKS because TA doesn’t deeply inspect Kubernetes internals.
NB:
- TA won’t see:
- Pod CPU throttling
- Memory OOMs
- API server saturation
Use:
- Prometheus
- Karpenter metrics
- CloudWatch Container Insights
Step 6: Service Limits (Critical but Ignored)
Finding 10: EC2 Instance Limit at 80%
TA Output
- “EC2 On-Demand instance usage approaching limit”
Why This Matters
- Scaling events will fail
- Deployments stall during incidents
Action
- Request quota increase
- Migrate to:
Step 7: Prioritization (How Pros Do It)
| Finding Category | Action |
|---|---|
| Security | Fix immediately |
| Service Limits | Fix before next deploy |
| Cost | Schedule within sprint |
| Fault Tolerance | Roadmap |
| Performance | Monitor |
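The ordering in the table can be applied mechanically when triaging a list of findings. In this sketch the category keys and the finding shape are hypothetical; only the ordering itself comes from the table.

```python
# Lower number = handle sooner, following the prioritization table.
PRIORITY = {
    "security": 0,
    "service_limits": 1,
    "cost": 2,
    "fault_tolerance": 3,
    "performance": 4,
}


def prioritize(findings):
    """Sort findings by category priority; unknown categories go last."""
    return sorted(findings, key=lambda f: PRIORITY.get(f["category"], len(PRIORITY)))


# Hypothetical findings drawn from the review above.
findings = [
    {"category": "cost", "title": "Idle Load Balancer"},
    {"category": "performance", "title": "No major issues"},
    {"category": "security", "title": "IAM user without MFA"},
    {"category": "service_limits", "title": "EC2 instance limit at 80%"},
]
for finding in prioritize(findings):
    print(finding["category"], "-", finding["title"])
```

Because `sorted` is stable, findings within the same category keep their original order.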
Step 8: Automation Pipeline (Real World)
Auto-Fix Candidates
- Unattached EBS
- Idle ELBs
- Unused EIPs
- IAM access key rotation reminders
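For a candidate such as unused EIPs, a cautious first step is to produce a release plan and review it before anything is deleted. This sketch assumes entries shaped like the `Addresses` list from EC2 `describe_addresses`, where an address with no `AssociationId` is unassociated; the IDs and IPs are placeholders, and the actual release call is left as a comment.

```python
def unassociated_eips(addresses):
    """Return allocation IDs of Elastic IPs that are not associated."""
    return [
        a["AllocationId"]
        for a in addresses
        if "AllocationId" in a and "AssociationId" not in a
    ]


def release_plan(addresses):
    """Describe what an auto-fix would do, without changing anything."""
    # A real remediation would call ec2.release_address(AllocationId=...)
    # for each ID after review; this sketch only reports the plan.
    return [f"would release {allocation_id}" for allocation_id in unassociated_eips(addresses)]


# Placeholder addresses for illustration.
addresses = [
    {"AllocationId": "eipalloc-EXAMPLE1", "PublicIp": "203.0.113.10",
     "AssociationId": "eipassoc-EXAMPLE1"},
    {"AllocationId": "eipalloc-EXAMPLE2", "PublicIp": "203.0.113.11"},
]
print(release_plan(addresses))
```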
Serverless SaaS Differences (Quick Contrast)
| Area | EKS | Serverless |
|---|---|---|
| Compute cost | Underutilized EC2 | Lambda duration |
| Security | SG + IAM | IAM + resource policies |
| Fault tolerance | AZ spread | Mostly managed |
| Service limits | EC2, ENIs | Lambda concurrency |
Common serverless TA findings:
- Lambda concurrency limits
- Public S3 buckets
- Idle API Gateway stages
- Underutilized provisioned concurrency
Final takeaway (What to Tell Organization Leadership)
- “Trusted Advisor identified 3 security risks, 2 scaling blockers, and ~$1,200/month in waste.
- All critical security findings (concerns) were remediated within 24 hours.”
Links to useful resources:
https://aws.amazon.com/architecture/
https://aws.amazon.com/solutions/
Project: Hands-On
How twtech uses AWS Trusted Advisor in its environment to:
- Provide real-time recommendations based on best practices across six categories:
- cost optimization,
- performance,
- resilience,
- security,
- operational excellence,
- and service limits.
- Draw upon best practices (learned from serving AWS customers) to identify opportunities that save money, improve availability and performance, and help close security gaps.
- Log in to the AWS account and use this link to reach the AWS Trusted Advisor (TA) console: https://console.aws.amazon.com/trustedadvisor/home
- Upgrade twtech's AWS Support Plan to get all Trusted Advisor checks
NB:
- Without an upgrade, twtech's AWS Support Plan gets only limited Trusted Advisor checks.
Service limits
- twtech chooses a check name to see recommendations for services that use more than 80 percent of a service quota.
- The check results use values based on a snapshot, so twtech's current usage might vary.
- Quota and usage data can take up to 24 hours to reflect any changes.

NB:
- twtech needs to pay for a support plan when it upgrades its AWS Support Plan to get all (full) Trusted Advisor checks.
Addendum: Links to More Architecture Examples
Link to AWS Certified Solutions Architect - Associate (Exam)