Thursday, December 25, 2025

AWS Well-Architected Tool (WAT) | Overview & Hands-On.

An Overview of AWS Well-Architected Tool (WAT).

Focus:

  • Relevant to DevOps / DevSecOps / Cloud Engineers.

Breakdown:

  •        Intro,
  •        Key Features and Functionality of WAT,
  •        The Six Pillars of AWS,
  •        The concept: AWS Well-Architected,
  •        Core Concepts twtech Must Understand,
  •        Review Process – Step by Step (Real-World),
  •        Pillar-Specific Deep Insights (Beyond Basics),
  •        Reports & Artifacts twtech Can Generate,
  •        Automation & Integration (Advanced Use),
  •        Anti-Patterns to Avoid,
  •        Best-Practice Cadence (What Teams Should Do),
  •        How This Helps in twtech Career Path,
  •        Insights.

Intro:

  •        The AWS Well-Architected Tool (WAT) is a service that helps twtech review its workloads against current AWS architectural best practices and provides guidance for improvement and remediation.
  •        The service is based on the AWS Well-Architected Framework, which is organized into six pillars.

Key Features and Functionality

Workload Reviews:

  •          The tool guides twtech through a set of questions aligned with the six pillars of the Framework to evaluate its architecture.

Risk Identification:

  •         After a review, twtech receives an improvement plan that highlights high-risk (HRI) and medium-risk (MRI) issues, along with recommended remediation steps.

Milestone Tracking:

  •         twtech can save the state of a workload at specific points in time as "milestones" to measure and track progress as it implements improvements.

Custom Lenses:

  •         The tool allows twtech to create and share custom lenses, which are extensions of the core framework tailored to specific industry or internal organizational best practices.

Integrations:

  •          The tool can integrate with other AWS services such as AWS Trusted Advisor to automatically incorporate operational checks and provide a more comprehensive view of twtech's environment.

Collaboration:

  •          Workload reviews can be shared across teams or with external AWS Partners to enhance collaboration and ensure consistent application of best practices.

The Six Pillars of AWS

  • The framework and the tool organize best practices around six foundational pillars: 

Operational Excellence:

  •        Focuses on running and managing systems that deliver business value, including aspects like automation and runbooks.

Security:

  •        Centers on protecting data, systems, and assets, covering areas like identity and access management, detection, and incident response.

Reliability:

  •        Ensures a workload performs its intended function correctly and consistently when expected, focusing on foundations, change management, and failure handling.

Performance Efficiency:

  •        Addresses the efficient use of computing resources to meet system requirements and maintain that efficiency as demand changes.

Cost Optimization:

  •        Provides guidance on avoiding unneeded costs and managing expenditure effectively in the cloud.

Sustainability:

  •        Focuses on reducing energy consumption and increasing efficiency in the cloud to minimize environmental impacts. 

NB:

  • To get started with the AWS Well-Architected Tool (WAT), see the AWS Well-Architected Tool documentation:

https://docs.aws.amazon.com/wellarchitected/latest/userguide/tutorial.html

1. The concept: AWS Well-Architected

The AWS Well-Architected Tool is not just a questionnaire. It is:

  •         A structured risk discovery system
  •         A design review framework aligned to AWS best practices
  •         A continuous improvement mechanism, not a one-time audit
  •         A governance artifact that can support compliance, leadership reporting, and funding decisions

NB:

  • It helps twtech identify architectural risks and map them to AWS-recommended improvements across the six Well-Architected pillars.

2. Core Concepts twtech Must Understand

2.1 Workloads

A workload is:

A collection of AWS resources that deliver business value

Examples:

  •         A production e-commerce platform
  •         A CI/CD pipeline + EKS cluster
  •         A serverless data ingestion platform

Each workload:

  •         Has an owner
  •         Has an environment (Prod / Non-Prod)
  •         Is reviewed pillar by pillar

2.2 Pillars & Lenses

Pillars (Built-in)

  1. Operational Excellence
  2. Security
  3. Reliability
  4. Performance Efficiency
  5. Cost Optimization
  6. Sustainability

Lenses

Lenses extend the core pillars for specific domains:

  •         Serverless Lens
  •         SaaS Lens
  •         Financial Services Lens
  •         Data Analytics Lens
  •         Machine Learning Lens
  •         Custom Lenses (very powerful for enterprises)

NB:

 Advanced practice: Many enterprises create custom lenses for:

  •         DevSecOps standards
  •         Regulatory requirements
  •         Internal platform rules

3. Review Process – Step by Step (Real-World)

Step 1: Define the Workload

twtech provides:

  •         Business value
  •         Architecture type
  •         Industry
  •         Environment
  •         Review owner

NB:

  • This context helps reviewers interpret the answers and prioritize the risks that the review surfaces.
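The Step 1 inputs can also be expressed as an API request, for teams that create workloads from scripts rather than the console. The sketch below only builds the keyword arguments and never calls AWS. Several things here are assumptions to check against the Well-Architected Tool API reference: the parameter names (taken from the boto3 `wellarchitected` `create_workload` call as we understand it), the `PRODUCTION`/`PREPRODUCTION` environment values, and the `wellarchitected` lens alias. The helper name and all sample values are hypothetical. Architecture type has no dedicated field in this sketch.

```python
import uuid

# Assumed environment values accepted by the Well-Architected Tool API.
VALID_ENVIRONMENTS = {"PRODUCTION", "PREPRODUCTION"}


def build_workload_request(name, business_value, environment, review_owner,
                           industry_type, regions, lenses=("wellarchitected",)):
    """Return keyword arguments for a create_workload call (hypothetical helper)."""
    env = environment.upper()
    if env not in VALID_ENVIRONMENTS:
        raise ValueError(f"environment must be one of {sorted(VALID_ENVIRONMENTS)}")
    return {
        "WorkloadName": name,
        "Description": business_value,   # Step 1: business value
        "Environment": env,              # Step 1: environment (Prod / Non-Prod)
        "ReviewOwner": review_owner,     # Step 1: review owner
        "IndustryType": industry_type,   # Step 1: industry
        "AwsRegions": list(regions),
        "Lenses": list(lenses),          # "wellarchitected" = core framework lens (assumed alias)
        # Idempotency token so a retried request does not create a duplicate workload.
        "ClientRequestToken": str(uuid.uuid4()),
    }


request = build_workload_request(
    name="acme-saas-platform-prod",
    business_value="Multi-tenant B2B SaaS with a 99.9% customer SLA",
    environment="production",
    review_owner="platform-team@example.com",   # hypothetical owner
    industry_type="Software",
    regions=["us-east-2"],
)
# With boto3 installed and credentials configured, the request could be sent as:
#   boto3.client("wellarchitected").create_workload(**request)
```

Validating the environment locally means a typo fails fast, before any API call is made.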

Step 2: Answer Pillar Questions

Each pillar has:

  •         Design principles
  •         Best-practice questions
  •         Multiple-choice answers

Example (Security Pillar):

  • How does twtech manage identities for people and machines?

Answers are mapped internally to:

  •         Best practices
  •         Risk levels

Step 3: Risk Identification (High / Medium / None)

After answering:

  •         The tool flags High-Risk Issues (HRIs) and Medium-Risk Issues (MRIs)
  •         Each HRI is tied to:
    •    A specific question
    •    A specific architectural gap

 NB:

  • HRIs are the currency of the Well-Architected Tool: milestones, improvement plans, and progress reports are all measured by how many HRIs remain.
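The mapping from answers to risk levels can be pictured with a small model. The rule below is a simplified, hypothetical stand-in, not the tool's internal logic:

  • Each best-practice choice carries the risk that is raised when it is not selected.
  • The worst missing choice sets the question's risk.
  • Questions that end up HIGH become HRIs.

Custom lenses define their own per-question risk rules. All question and choice IDs here are made up for illustration.

```python
from enum import Enum


class Risk(Enum):
    NONE = "NONE"
    MEDIUM = "MEDIUM"
    HIGH = "HIGH"
    UNANSWERED = "UNANSWERED"


def question_risk(selected, best_practices):
    """Simplified, hypothetical risk rule for a single question.

    best_practices maps a choice ID to "HIGH" or "MEDIUM": the risk raised
    when that best practice is NOT selected.
    """
    if not selected:
        return Risk.UNANSWERED
    missing = [bp for bp in best_practices if bp not in selected]
    if any(best_practices[bp] == "HIGH" for bp in missing):
        return Risk.HIGH
    if missing:
        return Risk.MEDIUM
    return Risk.NONE


def high_risk_issues(answers, catalog):
    """answers: question ID -> selected choice IDs; catalog: question ID -> best practices."""
    return sorted(
        q for q, selected in answers.items()
        if question_risk(selected, catalog[q]) is Risk.HIGH
    )


# Hypothetical choices for two Security questions.
catalog = {
    "SEC2": {"strong_sign_in": "HIGH", "temporary_credentials": "HIGH", "audit_and_rotate": "MEDIUM"},
    "SEC3": {"least_privilege": "HIGH"},
}
answers = {
    "SEC2": {"strong_sign_in", "temporary_credentials"},  # one MEDIUM practice missing
    "SEC3": {"none_of_these"},                            # HIGH practice missing
}
hris = high_risk_issues(answers, catalog)
```

Each resulting HRI stays tied to the question that produced it, matching how the tool links every HRI to a specific question.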

Step 4: Improvement Plan Generation

For every identified risk (HRI or MRI), the tool provides:

  •         Recommended actions
  •         AWS documentation links
  •         Service-specific guidance

twtech can:

  •         Mark items as Not Applicable
  •         Track Improvement Status
  •         Assign Owners

4. Pillar-Specific Deep Insights (Beyond Basics)

4.1 Operational Excellence

Focus areas:

  •         Observability
  •         Change management
  •         Incident response
  •         Automation

Advanced signals:

  •         No runbooks HRI
  •         Manual deployments HRI
  •        No game days HRI

 DevOps tie-in:
This pillar heavily rewards:

  •         CI/CD
  •         Infrastructure as Code
  •         Automated rollbacks

4.2 Security

Focus areas:

  •         Identity & access
  •         Detection
  •         Infrastructure protection
  •         Data protection
  •         Incident response

High-impact HRIs:

  •         No MFA on root
  •         Over-permissive IAM
  •         No centralized logging
  •         No automated security monitoring

 DevSecOps insight:
Security HRIs often correlate directly with:

  •         IAM anti-patterns
  •         Lack of SCPs
  •         Weak secrets management

4.3 Reliability

Focus areas:

  •         Fault isolation
  •         Recovery planning
  •         Change management
  •         Capacity planning

Red flags:

  •         Single AZ deployments
  •         No DR strategy
  •         No backup testing
  •         Tight coupling between services

 SRE mapping:
This pillar aligns closely with:

  •         Error budgets
  •         MTTR reduction
  •         Chaos engineering

4.4 Performance Efficiency

Focus areas:

  •        Compute selection
  •         Scaling strategies
  •         Monitoring & tuning

Advanced signals:

  •         Static instance sizing
  •         No auto scaling
  •         Poor data access patterns

 Cloud engineering angle:
This is where the following options shine when architected properly:

  •         Graviton
  •         Spot
  •         Serverless

4.5 Cost Optimization

Focus areas:

  •         Visibility
  •         Resource right-sizing
  •         Demand management

Common HRIs:

  •         No cost allocation tags
  •         No budgets/alerts
  •         Idle resources
  •         Over-provisioned storage

 Leadership impact:
This pillar often justifies:

  •         FinOps initiatives
  •         Platform teams
  •         Executive dashboards

4.6 Sustainability

Focus areas:

  •         Energy-efficient services
  •         Right-sizing
  •         Managed services

Signals:

  •         Long-running idle compute
  •         Inefficient storage tiers
  •         No lifecycle policies

 Forward-looking insight:
This pillar is increasingly tied to:

  •         ESG reporting
  •         Regulatory pressure
  •         Enterprise procurement decisions

5. Reports & Artifacts twtech Can Generate

The tool generates:

  •         Executive summary
  •         HRI list
  •         Improvement plan
  •         Pillar heat maps

These are commonly used for:

  •         Architecture reviews
  •         Funding requests
  •         Compliance evidence
  •         Cloud Center of Excellence (CCoE) reporting

6. Automation & Integration (Advanced Use)

6.1 APIs & Export

twtech can:

  •         Export reviews
  •         Track changes over time
  •         Feed data into dashboards
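Exported review data can be flattened into dashboard rows and compared across milestones. The sketch assumes a response shaped like the boto3 `wellarchitected` `get_lens_review` output as we understand it: `LensReview` → `PillarReviewSummaries` → `PillarName` / `RiskCounts`. Verify that shape against the API reference. The sample snapshots are hypothetical.

```python
import csv
import io


def pillar_hri_counts(lens_review_response):
    """Map pillar name -> number of high-risk issues (assumed response shape)."""
    summaries = lens_review_response["LensReview"]["PillarReviewSummaries"]
    return {s["PillarName"]: s.get("RiskCounts", {}).get("HIGH", 0) for s in summaries}


def hri_delta(before, after):
    """Per-pillar change in HRIs between two snapshots (negative = improvement)."""
    pillars = sorted(set(before) | set(after))
    return {p: after.get(p, 0) - before.get(p, 0) for p in pillars}


def to_csv(counts):
    """Render pillar counts as CSV text that a dashboard tool can ingest."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["pillar", "high_risk_issues"])
    for pillar, count in sorted(counts.items()):
        writer.writerow([pillar, count])
    return buf.getvalue()


# Hypothetical snapshots: an earlier milestone and the current review.
milestone_1 = {"LensReview": {"PillarReviewSummaries": [
    {"PillarName": "Security", "RiskCounts": {"HIGH": 3, "MEDIUM": 1}},
    {"PillarName": "Reliability", "RiskCounts": {"HIGH": 2}},
]}}
current = {"LensReview": {"PillarReviewSummaries": [
    {"PillarName": "Security", "RiskCounts": {"HIGH": 1, "MEDIUM": 2}},
    {"PillarName": "Reliability", "RiskCounts": {"HIGH": 2}},
]}}
delta = hri_delta(pillar_hri_counts(milestone_1), pillar_hri_counts(current))
```

Keeping the parsing in one function isolates the assumed response shape, so only that function changes if the real payload differs.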

6.2 CI/CD Integration (Advanced Pattern)

Some teams:

  •         Run Well-Architected reviews at release milestones
  •         Fail promotion if new HRIs appear
  •         Track architectural debt like technical debt
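A promotion gate that fails when new HRIs appear can be kept very small. The sketch assumes each review has already been exported to a JSON list of HRI identifiers. That file format and the file names are hypothetical. A non-zero return value is what most CI systems treat as a failed step.

```python
import json
import sys


def new_hris(baseline, current):
    """Return HRI identifiers present in the current review but not in the baseline."""
    return sorted(set(current) - set(baseline))


def gate(baseline, current):
    """Return a process exit code: 0 = promote, 1 = block promotion."""
    added = new_hris(baseline, current)
    for hri in added:
        print(f"NEW HRI: {hri}", file=sys.stderr)
    return 1 if added else 0


def gate_from_files(baseline_path, current_path):
    """Load two JSON lists of HRI IDs (hypothetical export format) and run the gate."""
    with open(baseline_path) as f:
        baseline = json.load(f)
    with open(current_path) as f:
        current = json.load(f)
    return gate(baseline, current)


# In a pipeline step (hypothetical file names):
#   sys.exit(gate_from_files("baseline-hris.json", "current-hris.json"))
```

Only newly introduced HRIs block promotion. Existing ones remain tracked as architectural debt rather than stopping every release.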

7. Anti-Patterns to Avoid

❌      Treating it as a checkbox exercise
❌      Ignoring HRIs because “it works today”
❌      Doing reviews only once
❌      Not assigning owners to improvements

8. Best-Practice Cadence (What Teams Should Do)

Recommended review frequency by environment:

  •         Production: every 6–12 months
  •         Major redesign: before launch
  •         Regulated workloads: quarterly
  •         Platform services: after major changes
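The periodic rows of this cadence can be turned into due dates for a review calendar. This sketch makes a few assumptions. "Every 6–12 months" is taken at its shorter, more conservative end. "Quarterly" is treated as 3 months. Event-driven rows return no date. The category keys are hypothetical.

```python
import calendar
from datetime import date

# Periodic cadences in months (production uses the shorter end of 6-12).
REVIEW_INTERVAL_MONTHS = {
    "production": 6,
    "regulated": 3,
}
# Event-driven cadences: triggered by a change, not by the calendar.
EVENT_DRIVEN = {
    "major_redesign": "before launch",
    "platform_service": "after major changes",
}


def add_months(d, months):
    """Add calendar months, clamping the day to the end of a shorter month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def next_review_due(category, last_review):
    """Return the next due date, or None when the review is event-driven."""
    if category in REVIEW_INTERVAL_MONTHS:
        return add_months(last_review, REVIEW_INTERVAL_MONTHS[category])
    if category in EVENT_DRIVEN:
        return None
    raise KeyError(f"unknown workload category: {category}")
```

For example, `next_review_due("regulated", date(2025, 11, 30))` lands on 28 February 2026, because the day is clamped to the shorter month.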

9. How This Helps in twtech Career Path

Given a DevOps / DevSecOps / Cloud Engineering focus, running Well-Architected reviews:

  •         Strengthens architecture credibility
  •         Aligns with AWS Partner & enterprise standards
  •         Positions you for:
    •    Principal Engineer
    •    Cloud Architect
    •    Platform Lead
    •    SRE / DevSecOps leadership

Insights:

  • A full, end-to-end AWS Well-Architected review walkthrough for a modern SaaS platform using EKS + Serverless.

Focus:

  •        As it would be executed in a real enterprise or AWS Partner engagement.
  •        What a Principal Cloud / DevSecOps Engineer would actually do.
  •        Example Workload: SaaS Platform (EKS + Serverless)

1. Workload Definition (Foundation)

  • Workload Name: acme-saas-platform-prod
  • Environment: Production
  • Industry: Software / SaaS
  • Regions: us-east-2 (primary), us-west-1 (DR)
  • Business Value:
    •         Multi-tenant B2B SaaS
    •         99.9% customer SLA (Service Level Agreement) 
    •         Regulated customer data (SOC 2)

Architecture Overview

  •         Amazon EKS (multi-AZ)
  •         AWS Load Balancer Controller (formerly ALB Ingress Controller)
  •         Lambda (async processing, webhooks)
  •         API Gateway
  •         Aurora PostgreSQL (Multi-AZ)
  •         S3 (object storage + backups)
  •         CloudFront
  •         Cognito (customer auth)
  •         GitHub Actions + ArgoCD
  •         Terraform + Helm
  •         Datadog + CloudWatch
  •         AWS WAF + Shield

2. Operational Excellence Pillar – Deep Review

Key Questions & Answers

Q: How do you manage and automate changes?

  •         ✅     GitOps (ArgoCD)
  •         ✅     IaC (Terraform)
  •             No automated rollback validation

Q: How do you respond to incidents?

  •         ❌    No formal runbooks
  •         ❌    No game days
  •             PagerDuty + Datadog alerts

High-Risk Issues (HRIs)

  •         HRI-OE-01: Incident response not codified
  •         HRI-OE-02: No failure injection testing

Recommended Improvements

  •         Create runbooks (per service)
  •         Implement automated rollback checks
  •         Run quarterly game days
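An "automated rollback check" is a decision rule evaluated after a deployment. The sketch compares the new version's error rate with the previous version's. The thresholds are hypothetical: a rollback triggers above twice the baseline rate, with a 1% floor, and only after 100 requests have been observed. Tools such as Argo Rollouts can run this kind of analysis natively. This standalone version only illustrates the logic.

```python
from dataclasses import dataclass


@dataclass
class HealthSample:
    """Request and error counts observed for one version over a time window."""
    requests: int
    errors: int

    @property
    def error_rate(self):
        return self.errors / self.requests if self.requests else 0.0


def should_roll_back(baseline, canary, max_ratio=2.0, floor=0.01, min_requests=100):
    """Hypothetical rule: roll back when the canary's error rate exceeds
    max_ratio x the baseline rate (never less than `floor`), once enough
    canary traffic has been observed."""
    if canary.requests < min_requests:
        return False  # not enough data yet; keep observing
    allowed = max(baseline.error_rate * max_ratio, floor)
    return canary.error_rate > allowed


baseline = HealthSample(requests=10_000, errors=20)  # 0.2% error rate
```

The floor keeps a near-zero baseline from turning a single error into a rollback.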

 DevOps insight:

  • Many SaaS teams fall short on this pillar because of process gaps (missing runbooks, untested failure scenarios), not because of missing tooling.

3. Security Pillar – DevSecOps Lens

Key Questions & Answers

Q: How are identities managed?

  •              IAM roles for service accounts (IRSA)
  •         ❌     No SCPs
  •             Broad admin roles

Q: How do you detect security events?

  •              GuardDuty
  •         ❌     No Security Hub aggregation
  •              Logs not centralized cross-account

Q: How is data protected?

  •              KMS encryption (S3, RDS)
  •              Secrets partially stored in env vars

HRIs

  •         HRI-SEC-01: Over-permissive IAM
  •         HRI-SEC-02: Weak secrets management
  •         HRI-SEC-03: Incomplete security monitoring

Improvements

  •         Enforce least privilege IAM
  •         Migrate to AWS Secrets Manager
  •         Enable Security Hub + Org aggregation

 DevSecOps note:
This pillar directly exposes IAM debt, which is often invisible until reviewed.
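A first pass at finding the "over-permissive IAM" behind HRI-SEC-01 is to flag Allow statements that combine wildcard actions with wildcard resources. The sketch reads standard IAM policy JSON (`Statement`, `Effect`, `Action`, `Resource`). It deliberately ignores `NotAction`, conditions, and resource-level nuance. IAM Access Analyzer policy validation is the managed option for a thorough check.

```python
def _as_list(value):
    """IAM allows a single string or a list for Action/Resource/Statement."""
    return value if isinstance(value, list) else [value]


def over_permissive_statements(policy):
    """Return Allow statements granting wildcard actions on all resources."""
    findings = []
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action", []))
        resources = _as_list(stmt.get("Resource", []))
        # "*" is full admin; "s3:*" is every action in a single service.
        wildcard_action = any(a == "*" or a.endswith(":*") for a in actions)
        if wildcard_action and "*" in resources:
            findings.append(stmt)
    return findings


admin_policy = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}],
}
scoped_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject"],
        "Resource": "arn:aws:s3:::acme-example-bucket/*",  # hypothetical bucket
    }],
}
```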

4. Reliability Pillar – SRE-Focused

Key Questions & Answers

Q: How do you manage failures?

  •         ✅     Multi-AZ EKS
  •         ❌     Single-region active
  •              No DR testing

Q: How do you back up data?

  •         ✅     Automated RDS backups
  •         ❌     Restore not tested
  •              No RPO/RTO defined

HRIs

  •         HRI-REL-01: No DR strategy
  •         HRI-REL-02: Backup restore unvalidated

Improvements

  •         Define RTO/RPO
  •         Implement pilot-light DR
  •         Test restores quarterly
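Once RTO and RPO are defined, each quarterly restore test can be scored against them. RPO bounds the data lost (failure time minus the last good backup). RTO bounds the downtime (service restored minus failure time). The timestamps and targets in the test are hypothetical drill values.

```python
from datetime import datetime, timedelta


def evaluate_restore_test(last_backup, failure_time, service_restored, rpo, rto):
    """Score a restore drill against RPO (data loss) and RTO (downtime) targets."""
    if not last_backup <= failure_time <= service_restored:
        raise ValueError("expected last_backup <= failure_time <= service_restored")
    data_loss = failure_time - last_backup
    downtime = service_restored - failure_time
    return {
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": data_loss <= rpo,
        "rto_met": downtime <= rto,
    }
```

A drill that meets RTO but misses RPO points to backup frequency (for example, enabling point-in-time recovery) rather than to restore speed.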

 SRE insight:
Reliability HRIs almost always correlate with business SLA risk.

5. Performance Efficiency Pillar

Key Questions & Answers

Q: How is compute selected?

  •         ❌     Fixed node groups
  •         ❌     No Graviton
  •              Lambda memory not tuned

Q: How do you scale?

  •         ✅    HPA enabled
  •         ❌    No KEDA for async
  •             No load testing

HRIs

  •         HRI-PERF-01: Sub-optimal compute
  •         HRI-PERF-02: Scaling not validated

Improvements

  •         Introduce Graviton node groups
  •         Use KEDA for event-driven scaling
  •         Run load & stress tests

 Platform insight:
Performance inefficiencies often inflate cost and reduce reliability.

6. Cost Optimization Pillar – FinOps View

Key Questions & Answers

Q: How do you track cost?

  •         ❌      Incomplete cost allocation tags
  •         ❌     No budgets or alerts
  •              No rightsizing reviews

Q: How does twtech optimize resources?

  •         ❌      Idle dev clusters
  •         ❌     Over-provisioned RDS

HRIs

  •         HRI-COST-01: No cost visibility
  •         HRI-COST-02: No governance

Improvements

  •         Enforce mandatory tagging
  •         Create AWS Budgets
  •         Enable Compute Optimizer

 Leadership takeaway:
Cost HRIs are often the fastest ROI fixes.
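"Enforce mandatory tagging" starts with knowing which resources are non-compliant. The sketch takes a mapping of resource ARN to tags and reports the missing keys. An empty tag value counts as missing. The tag keys and ARNs are hypothetical. In practice the input could be built from the Resource Groups Tagging API (`get_resources`), and enforcement can move to AWS Organizations tag policies.

```python
# Hypothetical mandatory tag keys; use the keys your organization actually requires.
MANDATORY_TAGS = ("Owner", "CostCenter", "Environment")


def missing_tags(resources, required=MANDATORY_TAGS):
    """Map resource ARN -> sorted list of missing (or empty) mandatory tag keys."""
    report = {}
    for arn, tags in resources.items():
        missing = sorted(key for key in required if not tags.get(key))
        if missing:
            report[arn] = missing
    return report


resources = {
    "arn:aws:rds:us-east-2:111122223333:db:orders": {
        "Owner": "team-a", "CostCenter": "cc-100", "Environment": "prod",
    },
    "arn:aws:eks:us-east-2:111122223333:cluster/dev": {"Owner": ""},
}
report = missing_tags(resources)
```

Compliant resources are left out of the report, so it can feed a budget alert or a ticket queue directly.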

7. Sustainability Pillar

Key Questions & Answers

Q: How do you reduce energy usage?

  •         ❌     No lifecycle policies
  •         ❌     Always-on non-prod clusters
  •              Inefficient storage tiers

HRIs

  •         HRI-SUS-01: Inefficient resource usage

Improvements

  •         Auto-shutdown non-prod
  •         Add S3 lifecycle rules
  •         Prefer managed services
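"Add S3 lifecycle rules" can be expressed as the configuration object accepted by the S3 API. The sketch builds a single rule with the following hypothetical defaults:

  • Transition to STANDARD_IA after 30 days.
  • Transition to GLACIER after 90 days.
  • Expire after 365 days.

The prefix and bucket name are also hypothetical. S3 does not allow transitions to STANDARD_IA for objects younger than 30 days, so the function rejects smaller values.

```python
def lifecycle_configuration(prefix, ia_after_days=30, glacier_after_days=90,
                            expire_after_days=365):
    """Build an S3 lifecycle configuration that tiers down and then expires objects."""
    if ia_after_days < 30:
        raise ValueError("S3 requires at least 30 days before a STANDARD_IA transition")
    if not ia_after_days < glacier_after_days < expire_after_days:
        raise ValueError("transitions must be increasing and happen before expiration")
    return {
        "Rules": [
            {
                "ID": f"tiering-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


# With boto3 installed and credentials configured (hypothetical bucket name):
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="acme-saas-backups",
#       LifecycleConfiguration=lifecycle_configuration("backups/"),
#   )
```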

Enterprise trend:
Sustainability findings increasingly show up in executive reviews.

8. Final Review Output (What Leadership Sees)

HRI Summary

  •         Operational Excellence: 2
  •         Security: 3
  •         Reliability: 2
  •         Performance: 2
  •         Cost: 2
  •         Sustainability: 1
  •         Total: 12 HRIs

Improvement Plan

  •         30-day: IAM, logging, tagging
  •         60-day: DR design, secrets migration
  •         90-day: Game days, Graviton, FinOps cadence
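The HRI summary for leadership can be recomputed from the HRI identifiers used in this walkthrough, rather than maintained by hand. The sketch groups IDs by the pillar code in the `HRI-<CODE>-<NN>` naming used above. The code-to-pillar mapping mirrors the summary's pillar names.

```python
from collections import Counter

# HRI identifiers raised in the walkthrough above.
HRIS = [
    "HRI-OE-01", "HRI-OE-02",
    "HRI-SEC-01", "HRI-SEC-02", "HRI-SEC-03",
    "HRI-REL-01", "HRI-REL-02",
    "HRI-PERF-01", "HRI-PERF-02",
    "HRI-COST-01", "HRI-COST-02",
    "HRI-SUS-01",
]

PILLAR_NAMES = {
    "OE": "Operational Excellence",
    "SEC": "Security",
    "REL": "Reliability",
    "PERF": "Performance",
    "COST": "Cost",
    "SUS": "Sustainability",
}


def hri_summary(hri_ids):
    """Count HRIs per pillar from IDs shaped like 'HRI-<CODE>-<NN>'."""
    counts = Counter(PILLAR_NAMES[hri.split("-")[1]] for hri in hri_ids)
    return dict(counts), sum(counts.values())


summary, total = hri_summary(HRIS)
for pillar, count in summary.items():
    print(f"{pillar:<24}{count}")
print(f"{'Total':<24}{total}")
```

Deriving the table from the ID list keeps the leadership summary consistent with the per-pillar findings as HRIs are closed.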

9. How Mature Teams Use This Review

  •         Feed HRIs into Jira
  •         Assign owners per pillar
  •         Track reduction over time
  •         Re-review before major releases

10. Why This Matters for You

For a DevOps / DevSecOps / Cloud Engineer, running reviews like this:

  •         Demonstrates architecture ownership
  •         Bridges engineering and leadership
  •         Aligns with AWS Partner standards
  •         Builds Principal / Architect credibility


Project: Hands-On

How twtech uses the AWS Well-Architected Tool (WAT) in its environment to:

  • Review workloads against current AWS architectural best practices and receive guidance for improvement / remediation.

To begin, search the AWS console for the service: Well-Architected Tool (WAT).

How Well-Architected Tool (WAT) works

Benefits and features

Step-1: Define workload:

Specify properties


Step-2: Apply lenses

Define workload:


Step-3: Start reviewing the workload through the lens of the AWS Well-Architected Framework.

Step-4: Determine twtech priorities

  •        Select from the options available on the console.
Ops 1: How does twtech determine what its priorities are?

Ops 2: How does twtech structure its organization to support its business outcomes?


Ops 3: How does twtech's organizational culture support its business outcomes?


Ops 4: How does twtech implement observability in its workload?

Ops 5: How does twtech reduce defects, ease remediation, and improve flow into production?


Ops 6: How does twtech mitigate deployment risks?


Ops 7: How does twtech know that it is ready to support a workload?


Ops 8: How does twtech utilize workload observability in its organization? 


Ops 9: How does twtech understand the health of its operations?


Ops 10: How does twtech manage workload and operations events?

Ops 11: How does twtech evolve operations?


SEC 1. How does twtech securely operate its workload?

SEC 2. How does twtech manage identities for people and machines?

SEC 3. How does twtech manage permissions for people and machines?

SEC 4. How does twtech detect and investigate security events?

SEC 5. How does twtech protect its network resources?

SEC 6. How does twtech protect its compute resources?

SEC 7. How does twtech classify its data?

SEC 8. How does twtech protect its data at rest?

SEC 9. How does twtech protect its data in transit?

SEC 10. How does twtech anticipate, respond to, and recover from incidents?

SEC 11. How does twtech incorporate and validate the security properties of applications throughout the design, development, and deployment lifecycle?

REL 1. How does twtech manage service quotas and constraints?

REL 2. How does twtech plan its network topology?


NB:

  • twtech at this point can save and exit to review the results. 
  • However, as a best practice, twtech recommends answering the questions for all six pillars, especially for production workloads.

Review results

How twtech gets to the recommendations from the AWS Well-Architected Tool.

Scroll down to the recommendations and the suggested remediation from AWS.

OPS01-BP02 Evaluate internal customer needs

https://docs.aws.amazon.com/wellarchitected/2025-02-25/framework/ops_priorities_int_cust_needs.html


Implementation guidance (remediation)

OPS01-BP05 Evaluate threat landscape

https://docs.aws.amazon.com/wellarchitected/2025-02-25/framework/ops_priorities_eval_threat_landscape.html

Implementation guidance

Finally, twtech can continue reviewing the remaining best practices in the same way.


