Tuesday, January 27, 2026

Amazon EventBridge | Overview.



Scope:

  • Intro,
  • Core Concepts,
  • Key Benefits,
  • Link to official documentation,
  • Insights.

Intro:

    • Amazon EventBridge is a serverless event bus service that enables twtech to build event-driven applications at scale using events from its own applications, third-party software as a service (SaaS) applications, and other AWS services.
    • Amazon EventBridge provides a simple, consistent way to ingest, filter, transform, and deliver events to various targets for processing. 
Core Concepts
    • Events: An event signifies a change in an environment or system, such as an object being added to an Amazon S3 bucket or a change in an EC2 instance's state.
    • Event Buses: Event buses act as routers that receive events and deliver them to specified targets.
    • Rules: Rules define what EventBridge does with the events delivered to an event bus. There are two types:
      • Event Patterns: Rules that match specific data patterns within an event's structure.
      • Schedules: Rules that run on a predefined schedule (e.g., using cron expressions) to invoke targets at specific times.
    • Targets: When an event matches a rule, EventBridge sends the event's JSON message to one or more designated targets, such as AWS Lambda functions, Amazon SNS topics, Amazon SQS queues, or API destinations. 
Key Benefits
    • Decoupling: EventBridge allows for the decoupling of application components, making the system more resilient and easier to maintain.
    • Integration: It simplifies integration with a wide array of AWS services and SaaS partners without requiring custom code.
    • Scalability and Reliability: The service is designed for low-latency, high-throughput event processing and offers high reliability for event delivery.
    • Content-Based Filtering: It supports precise filtering using comparison operators and ranges of values within the event data, reducing the need for downstream custom filtering logic. 
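
As a rough illustration of content-based filtering, the sketch below checks a single event value against a list of conditions written in the style of EventBridge patterns (exact values, `prefix`, and `numeric` ranges). The function name `leaf_matches` is hypothetical, and the real service supports more operators than the three shown here.

```python
import operator

# Comparison operators accepted inside a {"numeric": [...]} condition.
_NUMERIC_OPS = {
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    ">": operator.gt,
    ">=": operator.ge,
}


def leaf_matches(value, conditions):
    """Return True if `value` satisfies ANY condition in the list.

    Simplified sketch: handles exact values, {"prefix": ...} and
    {"numeric": [op, number, op, number, ...]} only.
    """
    for cond in conditions:
        if isinstance(cond, dict):
            if "prefix" in cond:
                if isinstance(value, str) and value.startswith(cond["prefix"]):
                    return True
            elif "numeric" in cond:
                spec = cond["numeric"]
                # bool is a subclass of int in Python, so exclude it explicitly.
                if isinstance(value, (int, float)) and not isinstance(value, bool):
                    pairs = zip(spec[0::2], spec[1::2])
                    # Every (operator, bound) pair in the range must hold.
                    if all(_NUMERIC_OPS[op](value, bound) for op, bound in pairs):
                        return True
        elif value == cond:
            return True
    return False


print(leaf_matches(150, [{"numeric": [">", 100, "<=", 200]}]))
print(leaf_matches("us-east-2", [{"prefix": "us-"}]))
```

Because the filtering is declared in the rule, a target only ever sees events that already satisfy these conditions.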
Link to official documentation:
https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-what-is.html
twtech-Insights:

1. What EventBridge Really Is

Amazon EventBridge is a serverless event bus that enables event-driven architectures by routing events from producers to consumers using rules.

Think of it as:

A smart event router with schema awareness and SaaS integrations

It evolved from CloudWatch Events, but now supports:

    • Multiple event buses
    • Cross-account routing
    • Schema registry
    • SaaS event sources (e.g., Salesforce, Zendesk)
    • Fine-grained filtering & transformations

2. Core Architecture Components

 Event Sources (Where events originate from).

Types:

  • AWS Services
    EC2, S3, Lambda, ECS, Step Functions, CodePipeline, etc.

  • Custom Applications
    Via PutEvents API

  • SaaS Partners
    (Stripe, Auth0, Datadog, PagerDuty, etc.)

NB:

Each event is a JSON document.
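
A minimal sketch of how a custom application might publish through the PutEvents API. The `client` argument is assumed to be `boto3.client("events")` (or any object with a compatible `put_events` method); the helper names `build_entry` and `publish`, and the source name `twtech.orders`, are hypothetical.

```python
import json
from datetime import datetime, timezone


def build_entry(source, detail_type, detail, bus_name="default"):
    """Build one PutEvents entry; `Detail` must be a JSON string."""
    return {
        "Source": source,
        "DetailType": detail_type,
        "Detail": json.dumps(detail),
        "EventBusName": bus_name,
        "Time": datetime.now(timezone.utc),
    }


def publish(client, entries):
    """Send entries with PutEvents and return the ones that failed.

    `client` is assumed to be boto3.client("events") or a stand-in
    exposing put_events(Entries=[...]). A single call accepts at most
    10 entries, so larger batches need to be chunked by the caller.
    """
    response = client.put_events(Entries=entries)
    failed = []
    # Result entries come back in the same order as the request entries.
    for entry, result in zip(entries, response.get("Entries", [])):
        if "ErrorCode" in result:
            failed.append((entry, result["ErrorCode"]))
    return failed
```

A call such as `publish(client, [build_entry("twtech.orders", "OrderPlaced", {"orderId": "42"}, "orders-bus")])` returns an empty list when every entry was accepted. Checking the per-entry results matters because PutEvents can succeed as a request while individual entries fail.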

 Event Bus

A logical container for events.

Three types:

  1. Default Event Bus

    • Automatically receives AWS service events

  2. Custom Event Bus

    • For application-specific or domain-driven architectures

  3. Partner Event Bus

    • Dedicated to SaaS integrations

 Best Practice:

    • Use one event bus per domain (e.g., orders-bus, billing-bus)

 Events (Structure)

  • An EventBridge event has a predictable shape:

{
  "source": "aws.ec2",
  "detail-type": "EC2 Instance State-change Notification",
  "time": "2026-01-27T10:15:30Z",
  "region": "us-east-2",
  "resources": [],
  "detail": {
    "instance-id": "i-1234567890",
    "state": "running"
  }
}

Key fields:

    • source: who emitted the event
    • detail-type: what kind of event it is
    • detail: the payload you actually care about

 Rules

    • Rules decide which events go where.

Each rule has:

    • Event pattern (filter)
    • Target(s)

Event Pattern Sample

{ "source": ["aws.ec2"], "detail": { "state": ["stopped"] } }

NB:

✔ Matches only EC2 events whose state is "stopped"

✔ No code needed
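
To make the matching idea concrete, here is a simplified sketch of how the sample pattern above could be evaluated against an event. It is not the EventBridge implementation: the function `matches` is hypothetical and handles only exact-value lists and nested objects.

```python
def matches(pattern, event):
    """Return True if `event` satisfies `pattern`.

    Simplified: every key in the pattern must exist in the event; a nested
    dict recurses, and a list means "the event value equals any of these".
    Keys that appear only in the event are ignored.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


pattern = {"source": ["aws.ec2"], "detail": {"state": ["stopped"]}}

running = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-1234567890", "state": "running"},
}
stopped = {**running, "detail": {"instance-id": "i-1234567890", "state": "stopped"}}

print(matches(pattern, running))  # False
print(matches(pattern, stopped))  # True
```

Adding `"detail-type": ["EC2 Instance State-change Notification"]` to the pattern would narrow it further, since other EC2 events may also carry a `state` field.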

 Targets

    • Where matched events are delivered.

Common targets:

    • AWS Lambda
    • Step Functions
    • SNS / SQS
    • Kinesis Data Streams
    • ECS tasks
    • API Destinations (HTTP endpoints)

NB:

    • 🎯 One rule can deliver to multiple targets

3. Event Flow (End-to-End)

Event Source → Event Bus → Event Rule (Pattern Matching) → Target(s)

Key characteristics:

    • Push-based (no polling)
    • Fully managed
    • Near real-time (typically sub-second)

4. Advanced Features (Where EventBridge Shines)

Schema Registry

    • Automatically discovers event schemas
    • Generates code bindings (Java, Python, TS)
    • Helps teams avoid breaking changes

NB:

    • 💡 Great for large orgs with multiple producers/consumers

 Event Transformations (modify events without Lambda)

Sample input paths map:

{ "instanceId": "$.detail.instance-id", "state": "$.detail.state" }

This approach reduces:

    • Lambda glue code
    • Cost
    • Latency
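
The sketch below imitates what an input transformer does with the sample map above: pick values out of the event, then drop them into a template. It is an illustration only. The helpers `resolve_path` and `transform` are hypothetical, only simple dotted paths are handled, and the template text is an assumption.

```python
import re


def resolve_path(event, path):
    """Resolve a dotted JSONPath such as "$.detail.state" (dotted keys only)."""
    if not path.startswith("$."):
        raise ValueError(f"unsupported path: {path}")
    node = event
    for key in path[2:].split("."):
        node = node[key]
    return node


def transform(event, input_paths, template):
    """Fill <name> placeholders in `template` with values picked from `event`."""
    values = {name: resolve_path(event, path) for name, path in input_paths.items()}
    return re.sub(r"<(\w+)>", lambda m: str(values[m.group(1)]), template)


event = {
    "source": "aws.ec2",
    "detail": {"instance-id": "i-1234567890", "state": "running"},
}
input_paths = {"instanceId": "$.detail.instance-id", "state": "$.detail.state"}

print(transform(event, input_paths, "Instance <instanceId> is now <state>."))
# Instance i-1234567890 is now running.
```

The target receives only the reshaped message, which is why a separate Lambda function for reformatting is often unnecessary.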

 Cross-Account Event Routing

  • EventBridge supports resource-based policies.

Use cases:

    • Centralized monitoring account
    • Security event aggregation
    • Multi-account microservices
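
A minimal sketch of the kind of resource-based policy statement that lets another account publish to a central bus. The account IDs, the bus name `central-monitoring`, and the helper `allow_put_events` are placeholders chosen for illustration.

```python
import json


def allow_put_events(bus_arn, source_account_id, sid="AllowCrossAccountPutEvents"):
    """Return a policy statement letting another account publish to a bus."""
    return {
        "Sid": sid,
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{source_account_id}:root"},
        "Action": "events:PutEvents",
        "Resource": bus_arn,
    }


# Placeholder ARN and account id, not real resources.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        allow_put_events(
            "arn:aws:events:us-east-2:111122223333:event-bus/central-monitoring",
            "444455556666",
        )
    ],
}

print(json.dumps(policy, indent=2))
```

The policy is attached to the receiving bus; the sending account still needs a rule whose target is that bus, plus IAM permission to call PutEvents on it.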

 Archive & Replay

    • Store events for debugging or backfills
    • Replay historical events to rules

This is useful for:

    • Disaster recovery
    • Reprocessing failed logic
    • Auditing

5. Reliability, Limits & Guarantees

Delivery Guarantees

    • At-least-once delivery
    • Duplicates are possible, so consumers must be idempotent
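
One common way to tolerate duplicate deliveries is to deduplicate on the event `id` that EventBridge includes in the event envelope. The sketch below is an assumption-laden illustration: the class name is hypothetical and the in-memory set stands in for a shared, durable store.

```python
class IdempotentHandler:
    """Wrap a handler so each event id is processed at most once.

    The in-memory set is for illustration only; a real consumer would
    need a shared, durable store so that all instances see the same ids.
    """

    def __init__(self, handler):
        self._handler = handler
        self._seen = set()

    def __call__(self, event):
        event_id = event["id"]
        if event_id in self._seen:
            return False  # duplicate delivery: skip the side effects
        self._handler(event)
        # Record the id only after the handler succeeds, so a failed
        # attempt can still be retried.
        self._seen.add(event_id)
        return True


processed = []
handle = IdempotentHandler(processed.append)
event = {"id": "abc-123", "detail": {"state": "stopped"}}

print(handle(event))  # True
print(handle(event))  # False
print(len(processed))  # 1
```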

Retry & DLQ

    • Automatic retries
    • Dead-letter queues (Amazon SQS standard queues)

Quotas (High level)

    • 300 rules per bus by default in most Regions (soft limit)
    • Event size 256 KB
    • PutEvents throughput is Region-dependent and adjustable (check Service Quotas for the current values)

6. EventBridge vs Alternatives

Service               Best For
EventBridge           Event routing, SaaS integration, decoupling
SNS           Fan-out notifications
SQS           Durable message queues
Kinesis           High-throughput streaming
Kafka             Complex streaming & ordering


NB:

    •  EventBridge is not a stream processor
    •  It’s an event router & integration layer

7. Common Design Patterns

 Event-Driven Microservices

    • Producers emit domain events
    • Consumers subscribe independently
    • Loose coupling (producers and consumers share only the event contract)

 Automation & Ops

    • React to AWS service events
    • Trigger remediation workflows

 SaaS Integration

    • Receive third-party events
    • Route internally without custom polling

 Choreography (vs Orchestration)

    • EventBridge for loose coupling
    • Step Functions when control flow matters

8. Security & IAM Model

    • IAM controls PutEvents
    • Resource policies control cross-account access
    • EventBridge invokes targets through an IAM execution role or the target's resource-based policy

🔐 Always:

    • Restrict PutEvents
    • Validate event source
    • Use least privilege

9. Cost Model (Simple & Predictable)

    • Charged per event published
    • Free tier included
    • No charge for rules or targets

NB:

    • 💡 Cheaper than Lambda glue for routing logic

10. When NOT to Use EventBridge (Avoid it if twtech needs):

    • Strict ordering
    • Exactly-once delivery
    • Massive streaming analytics
    • Stateful processing




Thursday, December 25, 2025

AWS Trusted Advisor (TA) | Deep Dive & Hands-On.

An Overview of AWS Trusted Advisor (TA).

Focus:

  •        Framed for DevOps / Cloud / SRE
  •        Aligned with Well-Architected thinking.

Breakdown:

  •        Intro,
  •        Key Features and Functionality,
  •        Integration with Other Services,
  •        Accessing Trusted Advisor,
  •        The concept: AWS Trusted Advisor,
  •        Trusted Advisor vs Well-Architected Tool (Quick Context)
  •        The 5 Trusted Advisor Check Categories (Deep Dive)
  •        Cost Optimization,
  •        Security,
  •        Fault Tolerance (Reliability),
  •        Performance,
  •        Service Limits (Quotas),
  •        Support Plan Impact (Very Important),
  •        Automation & Integrations (Where TA Shines),
  •        Sample DevOps Automation Flow,
  •        Trusted Advisor vs Third-Party Tools,
  •        When Should twtech Rely on Trusted Advisor,
  •        When Should twtech NOT Rely on Trusted Advisor,
  •        twtech Recommendation  (DevOps/SRE Playbook),
  •        Insights.

Intro:

  •        AWS Trusted Advisor is a web service that inspects twtech's AWS environment and provides real-time recommendations based on best practices across six categories:
    •    cost optimization,
    •    performance,
    •    resilience,
    •    security,
    •    operational excellence,
    •    and service limits.
  •        Trusted Advisor draws upon best practices learned from serving hundreds of thousands of AWS customers to identify opportunities to save money, improve availability and performance, and help close security gaps.

Key Features and Functionality

Best Practice Checks:

  •          Trusted Advisor continuously evaluates your AWS environment using a set of automated checks and then recommends actions to remediate any deviations from best practices.

Support Plan Integration:

  •          The number of checks available depends on your AWS Support plan.
  •    Basic and Developer Support plans have access to all service limits checks and selected security and fault tolerance checks.
  •    Business, Enterprise On-Ramp, and Enterprise Support plans have access to the full suite of checks.

Organizational View:

  •          For organizations using AWS Organizations, the organizational view provides a consolidated view of Trusted Advisor recommendations across all accounts.

Integration with Other Services:

AWS Support API:

  •     Allows programmatic access to check results.

Amazon CloudWatch and EventBridge:

  •    twtech can create alarms and rules to monitor Trusted Advisor metrics and check status changes.

AWS Well-Architected Tool:

  •     Integrates with the tool to evaluate workloads and provide data-driven insights.

AWS Config:

  •     Many new checks are powered by AWS Config managed rules, enhancing the monitoring of operational excellence.

Prioritized Recommendations:

  •          Available to Enterprise Support customers, Trusted Advisor Priority highlights the most critical recommendations, often including context-driven insights from your AWS account team. 

Accessing Trusted Advisor 

The concept: AWS Trusted Advisor

  • AWS Trusted Advisor is a real-time advisory service that continuously evaluates twtech's AWS environment against AWS best practices and surfaces actionable recommendations. This deep dive covers five of the six categories listed in the intro:

  1.     Cost Optimization
  2.     Security
  3.     Fault Tolerance
  4.     Performance
  5.     Service Limits

Think of AWS Trusted Advisor (TA) as:

A continuously running automated cloud review engine ...not a one-time audit.

Trusted Advisor vs Well-Architected Tool (Quick Context)

Trusted Advisor               Well-Architected Tool
Continuous                    Point-in-time review
Automated checks              Architect-led assessment
Resource-level findings       Design-level questions
Ops-focused                   Architecture-focused

In practice:

  •         Trusted Advisor → day-to-day hygiene
  •         Well-Architected → quarterly / major-change reviews

The 5 Trusted Advisor Check Categories (Deep Dive)

1. Cost Optimization

  • Identifies waste, overprovisioning, and idle resources.

Common Checks

  •         Idle EC2 instances
  •         Underutilized EBS volumes
  •         Idle Load Balancers
  •         Low-utilization RDS
  •         Unassociated Elastic IPs
  •         Reserved Instance & Savings Plan optimization

Sample Finding

  • “12 EBS volumes unattached for over 30 days; estimated monthly savings: $480”

DevOps Best Practice

  •         Integrate TA cost checks into FinOps dashboards
  •         Use tagging enforcement + TA findings to assign ownership
  •         Auto-remediate using Lambda where safe

2. Security

  • Maps closely to CIS benchmarks and AWS security best practices.

Common Checks

  •         S3 buckets with public access
  •         Security groups allowing 0.0.0.0/0 on sensitive ports
  •         IAM users with:
    •    No MFA
    •    Unused access keys
    •    Passwords older than policy
  •         Root account without MFA
  •         Exposed RDS snapshots

Sample Finding

  • Security Group sg-xxxx allows SSH from 0.0.0.0/0

DevSecOps Tie-in

  •         Treat TA findings as security debt
  •         Send findings to:
    •    Security Hub
    •    Jira / ServiceNow
    •    SIEM (via EventBridge)

3. Fault Tolerance (Reliability)

  • Focuses on resilience and availability.

Common Checks

  •         EC2 instances without EBS-backed volumes
  •         Single-AZ RDS databases
  •         ELBs without multiple targets
  •         Auto Scaling groups without health checks
  •         Missing backups

Sample Finding

  • “RDS instance is running in a single Availability Zone”

SRE Angle

    •         TA highlights fragile infrastructure
    •         Use it to prioritize:
      •    Multi-AZ
      •    Auto Scaling
      •    Backup policies

4. Performance

  • Ensures services are appropriately sized and configured.

Common Checks

  •         EC2 instances with high CPU or memory pressure
  •         Classic Load Balancer usage (legacy)
  •         CloudFront configuration inefficiencies
  •         Suboptimal EBS volume types

Sample Finding

  • Instance t3.micro experiencing sustained CPU throttling

Platform Engineering Use

    •         Feed TA signals into:
      •    Capacity planning
      •    Instance family modernization
      •    Graviton adoption programs

5. Service Limits (Quotas)

  • Prevents scaling failures caused by quota exhaustion.

Common Checks

  •         EC2 instance limits
  •         VPC limits
  •         EIP limits
  •         Load balancer limits
  •         Lambda concurrency limits

Sample Finding

  • EC2 On-Demand instance usage at 85% of quota

Ops Impact

    •         One of the highest-value checks
    •         Prevents:
      •    Failed deployments
      •    Incident escalations
    •         Should be monitored like alerts

Support Plan Impact (Very Important)

Support Plan            Checks Available
Basic / Developer       Limited checks only
Business                Full Trusted Advisor
Enterprise              Full + prioritized support

NB:

  •  Full value requires Business or Enterprise support

Automation & Integrations (Where TA Shines)

Event-Driven Ops

  •         TA publishes findings to Amazon EventBridge
  •         Enables:
    •    Auto-ticket creation
    •    Slack notifications
    •    Auto-remediation

Security Hub

  •         TA security checks can flow into AWS Security Hub
  •         Unified security posture view

API & CLI

  •         Query findings programmatically
  •         Build custom dashboards
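
As a sketch of programmatic access, the function below counts check statuses per category using the AWS Support API operations `describe_trusted_advisor_checks` and `describe_trusted_advisor_check_result`. The `support_client` argument is assumed to be `boto3.client("support", region_name="us-east-1")`, which requires a support plan that includes API access; the function name `summarize_checks` is hypothetical.

```python
from collections import Counter


def summarize_checks(support_client, language="en"):
    """Count Trusted Advisor check statuses per category.

    `support_client` is assumed to be a boto3 "support" client or a
    compatible stand-in. Returns {category: Counter({status: count})}.
    """
    summary = {}
    checks = support_client.describe_trusted_advisor_checks(language=language)["checks"]
    for check in checks:
        # One API call per check; cache or throttle this for large dashboards.
        result = support_client.describe_trusted_advisor_check_result(
            checkId=check["id"], language=language
        )["result"]
        summary.setdefault(check["category"], Counter())[result["status"]] += 1
    return summary
```

The returned counts (for example how many checks in `security` are in `error`) can feed a custom dashboard directly.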

Sample DevOps Automation Flow
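
A minimal sketch of one such flow, assuming an EventBridge rule forwards Trusted Advisor status changes to a small routing function. The pattern uses the `aws.trustedadvisor` source; the detail field names (`check-name`, `status`, `resource_id`) reflect the documented event shape as twtech understands it and should be verified against a real event. The action names are hypothetical.

```python
# Event pattern for the rule: only failing or warning check items.
TA_PATTERN = {
    "source": ["aws.trustedadvisor"],
    "detail-type": ["Trusted Advisor Check Item Refresh Notification"],
    "detail": {"status": ["ERROR", "WARN"]},
}

# Hypothetical routing table: which downstream action each status triggers.
ACTIONS = {"ERROR": "ticket", "WARN": "notify"}


def route_finding(event):
    """Turn a Trusted Advisor event into a small routing decision."""
    detail = event.get("detail", {})
    status = detail.get("status", "UNKNOWN")
    return {
        "action": ACTIONS.get(status, "ignore"),
        "summary": f"[{status}] {detail.get('check-name', 'unknown check')}",
        "resource": detail.get("resource_id"),
    }


sample = {
    "source": "aws.trustedadvisor",
    "detail-type": "Trusted Advisor Check Item Refresh Notification",
    "detail": {
        "check-name": "Security Groups - Unrestricted Access",
        "status": "ERROR",
        "resource_id": "sg-xxxx",
    },
}
print(route_finding(sample))
```

In a deployed flow the returned decision would drive a ticketing call, a chat notification, or a remediation function.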

Trusted Advisor vs Third-Party Tools

TA is:

  •         Native
  •         Free with support
  •         Low false positives

But:

  •         Not deeply customizable
  •         Doesn’t replace:
    •    CSPM tools
    •    Advanced cost optimization platforms

NB:

  • Best used as a baseline control plane.

When Should twtech Rely on Trusted Advisor

  •         Daily operational hygiene
  •         Security posture monitoring
  •         Cost waste detection
  •         Pre-incident prevention
  •         Leadership dashboards

❌   When Should twtech NOT Rely on Trusted Advisor

  •         Application logic issues
  •         Custom compliance frameworks
  •         Deep performance profiling

twtech Recommendation  (DevOps/SRE Playbook)

1.     Enable full TA (Business Support)

2.     Export findings via EventBridge

3.     Classify findings:

    •    Auto-fix
    •    Ticket
    •    Ignore (with justification)

4.     Review trends monthly

5.     Map findings to Well-Architected Pillars 

Insight:

Trusted Advisor Review — EKS-Based SaaS (Production)

  •        A realistic, end-to-end Trusted Advisor (TA) review for a production EKS-based SaaS, with a short contrast against serverless infrastructure.
  •        Tailored for a DevOps / SRE / platform lead.

Scenario

  •         Multi-tenant SaaS
  •         Amazon EKS (managed node groups + Fargate)
  •         ALB Ingress Controller
  •         RDS Aurora (Multi-AZ)
  •         S3 + CloudFront
  •         CI/CD via GitHub Actions
  •         Business Support enabled

Step 1: Open Trusted Advisor (What twtech Actually Sees)

In the AWS Console:

Support → Trusted Advisor → Dashboard

twtech sees:

  •         Overall check summary
  •         Counts per category
  •         Red / Yellow / Green indicators

Sample snapshot:

Category                Status
Cost Optimization       🔴 8
Security                🔴 3
Fault Tolerance         🟡 5
Performance             🟢 1
Service Limits          🟡 2

Step 2: Cost Optimization Findings (EKS Reality)

 Finding 1: Underutilized EC2 Instances (Worker Nodes)

TA Output

  • “5 EC2 instances with average CPU utilization below 10% over 14 days”

Why This Happens in EKS

  •         Static node groups
  •         Poor pod bin-packing
  •         No Cluster Autoscaler or misconfigured limits

Action

  •         Enable Cluster Autoscaler
  •         Right-size node groups
  •         Use multiple instance types
  •         Add pod requests/limits

SRE Note

  • This is a platform problem, not an app problem.

 Finding 2: Idle Load Balancer

TA Output

  • “1 Application Load Balancer with no active targets”

Root Cause

  •         Old Ingress left behind
  •         Blue/green deployment cleanup failure

Action

  •         Validate Ingress ownership via tags
  •         Delete unused ALB
  •         Add CI/CD cleanup checks

 Finding 3: Unattached EBS Volumes

TA Output

  • 12 EBS volumes unattached for 30+ days

Common EKS Cause

  •         PVC deleted
  •         Volume left behind due to reclaim policy

Action

  •         Audit Retain vs Delete
  •         Use CSI driver lifecycle policies

Step 3: Security Findings (High Signal)

Finding 4: Security Group Allows 0.0.0.0/0 on Port 443

TA Output

  • “Security group allows unrestricted access”

Reality Check

  •         ALB SG intentionally public
  •         But backend node SG also exposed ❌

Action

  •         ALB SG → 0.0.0.0/0
  •         Node SG → ALB SG only
  •         Lock down NodePort ranges
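
The check behind this finding can be approximated in a few lines. The sketch below takes one security group in the shape returned by `ec2.describe_security_groups()` and lists the port ranges open to 0.0.0.0/0, apart from ports that are intentionally public. The function name and the `allowed_ports` parameter are hypothetical, and IPv6 ranges are ignored for brevity.

```python
def open_to_world(security_group, allowed_ports=frozenset()):
    """Return (from_port, to_port) ranges open to 0.0.0.0/0.

    `security_group` follows the shape of one item in
    ec2.describe_security_groups()["SecurityGroups"]. Single ports listed
    in `allowed_ports` are treated as intentionally public.
    """
    findings = []
    for perm in security_group.get("IpPermissions", []):
        ranges = perm.get("IpRanges", [])
        if not any(r.get("CidrIp") == "0.0.0.0/0" for r in ranges):
            continue
        from_port = perm.get("FromPort")
        to_port = perm.get("ToPort")
        if from_port is None:
            # No port fields: the rule covers all protocols and ports.
            findings.append((None, None))
        elif not (from_port == to_port and from_port in allowed_ports):
            findings.append((from_port, to_port))
    return findings


alb_sg = {"IpPermissions": [
    {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
]}
node_sg = {"IpPermissions": [
    {"IpProtocol": "tcp", "FromPort": 30000, "ToPort": 32767,
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
]}

print(open_to_world(alb_sg, allowed_ports={443}))  # []
print(open_to_world(node_sg))  # [(30000, 32767)]
```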

 Finding 5: IAM User Without MFA

TA Output

  • “IAM user has console access without MFA”

Root Cause

  •         Legacy CI user
  •         Someone bypassed IAM roles

Action

  •         Kill static users
  •         Enforce:
    •    IAM roles
    •    OIDC (GitHub Actions / IRSA)
  •         SCP: DenyWithoutMFA

DevSecOps Callout

  • This is a release-blocking issue in mature orgs.

 Finding 6: S3 Bucket Allows Public Access

TA Output

  • “S3 bucket allows public access”

False Positive? Maybe.

  •         Static assets behind CloudFront
  •         But Block Public Access disabled ❌

Action

  •         Enable Block Public Access
  •         Use Origin Access Control (OAC)
  •         Restrict bucket policy to CloudFront only

Step 4: Fault Tolerance (Where Incidents Are Born)

 Finding 7: Auto Scaling Group in Single AZ

TA Output

  • “Auto Scaling group spans a single Availability Zone”

Impact

  •         AZ outage = platform outage

Action

  •         Spread node groups across ≥2 AZs
  •         Verify pod anti-affinity rules

 Finding 8: RDS Backup Retention Low

TA Output

  • “RDS backup retention less than 7 days”

Reality

  •         Dev/test DB accidentally promoted

Action

    •         Enforce via:
      •    AWS Config
      •    Terraform guardrails

Step 5: Performance Findings

 Finding 9: No Major Issues

  • Typical for EKS because TA doesn’t deeply inspect Kubernetes internals.

NB:

  •         TA won’t see:
    •    Pod CPU throttling
    •    Memory OOMs
    •    API server saturation

Use:

  •         Prometheus
  •         Karpenter metrics
  •         CloudWatch Container Insights

Step 6: Service Limits (Critical but Ignored)

 Finding 10: EC2 Instance Limit at 80%

TA Output

  • “EC2 On-Demand instance usage approaching limit”

Why This Matters

  •         Scaling events will fail
  •         Deployments stall during incidents

Action

  •         Request quota increase
  •         Migrate to:
    •    Spot
    •    Graviton
    •    Fargate where possible

Step 7: Prioritization (How Pros Do It)

Category                Action
Security                Fix immediately
Service Limits          Fix before next deploy
Cost                    Schedule within sprint
Fault Tolerance         Roadmap
Performance             Monitor

Step 8: Automation Pipeline (Real World)

Auto-Fix Candidates

  •         Unattached EBS
  •         Idle ELBs
  •         Unused EIPs
  •         IAM access key rotation reminders
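
For the first auto-fix candidate, a detection step might look like the sketch below. It only selects volumes and deletes nothing; the function name is hypothetical, and creation time is used as a stand-in for "time since detached", which `describe_volumes` does not report directly.

```python
from datetime import datetime, timedelta, timezone


def stale_unattached_volumes(volumes, min_age_days=30, now=None):
    """Return ids of volumes that are unattached and older than the cutoff.

    `volumes` follows the shape of the "Volumes" list returned by
    ec2.describe_volumes(): each item has "VolumeId", "State" and a
    timezone-aware "CreateTime". State "available" means not attached.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=min_age_days)
    return [
        v["VolumeId"]
        for v in volumes
        if v["State"] == "available" and v["CreateTime"] <= cutoff
    ]


now = datetime(2025, 12, 25, tzinfo=timezone.utc)
volumes = [
    {"VolumeId": "vol-old", "State": "available", "CreateTime": now - timedelta(days=45)},
    {"VolumeId": "vol-new", "State": "available", "CreateTime": now - timedelta(days=3)},
    {"VolumeId": "vol-used", "State": "in-use", "CreateTime": now - timedelta(days=90)},
]
print(stale_unattached_volumes(volumes, now=now))  # ['vol-old']
```

Keeping detection separate from deletion leaves room for a snapshot or an owner-approval step before anything is removed.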

Serverless SaaS Differences (Quick Contrast)

Area                EKS                     Serverless
Compute cost        Underutilized EC2       Lambda duration
Security            SG + IAM                IAM + resource policies
Fault tolerance     AZ spread               Mostly managed
Service limits      EC2, ENIs               Lambda concurrency

Common serverless TA findings:

  •         Lambda concurrency limits
  •         Public S3 buckets
  •         Idle API Gateway stages
  •         Underutilized provisioned concurrency

Final takeaway (What to Tell Organization Leadership)

  •        “Trusted Advisor identified 3 security risks, 2 scaling blockers, and ~$1,200/month in waste.
  •        All critical security findings were remediated within 24 hours.”

 Links to useful resources:

https://aws.amazon.com/architecture/

https://aws.amazon.com/solutions/

Project: Hands-On

How twtech uses AWS Trusted Advisor in its environment to:

  • Provide real-time recommendations based on best practices across six categories:
    •    cost optimization,
    •    performance,
    •    resilience,
    •    security,
    •    operational excellence,
    •    and service limits.
  •        Draw upon best practices (learned from serving AWS customers) to identify opportunities to save money, improve availability and performance, and help close security gaps.
  • Log in to the AWS account and use this link to reach AWS Trusted Advisor (TA): https://console.aws.amazon.com/trustedadvisor/home

  • Upgrade twtech AWS Support Plan to get all Trusted Advisor checks

NB:

  • Without an upgrade, twtech's AWS Support plan includes only a limited set of Trusted Advisor checks.

Service limits

  •        twtech chooses a check name to see recommendations for services that use more than 80 percent of a service quota.
  •        The check results use values based on a snapshot, so twtech's current usage might differ.
  •        Quota and usage data can take up to 24 hours to reflect any changes.
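
The 80 percent rule is simple arithmetic, sketched below for a handful of quotas. The function name, the quota names, and the usage numbers are made up for illustration.

```python
def quota_alerts(usages, threshold=0.80):
    """Return (name, percent_used) for quotas above the threshold.

    `usages` maps a quota name to a (current_usage, limit) pair.
    Results are sorted with the most heavily used quota first.
    """
    alerts = []
    for name, (used, limit) in usages.items():
        if limit > 0 and used / limit > threshold:
            alerts.append((name, round(100 * used / limit, 1)))
    return sorted(alerts, key=lambda item: item[1], reverse=True)


usages = {
    "ec2-on-demand-vcpus": (85, 100),
    "elastic-ips": (2, 5),
    "vpcs": (5, 5),
}
print(quota_alerts(usages))
# [('vpcs', 100.0), ('ec2-on-demand-vcpus', 85.0)]
```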

NB:

  • twtech pays for a support plan when it upgrades to get the full set of Trusted Advisor checks.

Addendum:

Links to More Architecture Examples

Link to AWS Certification Solutions Architect - Associate (Exam)



