Monday, December 15, 2025

AWS Systems Manager (SSM) Run Command - Overview.

Focus:

    • Tailored for DevOps, DevSecOps, and Cloud Engineers (aligned with large-scale production environments).

Scope:

  • Intro,
  • Key Features & Benefits,
  • How AWS Systems Manager (SSM) Run Command Works,
  • Common Use Cases,
  • Link to official documentation,
  • The concept of SSM Run Command (Deep dive),
  • Core Architecture Components,
  • Run Command Execution Flow,
  • SSM Documents (Command Documents),
  • Targeting Strategies (Critical at Scale),
  • Security & IAM Model (Enterprise Focus),
  • Logging, Auditing & Compliance,
  • Networking Model (No Inbound Access),
  • Operational Use Cases,
  • Run Command vs Alternatives,
  • Scaling Limits & Performance,
  • Best Practices (Production-Grade),
  • Reference Enterprise Pattern.

Intro:

  • AWS Systems Manager (SSM) Run Command is a capability for executing administrative commands and scripts on managed nodes (Amazon EC2 instances or on-premises servers) remotely and at scale.
  • It does not require an interactive login such as SSH or RDP.

Key Features & Benefits

Centralized Management: 

    •  twtech can manage a large fleet of instances from a single interface:
      • The AWS Management Console,
      • The AWS Command Line Interface (CLI),
      • Or the AWS SDKs.

Security: 

    •  Run Command eliminates the need to:
      • Open inbound ports,
      • Manage SSH keys,
      • Or share RDP passwords.
    •  This reduces the attack surface of each node.
    •  Access is controlled through AWS Identity and Access Management (IAM) policies.

Automation Scale: 

    •  Run Command is ideal for automating common tasks across multiple instances simultaneously, such as:
      • Installing software,
      • Patching operating systems,
      • Running custom scripts,
      • Or restarting services.

Cross-Platform Support:

    •   Run Command supports both Linux and Windows operating systems.
    •   Managed nodes can also be on-premises machines or instances from other cloud providers, which offers a unified management solution.

No Additional Cost:

    •  There is no charge for using Run Command.
    •  twtech only pays for the underlying resources consumed during the tasks.

How AWS Systems Manager (SSM) Run Command Works

Prerequisites: 

 A. The managed nodes:
  • Must have the SSM Agent installed and running.
  • Must be assigned an IAM role (instance profile) with the necessary permissions.
    •     Sample AWS managed policy:
      •  AmazonSSMManagedInstanceCore

Command Execution: 

B. A user sends a command to specified targets using a pre-defined SSM document, which defines the actions to be performed.

  •      The targets can be identified by:
    •       Instance IDs,
    •       Tags,
    •       Or resource groups.

Agent Action: 

C. The SSM Agent on the target instance:

  •      Receives the command over an outbound HTTPS connection to the Systems Manager messaging endpoints (ec2messages / ssmmessages),
  •      Then executes the instructions,
  •      And sends the output and status back to the Systems Manager service.

Monitoring & History: 

D. Each command reports a status, such as:

    •      Pending,
    •      In Progress,
    •      Success,
    •      Or Failed.
  •     Statuses can be monitored in the console, which keeps a detailed history of the commands executed.
  •     Command output is stored and can be reviewed later.
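The status history described above can also be summarized in code. The following is a minimal sketch, assuming invocation records shaped like the entries returned by the ListCommandInvocations API (an "InstanceId" and a "Status" per node); the sample data and the `summarize` helper are hypothetical and only illustrate the idea.

```python
from collections import Counter

# Statuses after which an invocation will not change any more.
TERMINAL = {"Success", "Failed", "Cancelled", "TimedOut"}


def summarize(invocations):
    """Count invocations per status and report whether all have finished."""
    counts = Counter(inv["Status"] for inv in invocations)
    finished = all(status in TERMINAL for status in counts)
    return dict(counts), finished


# Hypothetical sample data shaped like ListCommandInvocations entries.
sample = [
    {"InstanceId": "i-0aaa", "Status": "Success"},
    {"InstanceId": "i-0bbb", "Status": "InProgress"},
    {"InstanceId": "i-0ccc", "Status": "Failed"},
]
counts, finished = summarize(sample)
print(counts, finished)
```

A summary like this is handy in pipelines, where a job should keep waiting while any node is still in flight and fail fast once a node reports Failed.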

Common Use Cases

Software Updates: 

    • Automatically update the SSM Agent or other software packages across instances.

Troubleshooting:

    • Execute diagnostic scripts and review logs without direct server access.

Configuration Management:

    •  Apply configuration changes using built-in or custom SSM documents.

Integration:

    •  Run Command can be integrated with other AWS services, such as:
      • Maintenance Windows for scheduled tasks,
      • CloudWatch for monitoring,
      • Or S3 for storing command output.

Link to official documentation

https://docs.aws.amazon.com/systems-manager/latest/userguide/monitor-commands.html

1. The concept of SSM Run Command (Deep dive)

  • SSM Run Command allows twtech to remotely execute commands, without SSH/RDP, on managed nodes such as:
    • EC2 instances,
    • On-prem servers,
    • Hybrid and multicloud machines.
  • Containers are reached differently: ECS Exec builds on SSM Session Manager rather than Run Command.
  • SSM Run Command does that with AWS-native:
    • IAM,
    • Logging,
    • And auditing.

SSM Run Command is:

    • Agent-based (SSM Agent)
    • API-driven
    • IAM-controlled
    • Auditable
    • Scalable (tens of thousands of nodes)

2. Core Architecture Components

Control Plane (AWS-Managed)

    • SSM Service
    • Run Command API
    • SSM Documents (Command Documents)
    • State Manager / Automation (optional orchestration)

Data Plane

    • Managed Nodes:
      • EC2 (Amazon Linux, Ubuntu, RHEL, Windows)
      • On-prem / other cloud (via SSM Hybrid Activations)
    • SSM Agent
    • IAM Instance Profile / Hybrid Credentials

Supporting Services

    • IAM – authorization
    • Amazon S3 – stdout/stderr storage
    • CloudWatch Logs – real-time logs
    • CloudTrail – API audit trail
    • KMS – encryption at rest
    • VPC Endpoints (PrivateLink) – private access

3. Run Command Execution Flow

1. A user, pipeline, or automation calls ssm:SendCommand.

2. AWS SSM validates:

      •    IAM permissions
      •    Target instance tags / IDs
      •    Document permissions

3. The SSM Agent polls the SSM service over outbound HTTPS (443).

4. The command is delivered to the agent.

5. The agent executes it locally (as root on Linux, SYSTEM on Windows).

6. Output is sent (if configured) to:

      •    S3
      •    CloudWatch Logs

7. Status updates are returned:

   Pending -> InProgress -> Success | Failed | TimedOut
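The flow above can be sketched from the caller's side. This is a minimal sketch, not a production client: it assumes an object that exposes `send_command` and `get_command_invocation` with the same keyword arguments as the boto3 SSM client, and it uses a hypothetical `FakeSSM` stand-in so the example runs offline. It targets a single instance ID only to keep the polling loop short. With the real service, `get_command_invocation` can raise an error for a brief moment right after sending, so real code would need to tolerate that.

```python
import time

TERMINAL = {"Success", "Failed", "Cancelled", "TimedOut"}


def send_and_wait(ssm, instance_id, commands, poll_seconds=2.0, max_polls=30):
    """Send a shell command to one node and poll until a terminal status."""
    response = ssm.send_command(
        InstanceIds=[instance_id],
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": commands},
    )
    command_id = response["Command"]["CommandId"]
    for _ in range(max_polls):
        inv = ssm.get_command_invocation(CommandId=command_id, InstanceId=instance_id)
        if inv["Status"] in TERMINAL:
            return inv["Status"], inv.get("StandardOutputContent", "")
        time.sleep(poll_seconds)
    raise TimeoutError(f"command {command_id} did not finish in time")


class FakeSSM:
    """Hypothetical stand-in for boto3.client('ssm') so the sketch runs offline."""

    def __init__(self):
        self._polls = 0

    def send_command(self, **kwargs):
        return {"Command": {"CommandId": "cmd-123"}}

    def get_command_invocation(self, CommandId, InstanceId):
        # Report InProgress twice, then Success, to exercise the polling loop.
        self._polls += 1
        status = "Success" if self._polls >= 3 else "InProgress"
        return {"Status": status, "StandardOutputContent": "ok\n"}


status, output = send_and_wait(FakeSSM(), "i-0123456789abcdef0", ["uptime"], poll_seconds=0)
print(status, repr(output))
```

Passing the client in as an argument keeps the function testable; in real use the first argument would be a boto3 SSM client created with valid credentials.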

4. SSM Documents (Command Documents)

Types

    •  AWS-Provided
      •    AWS-RunShellScript
      •    AWS-RunPowerShellScript
      •    AWS-UpdateSSMAgent
      •    AWS-ApplyPatchBaseline
    • Custom Documents
      •    YAML or JSON
      •    Versioned
      •    Parameterized

Sample (YAML) file 

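# yaml
schemaVersion: "2.2"
description: "Restart nginx safely"
parameters:
  service:
    type: String
    description: "Name of the systemd service to restart"
    default: nginx
    # Reject values that could smuggle extra shell syntax into the command.
    allowedPattern: "^[a-zA-Z0-9_.@-]+$"
mainSteps:
  - action: aws:runShellScript
    name: restartService
    inputs:
      runCommand:
        # {{ service }} is replaced with the parameter value at run time.
        - systemctl restart {{ service }}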
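To make the role of the `{{ service }}` placeholder concrete, here is a minimal sketch that mirrors the sample document as a Python dictionary and fills in placeholders from supplied values or from the declared defaults. The `render_commands` helper is hypothetical and only illustrates the substitution idea; it is not how the SSM service or agent implements it.

```python
import re

# The sample document above, expressed as a Python dictionary.
DOCUMENT = {
    "schemaVersion": "2.2",
    "description": "Restart nginx safely",
    "parameters": {"service": {"type": "String", "default": "nginx"}},
    "mainSteps": [
        {
            "action": "aws:runShellScript",
            "name": "restartService",
            "inputs": {"runCommand": ["systemctl restart {{ service }}"]},
        }
    ],
}

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_commands(document, supplied=None):
    """Fill {{ name }} placeholders from supplied values, else from defaults."""
    supplied = supplied or {}
    defaults = {k: v.get("default") for k, v in document["parameters"].items()}

    def lookup(match):
        name = match.group(1)
        value = supplied.get(name, defaults.get(name))
        if value is None:
            raise KeyError(f"no value for parameter {name!r}")
        return str(value)

    return [
        PLACEHOLDER.sub(lookup, line)
        for step in document["mainSteps"]
        for line in step["inputs"]["runCommand"]
    ]


print(render_commands(DOCUMENT))
print(render_commands(DOCUMENT, {"service": "httpd"}))
```

The same document can therefore restart any service by passing a different parameter value, which is why parameterized custom documents are preferred over one document per service.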

5. Targeting Strategies (Critical at Scale)

Instance Targeting Options

Method             Use Case
Instance IDs       One-off / break-glass
Tags               Production-safe, scalable
Resource Groups    App-centric ops
Account-wide       Maintenance windows

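Best Practice

  • Tag every managed node consistently and target by tag, for example:
    • Environment=prod
    • Application=payments
    • Role=web

NB:

  • Avoid targeting by instance ID in production automation: instance IDs change whenever nodes are replaced, while tags follow the role.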
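As a rough illustration of tag-based targeting, the sketch below converts a tag dictionary into the `Targets` structure that the SendCommand API accepts (a list of `{"Key": "tag:<name>", "Values": [...]}` entries). The `tag_targets` helper is a hypothetical name, and the tag values are the ones from the list above.

```python
def tag_targets(tags):
    """Convert {"Environment": "prod"} into a SendCommand Targets list."""
    if not tags:
        # An empty filter is almost certainly a mistake, so refuse it.
        raise ValueError("refusing to build an empty target set")
    return [
        {"Key": f"tag:{key}", "Values": [value]}
        for key, value in sorted(tags.items())
    ]


targets = tag_targets({"Environment": "prod", "Application": "payments", "Role": "web"})
print(targets)
```

The resulting list would be passed as the `Targets` argument of a SendCommand call in place of an explicit list of instance IDs.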

6. Security & IAM Model (Enterprise Focus)

IAM Permissions

Key actions:

    •  ssm:SendCommand
    •  ssm:GetCommandInvocation
    •  ssm:ListCommands
    •  ssm:CancelCommand

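# Least-Privilege Sample Policy

The document and the instances sit in separate statements so that the tag condition applies only to the instances (the AWS-owned document carries no Environment tag). Replace accountID with the 12-digit AWS account ID.

# json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ssm:SendCommand",
      "Resource": "arn:aws:ssm:us-east-2::document/AWS-RunShellScript"
    },
    {
      "Effect": "Allow",
      "Action": "ssm:SendCommand",
      "Resource": "arn:aws:ec2:us-east-2:accountID:instance/*",
      "Condition": {
        "StringEquals": {
          "ssm:resourceTag/Environment": "prod"
        }
      }
    }
  ]
}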
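When the same policy is needed for several environments or regions, it can be generated rather than copied. The sketch below is one possible approach, assuming a hypothetical `send_command_policy` helper and a placeholder account ID. It keeps the document and the instances in separate statements so that the tag condition is evaluated only against the instances.

```python
import json


def send_command_policy(region, account_id, document, tag_key, tag_value):
    """Build a policy allowing SendCommand for one document on tagged instances."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # The document itself: no tag condition here.
                "Effect": "Allow",
                "Action": "ssm:SendCommand",
                "Resource": f"arn:aws:ssm:{region}::document/{document}",
            },
            {
                # The instances, restricted to those carrying the tag.
                "Effect": "Allow",
                "Action": "ssm:SendCommand",
                "Resource": f"arn:aws:ec2:{region}:{account_id}:instance/*",
                "Condition": {
                    "StringEquals": {f"ssm:resourceTag/{tag_key}": tag_value}
                },
            },
        ],
    }


# 111122223333 is a placeholder account ID, not a real account.
policy = send_command_policy(
    "us-east-2", "111122223333", "AWS-RunShellScript", "Environment", "prod"
)
print(json.dumps(policy, indent=2))
```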

# Node Permissions

    •  AmazonSSMManagedInstanceCore
    •  Optional: S3 / CloudWatch write permissions

7. Logging, Auditing & Compliance

CloudTrail (Audit)

Tracks:

    • Who ran the command
    • What document
    • Which targets
    • Parameters used

CloudWatch Logs

    • Real-time command output
    • Centralized log analysis
    • SIEM integration

S3 Output

    • Long-term retention
    • Immutable logs (when S3 Object Lock is enabled on the bucket)
    • Forensics / audits
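Both output destinations are selected per command. The following is a minimal sketch of the relevant SendCommand keyword arguments (`OutputS3BucketName`, `OutputS3KeyPrefix`, and `CloudWatchOutputConfig`); the `output_options` helper, the bucket name, and the log group name are hypothetical, and the node's role would need write access to both destinations.

```python
def output_options(bucket, prefix, log_group):
    """Return SendCommand kwargs that route output to S3 and CloudWatch Logs."""
    return {
        "OutputS3BucketName": bucket,
        "OutputS3KeyPrefix": prefix,
        "CloudWatchOutputConfig": {
            "CloudWatchLogGroupName": log_group,
            "CloudWatchOutputEnabled": True,
        },
    }


# Hypothetical bucket and log group names, for illustration only.
options = output_options("twtech-ssm-output", "run-command/prod", "/ssm/run-command/prod")
print(options)
```

These options would be merged into the other SendCommand arguments, for example with `{**request, **options}`.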

Compliance Programs Supported (workload-level controls remain the customer's responsibility)

    • SOC 2
    • ISO 27001
    • PCI-DSS
    • HIPAA (with controls)

8. Networking Model (No Inbound Access)

Required Outbound

  •  HTTPS (443) to:
    •    ssm
    •    ec2messages
    •    ssmmessages

Private VPC Architecture

  • Interface VPC Endpoints:
    •    com.amazonaws.region.ssm
    •    com.amazonaws.region.ec2messages
    •    com.amazonaws.region.ssmmessages
  • No NAT or Internet Gateway required

NB:

This model is zero-trust friendly ("never trust, always verify"): nodes accept no inbound connections, and every command is authorized through IAM.
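The three interface endpoint service names follow a fixed pattern per region. As a small sketch, a hypothetical `ssm_endpoint_service_names` helper can produce them for infrastructure-as-code templates or checks:

```python
def ssm_endpoint_service_names(region):
    """Return the interface endpoint service names Run Command relies on."""
    return [
        f"com.amazonaws.{region}.{service}"
        for service in ("ssm", "ec2messages", "ssmmessages")
    ]


names = ssm_endpoint_service_names("us-east-2")
print(names)
```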

9. Operational Use Cases

Operations

    •  Patch deployment
    •  Config updates
    •  Service restarts
    •  Certificate rotation

Incident Response

    •  Memory capture
    •  Log collection
    •  Process inspection
    •  Malware containment

Automation

  • Integrated with:
    •    EventBridge
    •    Step Functions
    •    CodePipeline
    •    Lambda

10. Run Command vs Alternatives

Tool               When to Use
Run Command        Fleet-wide, non-interactive
Session Manager    Interactive shell
Automation         Multi-step workflows
Ansible / SSH      Legacy / non-AWS
Kubernetes exec    Container-level

11. Scaling Limits & Performance

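    •  Concurrency: 50 targets at a time by default (MaxConcurrency; adjustable per command as a count or a percentage)
    •  Targets per command: 50k+ when targeting by tags or resource groups (an explicit InstanceIds list is capped at 50 IDs per call)
    •  Execution timeout: 3600s by default (configurable in the document)
    •  Document size: 64 KB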

NB:

  • Use rate control and error thresholds for safety.
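Rate control and error thresholds map to the `MaxConcurrency` and `MaxErrors` arguments of SendCommand, both passed as strings holding either a count or a percentage. The sketch below is a hypothetical `rate_control` helper with conservative defaults chosen for illustration; its validation is an approximation of the API's rules, not a copy of them.

```python
import re

# Accepts a whole number ("25") or a percentage ("10%").
_COUNT_OR_PERCENT = re.compile(r"^(\d+)(%?)$")


def rate_control(max_concurrency="10%", max_errors="0"):
    """Validate rate-control settings and return them as SendCommand kwargs."""
    parsed = {}
    for name, value in (("MaxConcurrency", max_concurrency), ("MaxErrors", max_errors)):
        match = _COUNT_OR_PERCENT.match(value)
        if not match:
            raise ValueError(f"{name} must be a count or a percentage, got {value!r}")
        number, is_percent = int(match.group(1)), bool(match.group(2))
        if is_percent and number > 100:
            raise ValueError(f"{name} percentage cannot exceed 100, got {value!r}")
        parsed[name] = (number, value)
    if parsed["MaxConcurrency"][0] == 0:
        raise ValueError("MaxConcurrency must allow at least one target")
    return {name: value for name, (_, value) in parsed.items()}


print(rate_control())
print(rate_control(max_concurrency="25", max_errors="5%"))
```

With `max_errors="0"`, the service stops sending the command to further targets after the first failure, which is a sensible starting point for production changes.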

12. Best Practices (Production-Grade)

    •  Use tags for targeting
    •  Enable CloudTrail + CloudWatch Logs
    •  Encrypt outputs with KMS
    •  Use custom documents (not inline commands)
    •  Pair with Maintenance Windows
    •  Test in lower environments
    •  Avoid ad-hoc prod commands
    •  Avoid instance-ID targeting

13. Reference Enterprise Pattern





