AWS Instance Scheduler + AWS Organizations + Service Control Policies (SCPs) - Overview.
Focus:
- Tailored for: DevOps, Platform, Security, and FinOps teams.
- Aligned with: governance-first design, blast-radius reduction, and real-world enterprise patterns.
Scope:
- Intro,
- How They Interact,
- Key Considerations & Best Practices,
- Why This Combination Matters,
- High-Level Architecture (Enterprise Pattern),
- Centralized Instance Scheduler (Best Practice),
- IAM Role Design (Critical),
- SCP Strategy (The Safety Net),
- Tag-Based SCPs (Advanced & Powerful),
- Protect Critical Instances (Break-Glass),
- OU-Based Scheduling Strategy (Recommended),
- Guarding Against Tag Abuse,
- Auditing & Visibility,
- Failure Scenarios (Real World),
- Terraform / IaC Governance Pattern,
- Reference Policy Stack (Layered Defense),
- When This Pattern Shines (when to use),
- When NOT to Use This Pattern,
- Final Take-Home.
Intro:
- AWS Instance Scheduler can manage instances across multiple accounts in an AWS Organization.
- However, any Service Control Policies (SCPs) applied to those accounts or their Organizational Units (OUs) must allow the API actions the scheduler needs to function correctly.
- SCPs act as guardrails: they set the maximum available permissions and can override IAM policies within an account.
- Coordination between the scheduler and the SCPs is therefore crucial.
How They Interact
AWS Instance Scheduler (Solution):
- Instance Scheduler is an AWS-provided solution that uses:
- AWS Lambda,
- DynamoDB,
- and resource tagging to automatically stop and start EC2 and RDS instances.
- The solution is typically deployed in a central management account (or a designated admin account) and uses cross-account IAM roles to perform actions in member accounts.
AWS Organizations (Management):
- This service allows central management of multiple AWS accounts and the grouping of accounts into OUs.
Service Control Policies (SCPs) Guardrails:
- These policies are managed within AWS Organizations and define the maximum permissions available to IAM users and roles in member accounts.
- SCPs do not grant permissions; they filter them.
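The "filter, not grant" behaviour can be illustrated with a small model. This is a simplified sketch, not how AWS evaluates policies internally: it looks only at action names, ignores resources and conditions, and the helper names `matches` and `effectively_allowed` are made up for illustration.

```python
from fnmatch import fnmatchcase


def matches(action, patterns):
    """True if the action matches any pattern; '*' is a wildcard, case is ignored."""
    return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)


def effectively_allowed(action, iam_allow, scp_allow, scp_deny):
    """Simplified model: IAM must allow, the SCP must allow, and no SCP may deny."""
    if matches(action, scp_deny):
        return False  # an explicit deny always wins
    return matches(action, iam_allow) and matches(action, scp_allow)


# Hypothetical role policy that only covers EC2 start/stop.
role_allow = ["ec2:StartInstances", "ec2:StopInstances"]

# SCP allows everything, no deny: the IAM allow takes effect.
print(effectively_allowed("ec2:StopInstances", role_allow, ["*"], []))
# SCP denies ec2:Stop*: the IAM allow is filtered out.
print(effectively_allowed("ec2:StopInstances", role_allow, ["*"], ["ec2:Stop*"]))
# SCP allows everything, but IAM never granted the action: still not allowed.
print(effectively_allowed("rds:StopDBInstance", role_allow, ["*"], []))
```

The third call shows why an SCP that allows everything still grants nothing by itself: the role must also carry the IAM permission.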
Key Considerations & Best Practices
- When combining these services, several factors are important:
Required Permissions:
- The IAM roles created by the Instance Scheduler CloudFormation template in the member accounts must be permitted to perform:
- ec2:StartInstances,
- ec2:StopInstances,
- rds:StartDBInstance,
- rds:StopDBInstance (among others, such as tag reading/writing and CloudWatch logging).
- twtech SCPs must explicitly or implicitly allow these actions.
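One way to check this before rollout is to scan the SCP documents attached to an OU for the actions the scheduler needs. The sketch below is an illustration under assumptions: `blocked_actions` is a hypothetical helper, it covers a single level of the OU hierarchy, it ignores `NotAction`, and it skips conditional deny statements because they cannot be judged without request context.

```python
from fnmatch import fnmatchcase

# Actions named in the list above; the real solution needs more (tags, logging).
REQUIRED_ACTIONS = [
    "ec2:StartInstances",
    "ec2:StopInstances",
    "rds:StartDBInstance",
    "rds:StopDBInstance",
]


def _as_list(value):
    """Policy JSON allows a single item or a list; normalise to a list."""
    return [value] if isinstance(value, (str, dict)) else list(value)


def _covers(statement, action):
    """True if the statement's Action patterns match the action (case-insensitive)."""
    patterns = _as_list(statement.get("Action", []))
    return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)


def blocked_actions(policies, required=REQUIRED_ACTIONS):
    """Return required actions that the attached SCPs deny or fail to allow."""
    statements = [s for p in policies for s in _as_list(p["Statement"])]
    blocked = []
    for action in required:
        allowed = any(s["Effect"] == "Allow" and _covers(s, action) for s in statements)
        denied = any(
            s["Effect"] == "Deny" and "Condition" not in s and _covers(s, action)
            for s in statements
        )
        if denied or not allowed:
            blocked.append(action)
    return blocked


full_access = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}],
}
deny_stop = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Deny", "Action": ["ec2:StopInstances"], "Resource": "*"}],
}

print(blocked_actions([full_access, deny_stop]))
```

An empty result only means that nothing at this level blocks the scheduler; SCPs attached higher up in the hierarchy still need the same check.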
SCP Impact:
- If an SCP attached to an OU or account denies ec2:StopInstances, for example, the Instance Scheduler will fail to stop instances in that account, even if the IAM role in the member account has the correct permissions.
- Deny statements in SCPs always take precedence over allow statements in IAM policies.
Centralized Management:
- To use the AWS Organization ID for cross-account scheduling, twtech sets the "Using AWS Organizations?" CloudFormation parameter to Yes when deploying in the central account, and deploys a smaller "remote" template in each member account.
- This centralizes management of schedules while delegating execution permissions to each member account via specific roles.
Testing:
- AWS strongly recommends thoroughly testing SCPs in a staging OU with non-production accounts before applying them broadly across the organization or to the root.
- This prevents unintended disruptions to running services.
Managed Policies:
- twtech ensures that it does not remove the default FullAWSAccess policy from OUs or accounts without replacing it with a custom policy that still allows the necessary actions; otherwise, all AWS actions in the affected accounts may fail.
In a nutshell:
- The AWS Instance Scheduler can be effectively used in a multi-account organization, but administrators must ensure that SCPs do not inadvertently restrict the specific start and stop API calls the scheduler needs to function.
1. Why This Combination Matters (Benefits)
- Instance Scheduler saves money.
- Organizations + SCPs prevent outages.
Together, they give twtech:
✅ Org-wide cost optimization
✅ Centralized control
✅ Guardrails around production
✅ Least-privilege automation
✅ Auditability at scale
- Scheduler without SCPs = outage waiting to happen
- SCPs without Scheduler = wasted money
2. High-Level Architecture (Enterprise Pattern)
Key idea:
- Scheduler can see many accounts, but SCPs decide what it can do.
3. Centralized Instance Scheduler (Best Practice)
Why Centralize?
- Single source of truth for schedules
- One DynamoDB table
- Easier auditing
- No drift between accounts
How It Works
- Scheduler deployed in Shared Services
- Scheduler Lambda assumes InstanceSchedulerExecutionRole in target accounts
- SCPs restrict where start/stop is allowed
4. IAM Role Design (Critical)
Per-Account Role
Each target account has:
Role: InstanceSchedulerExecutionRole
Trusted entity: Scheduler account
Permissions:
# json
{
  "Effect": "Allow",
  "Action": [
    "ec2:StartInstances",
    "ec2:StopInstances",
    "rds:StartDBInstance",
    "rds:StopDBInstance"
  ],
  "Resource": "*"
}
# NB:
- SCPs still apply after IAM allows.
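The "Trusted entity: Scheduler account" part of the role can be sketched as a trust policy. This is a minimal illustration, not the policy shipped in the Instance Scheduler remote template; the helper `build_trust_policy` and the account ID `111122223333` are placeholders.

```python
import json
import re


def build_trust_policy(scheduler_account_id):
    """Build a trust policy that lets the scheduler account assume the role."""
    if not re.fullmatch(r"\d{12}", scheduler_account_id):
        raise ValueError("AWS account IDs are 12 digits")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Trusting the account root delegates the decision to that
                # account's IAM policies; a specific role ARN would be tighter.
                "Principal": {"AWS": f"arn:aws:iam::{scheduler_account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Placeholder account ID for the Shared Services (scheduler) account.
print(json.dumps(build_trust_policy("111122223333"), indent=2))
```

Narrowing the principal to the scheduler Lambda's own role ARN, where that ARN is known, reduces who in the scheduler account can assume the execution role.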
5. SCP Strategy (The Safety Net)
5.1 Block Scheduling in Production OU
Goal:
- Prevent any automated stop/start in Prod.
Sample SCP (Prod OU)
# json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "twtechDenyStopStartInProd",
      "Effect": "Deny",
      "Action": [
        "ec2:StopInstances",
        "ec2:StartInstances",
        "rds:StopDBInstance",
        "rds:StartDBInstance"
      ],
      "Resource": "*"
    }
  ]
}
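# Result: stop/start calls in Prod are denied
- Even if tagged
- Even if IAM allows
- Even if Scheduler tries
NB:
- Prod stays always-on
- As written, the statement denies these actions for every principal in the OU, including human operators, so a stopped Prod instance cannot be restarted manually while the SCP is attached. A condition on aws:PrincipalArn limits the deny to the scheduler role.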
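If manual stop/start in Prod should remain possible, one variation limits the deny to the scheduler's role. The sketch below generates such an SCP; it is an assumption layered on the sample above, not part of the original policy. The helper name and the Sid are made up, and the role name is taken from the IAM Role Design section and assumed to have no IAM path.

```python
import json

STOP_START_ACTIONS = [
    "ec2:StopInstances",
    "ec2:StartInstances",
    "rds:StopDBInstance",
    "rds:StartDBInstance",
]


def build_prod_scheduler_deny(role_name="InstanceSchedulerExecutionRole"):
    """Deny stop/start in Prod only when the caller is the scheduler role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "twtechDenySchedulerStopStartInProd",
                "Effect": "Deny",
                "Action": STOP_START_ACTIONS,
                "Resource": "*",
                "Condition": {
                    # '*' in the account field matches the role in any member account.
                    "ArnLike": {"aws:PrincipalArn": f"arn:aws:iam::*:role/{role_name}"}
                },
            }
        ],
    }


print(json.dumps(build_prod_scheduler_deny(), indent=2))
```

The trade-off is that this version no longer protects Prod from other automation or from operators; the unconditional deny is the stricter guardrail.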
6. Tag-Based SCPs (Advanced & Powerful)
Allow Scheduling ONLY When Explicitly Tagged
Goal:
- Prevent accidental scheduling.
SCP Example (Dev / QA)
# json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "twtechDenyStopWithoutScheduleTag",
      "Effect": "Deny",
      "Action": "ec2:StopInstances",
      "Resource": "*",
      "Condition": {
        "Null": {
          "aws:ResourceTag/Schedule": "true"
        }
      }
    }
  ]
}
# This enforces:
- Opt-in scheduling
- No tag = no stop
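The effect of the `Null` condition can be mirrored in a few lines. This is only a model of the tag check, assuming exact tag-key matching; the instance IDs and tags are hypothetical.

```python
def stop_denied_without_tag(tags, tag_key="Schedule"):
    """Mirror of {"Null": {"aws:ResourceTag/Schedule": "true"}}: deny when the key is absent."""
    return tag_key not in tags


# Hypothetical Dev instances and their tags.
instances = {
    "i-dev-web": {"Schedule": "office-hours"},
    "i-dev-db": {"Name": "scratch-db"},
}

for instance_id, tags in instances.items():
    verdict = "denied" if stop_denied_without_tag(tags) else "not denied by this SCP"
    print(instance_id, verdict)
```

Note that the deny applies to any caller of ec2:StopInstances in the OU, so untagged instances cannot be stopped by hand either.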
7. Protect Critical Instances (Break-Glass)
“Do Not Schedule” Pattern
- Tag critical resources: Scheduling=Disabled
# SCP:
# json
{
  "Sid": "twtechProtectCriticalInstances",
  "Effect": "Deny",
  "Action": [
    "ec2:StopInstances",
    "rds:StopDBInstance"
  ],
  "Resource": "*",
  "Condition": {
    "StringEquals": {
      "aws:ResourceTag/Scheduling": "Disabled"
    }
  }
}
# Even in Dev:
- Bastion hosts
- CI runners
- Shared services
Stay online.
8. OU-Based Scheduling Strategy (Recommended)
This aligns:
- Cost savings
- Risk tolerance
- Business criticality
9. Guarding Against Tag Abuse
Problem
- Developer tags prod instance: Schedule=office-hours
Solution
- SCP denies stop/start in Prod
- AWS Config rule:
- Detect Schedule tag in Prod
- Alert security / platform team
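The detection step can be sketched as a simple check over an instance inventory. This illustrates the logic only and does not use the AWS Config API; the inventory shape (`instance_id`, `ou`, `tags`) and the helper name are assumptions.

```python
def find_prod_schedule_tags(instances, tag_key="Schedule"):
    """Return IDs of instances in the Prod OU that carry the scheduling tag."""
    return [
        item["instance_id"]
        for item in instances
        if item["ou"] == "Prod" and tag_key in item["tags"]
    ]


# Hypothetical inventory, e.g. exported from a tagging report.
inventory = [
    {"instance_id": "i-prod-api", "ou": "Prod", "tags": {"Schedule": "office-hours"}},
    {"instance_id": "i-prod-db", "ou": "Prod", "tags": {"Name": "orders-db"}},
    {"instance_id": "i-dev-web", "ou": "Dev", "tags": {"Schedule": "office-hours"}},
]

for instance_id in find_prod_schedule_tags(inventory):
    print(f"ALERT: {instance_id} in Prod carries a Schedule tag")
```

The SCP already blocks the stop itself; the check exists so that the platform team learns about the mis-tagging instead of discovering it through a denied call.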
10. Auditing & Visibility
CloudTrail
Track:
- Who tagged resources
- Who attempted stop/start
- SCP-denied actions
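Denied stop/start attempts can be pulled out of CloudTrail records with a small filter. The sketch assumes records already loaded as dictionaries with the standard `eventName`, `errorCode`, `eventTime`, and `userIdentity.arn` fields; the sample records are invented. The error code alone does not say whether an SCP or an IAM policy caused the denial.

```python
WATCHED_EVENTS = {"StartInstances", "StopInstances", "StartDBInstance", "StopDBInstance"}
DENIED_CODES = {"AccessDenied", "Client.UnauthorizedOperation"}


def denied_attempts(records):
    """Return (time, caller ARN, event name) for denied stop/start calls."""
    hits = []
    for record in records:
        if record.get("eventName") in WATCHED_EVENTS and record.get("errorCode") in DENIED_CODES:
            caller = record.get("userIdentity", {}).get("arn")
            hits.append((record.get("eventTime"), caller, record["eventName"]))
    return hits


# Invented sample records for illustration.
sample = [
    {
        "eventTime": "2024-01-15T18:00:02Z",
        "eventName": "StopInstances",
        "errorCode": "Client.UnauthorizedOperation",
        "userIdentity": {
            "arn": "arn:aws:sts::111122223333:assumed-role/InstanceSchedulerExecutionRole/scheduler"
        },
    },
    {
        "eventTime": "2024-01-15T18:00:05Z",
        "eventName": "StopInstances",
        "userIdentity": {
            "arn": "arn:aws:sts::444455556666:assumed-role/InstanceSchedulerExecutionRole/scheduler"
        },
    },
]

for when, who, what in denied_attempts(sample):
    print(when, who, what)
```

Only the first sample record is reported, because the second one carries no error code and therefore represents a successful call.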
Cost Explorer
Track:
- Savings by OU
- Before/after scheduler rollout
11. Failure Scenarios (Real World)
12. Terraform / IaC Governance Pattern
- Even if Scheduler is CloudFormation:
- SCPs → Terraform
- OU structure → Terraform
- IAM roles → Terraform
NB:
- Scheduler becomes: A controlled exception to IaC
13. Reference Policy Stack (Layered Defense)
NB:
- Deny always wins.
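"Deny always wins" can be modelled by combining the deny rules from sections 5 to 7 into one decision. This is a sketch under assumptions: the stack is taken to consist of those three rules, the IAM role is assumed to allow the action, and `evaluate_stop` is a hypothetical helper, not an AWS API.

```python
def evaluate_stop(ou, tags):
    """Return (allowed, reason) for a stop request against the assumed deny rules."""
    denies = [
        (ou == "Prod", "Prod OU SCP denies stop/start"),
        (tags.get("Scheduling") == "Disabled", "resource is tagged Scheduling=Disabled"),
        ("Schedule" not in tags, "resource has no Schedule tag"),
    ]
    # Order only affects which reason is reported; any single match is enough.
    for matched, reason in denies:
        if matched:
            return False, reason
    return True, "no deny matched, so the IAM allow takes effect"


print(evaluate_stop("Prod", {"Schedule": "office-hours"}))
print(evaluate_stop("Dev", {"Schedule": "office-hours", "Scheduling": "Disabled"}))
print(evaluate_stop("Dev", {}))
print(evaluate_stop("Dev", {"Schedule": "office-hours"}))
```

Only the last call is allowed: the instance sits outside Prod, is not marked Scheduling=Disabled, and has opted in with a Schedule tag.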
14. When This Pattern Shines (when to use)
✅ Large AWS Organizations
✅ Multi-team environments
✅ FinOps-driven orgs
✅ Regulated environments
✅ Shared dev platforms
15. When NOT to Use This Pattern
❌ Single-account setups
❌ Small startups
❌ Teams without tag discipline
❌ Highly dynamic ephemeral infra
16. Take-Home
- Instance Scheduler saves money.
- Organizations define boundaries.
- SCPs prevent disasters.
Together, they form:
- A cost-optimized
- Governed
- Enterprise-grade scheduling platform