Think - with -Tech: AWS DynamoDB | Backups For Disaster Recovery(DR).

Wednesday, August 13, 2025

AWS DynamoDB | Backups For Disaster Recovery(DR).

A deep dive on DynamoDB Backups for Disaster Recovery (DR).

This post covers:

Architecture
Recovery Point Objectives (RPO)
Recovery Time Objectives (RTO)
Costs
Operational considerations for a DevOps/DevSecOps/Cloud/DR Engineer

1. DynamoDB Backup Types

DynamoDB supports two main backup & restore mechanisms, each with different DR characteristics.

Feature	Point-in-Time Recovery (PITR)	On-Demand Backup
Purpose	Continuous protection for accidental deletes/writes	Compliance, archival, cloning tables
Granularity	Any second in the last 35 days	Snapshot at request time
Retention	Rolling 35 days	Indefinite (until deleted)
RPO	~1 second	As of backup creation
RTO	Minutes to hours (table size dependent)	Minutes to hours
Cost	Billed per GB-month on the size of the table while PITR is enabled	Billed per GB-month of backup storage
Best For	Operational recovery from logical errors	Long-term DR, migrations, compliance retention

2. Point-in-Time Recovery (PITR) Deep Dive

PITR uses DynamoDB Streams–like change logs behind the scenes to enable recovery to any second in the past 35 days.

Enablement: Per table; PITR is not enabled by default.
Use Cases:

Accidental DELETE or PUT overwrites.
Bad batch job/data corruption.

How the restore process works:

1. Choose a timestamp within the last 35 days.

2. AWS creates a new table with that point’s data.

3. Swap traffic over once validated.
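The restore steps above can be sketched in Python. This is a minimal sketch, not a definitive implementation: it assumes `client` is a `boto3.client("dynamodb")` object, and the helper names and table names are hypothetical. The request is built separately so the 35-day window check can be exercised without touching AWS.

```python
from datetime import datetime, timedelta, timezone

PITR_WINDOW = timedelta(days=35)  # rolling retention window described above


def build_pitr_restore_request(source_table, target_table, restore_time, now=None):
    """Return keyword arguments for DynamoDB's RestoreTableToPointInTime call.

    Raises ValueError if restore_time is in the future or older than 35 days.
    """
    now = now or datetime.now(timezone.utc)
    if restore_time > now:
        raise ValueError("restore_time is in the future")
    if now - restore_time > PITR_WINDOW:
        raise ValueError("restore_time is older than the 35-day PITR window")
    return {
        "SourceTableName": source_table,
        "TargetTableName": target_table,  # the restore always lands in a new table
        "RestoreDateTime": restore_time,
    }


def restore_to_point_in_time(client, source_table, target_table, restore_time):
    """Start the restore; `client` is assumed to be boto3.client("dynamodb")."""
    request = build_pitr_restore_request(source_table, target_table, restore_time)
    return client.restore_table_to_point_in_time(**request)
```

Traffic is only switched to the new table (step 3) after it has been validated; that step is application-specific and is not shown here.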

DR Characteristics:

RPO: Restore granularity is one second; the latest restorable point typically trails the current time by about five minutes.
RTO: Depends on table size & data transfer speed to the new table.

3. On-Demand Backups

A complete snapshot stored in DynamoDB’s backend storage layer.

Use Cases:

Regulatory requirements.
Monthly/quarterly archival.
Pre-deployment “safety net.”

Behavior:

Backups run without impacting read/write performance.
Restore is always to a new table.
Can be copied cross-account & cross-region when the backups are managed through AWS Backup (more on this below).

Cost Considerations:

Backup storage is separate from table storage.
Restores are billed per GB of data restored.

4. Cross-Region & Cross-Account DR

If twtech’s DR plan includes region-failure scenarios, PITR alone isn’t enough, because PITR backups stay in-region.
To protect against regional outages:

Option A – Backup Copy

Process:

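Create an on-demand backup in Region A, managed through AWS Backup.
Use the AWS Backup StartCopyJob API to copy it to Region B (the native DynamoDB API has no cross-region backup copy call).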

Pros: In a region outage, data loss is limited to writes made after the last copied backup.
Cons: Increased RPO (depends on backup frequency and copy time).
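A minimal sketch of the copy step in Option A, assuming the backup exists as an AWS Backup recovery point and that `backup_client` is a `boto3.client("backup")` object. The function name is hypothetical, and the vault names, ARNs, and IAM role are placeholders the caller must supply.

```python
def copy_recovery_point(backup_client, recovery_point_arn, source_vault,
                        destination_vault_arn, iam_role_arn):
    """Start an AWS Backup copy job and return its job ID.

    `backup_client` is assumed to be boto3.client("backup") created in the
    source region; `destination_vault_arn` points at a vault in the DR region.
    """
    response = backup_client.start_copy_job(
        RecoveryPointArn=recovery_point_arn,
        SourceBackupVaultName=source_vault,
        DestinationBackupVaultArn=destination_vault_arn,
        IamRoleArn=iam_role_arn,
    )
    return response["CopyJobId"]
```

The copy runs asynchronously, so the RPO of this option includes the time the copy takes to finish, not just the backup schedule.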

Option B – Global Tables

Process:

Set up DynamoDB Global Tables for active-active replication.

Pros: RPO near zero (replication is asynchronous, typically completing within about a second); no restore needed for failover.
Cons: More expensive; not strictly “backup,” but a replication strategy.
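As a rough sketch of Option B, a replica region can be added to an existing table through the `UpdateTable` call. This assumes `client` is a `boto3.client("dynamodb")` object in the table's home region; the helper names are hypothetical, and prerequisites on the table (such as stream settings) are not shown.

```python
def build_add_replica_request(table_name, replica_region):
    """Return keyword arguments for UpdateTable that add one replica region."""
    return {
        "TableName": table_name,
        "ReplicaUpdates": [{"Create": {"RegionName": replica_region}}],
    }


def add_replica(client, table_name, replica_region):
    """Ask DynamoDB to create a replica; `client` is assumed to be boto3.client("dynamodb")."""
    return client.update_table(**build_add_replica_request(table_name, replica_region))
```

For example, `add_replica(client, "orders", "us-west-2")` would request a replica of a hypothetical `orders` table in us-west-2.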

5. Disaster Recovery Patterns

DR Strategy	RPO	RTO	Cost	Notes
In-Region PITR	Seconds	Hours	Low-Med	Covers logical corruption; not region outage
Scheduled On-Demand + Cross-Region Copy	Hours (based on schedule)	Hours	Med-High	Protects from region outage
Global Tables	~0	Minutes	High	Failover without restore
Hybrid	Seconds–Hours	Minutes–Hours	High	PITR + periodic region copy

6. Operational Considerations

Automation:

Use EventBridge to trigger periodic on-demand backups.
Use Lambda to copy to DR region.
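A minimal sketch of the scheduled-backup Lambda, assuming an EventBridge schedule invokes `handler` and that the table name arrives in a hypothetical `TABLE_NAME` environment variable. The naming scheme is only an illustration. `boto3` is imported inside the handler so the helpers can be tested without AWS access.

```python
import os
from datetime import datetime, timezone


def backup_name(table_name, now=None):
    """Build a timestamped backup name, e.g. orders-20250813T120000Z."""
    now = now or datetime.now(timezone.utc)
    return f"{table_name}-{now:%Y%m%dT%H%M%SZ}"


def create_scheduled_backup(client, table_name, now=None):
    """Create an on-demand backup and return its ARN.

    `client` is assumed to be boto3.client("dynamodb").
    """
    response = client.create_backup(
        TableName=table_name,
        BackupName=backup_name(table_name, now),
    )
    return response["BackupDetails"]["BackupArn"]


def handler(event, context):
    """Lambda entry point invoked by the EventBridge schedule (assumed wiring)."""
    import boto3  # available in the Lambda Python runtime

    table_name = os.environ["TABLE_NAME"]  # hypothetical configuration variable
    arn = create_scheduled_backup(boto3.client("dynamodb"), table_name)
    return {"backupArn": arn}
```

A backup created this way stays in the source region; the copy to the DR region is a separate step.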

Monitoring:

Backup completion events in EventBridge (for example, AWS Backup job state-change events).
PITR enabled status alarms.
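The PITR status check can be sketched with the `DescribeContinuousBackups` call. This is a minimal sketch assuming `client` is a `boto3.client("dynamodb")` object; the function names are hypothetical, and how the result feeds an alarm (a custom metric, an SNS message) is left to the reader.

```python
def pitr_enabled(client, table_name):
    """Return True if point-in-time recovery is enabled for the table."""
    response = client.describe_continuous_backups(TableName=table_name)
    description = response["ContinuousBackupsDescription"]
    pitr = description.get("PointInTimeRecoveryDescription", {})
    return pitr.get("PointInTimeRecoveryStatus") == "ENABLED"


def tables_without_pitr(client, table_names):
    """Return the tables from `table_names` that do not have PITR enabled."""
    return [name for name in table_names if not pitr_enabled(client, name)]
```

Running `tables_without_pitr` on a schedule and alerting when the list is non-empty is one way to catch a table where PITR was never enabled or was switched off.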

Testing:

Periodically restore backups to a staging environment.
Validate data integrity and application compatibility.

Security:

Encrypt backups with customer managed KMS keys.
Ensure IAM least privilege for backup/restore APIs.
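To make the least-privilege point concrete, here is a sketch of a policy for a role that may create and inspect backups but not delete or restore them. The function name is hypothetical and the table ARN is a placeholder; the resource scoping reflects twtech's reading of the DynamoDB IAM actions and should be checked against the service authorization reference before use.

```python
import json


def backup_operator_policy(table_arn):
    """Return an IAM policy document (as a dict) for a backup-only role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "CreateAndInspectOnTable",
                "Effect": "Allow",
                "Action": [
                    "dynamodb:CreateBackup",
                    "dynamodb:DescribeContinuousBackups",
                ],
                "Resource": table_arn,
            },
            {
                "Sid": "DescribeBackupsOfTable",
                "Effect": "Allow",
                "Action": ["dynamodb:DescribeBackup"],
                "Resource": f"{table_arn}/backup/*",
            },
            {
                # Assumption: ListBackups is not scoped to a single resource.
                "Sid": "ListBackups",
                "Effect": "Allow",
                "Action": ["dynamodb:ListBackups"],
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder ARN for illustration only.
    arn = "arn:aws:dynamodb:us-east-2:111122223333:table/orders"
    print(json.dumps(backup_operator_policy(arn), indent=2))
```

Restore and delete permissions are deliberately left out, so they can be granted to a separate, more tightly controlled role.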

Large Table Restores:

Restores load partitions in parallel, so restore time depends on how the data is spread across partitions as well as on table size; run test restores to measure actual times.
Restores are bulk load operations, not live streaming.

7. Example DR Flow

Scenario: Regional outage in us-east-2
Goal: Recover in us-west-2 with < 4h RTO and < 1h RPO.

Every hour, an on-demand backup is created in us-east-2.
Immediately copied to us-west-2.
During outage:

Trigger restore from latest backup in us-west-2 to a new table.
Repoint app to new endpoint after functional tests.
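It helps to check whether the schedule in this flow actually meets the RPO goal. The sketch below uses a simplified model that is twtech's assumption, not an AWS formula: the worst case is an outage just before the next backup's copy lands in the DR region, so the data at risk spans one backup interval plus the copy time. The 10-minute copy time is a made-up figure for illustration.

```python
from datetime import timedelta


def worst_case_rpo(backup_interval, copy_lag):
    """Worst-case data loss window under the simplified model above."""
    return backup_interval + copy_lag


def meets_rpo(backup_interval, copy_lag, target_rpo):
    """Return True if the worst-case window is strictly under the target RPO."""
    return worst_case_rpo(backup_interval, copy_lag) < target_rpo


if __name__ == "__main__":
    hourly = timedelta(hours=1)
    copy_lag = timedelta(minutes=10)  # hypothetical copy duration
    print(worst_case_rpo(hourly, copy_lag))                    # 1:10:00
    print(meets_rpo(hourly, copy_lag, timedelta(hours=1)))     # False
    print(meets_rpo(timedelta(minutes=30), copy_lag, timedelta(hours=1)))  # True
```

Under this model an hourly schedule cannot strictly guarantee an RPO under one hour; a shorter interval (for example every 30 minutes) leaves room for the copy time.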

8. Costs to Keep in Mind

PITR: ~$0.20 per GB-month, billed on the size of the table (rates vary by region).
On-Demand: ~$0.10 per GB-month.
Cross-Region Copy: Additional storage + transfer.
Restores: ~$0.15 per GB restored.
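The arithmetic behind these figures can be sketched as follows. The rates are the approximate ones quoted above, cross-region storage and transfer are left out, and the 500 GB table with three retained backups is a made-up example.

```python
PITR_PER_GB_MONTH = 0.20       # approximate rate quoted above
ON_DEMAND_PER_GB_MONTH = 0.10  # approximate rate quoted above
RESTORE_PER_GB = 0.15          # approximate rate quoted above


def monthly_backup_cost(table_gb, retained_backups, pitr_enabled=True):
    """Estimate monthly storage cost in USD for PITR plus retained on-demand backups."""
    pitr = table_gb * PITR_PER_GB_MONTH if pitr_enabled else 0.0
    on_demand = table_gb * retained_backups * ON_DEMAND_PER_GB_MONTH
    return round(pitr + on_demand, 2)


def restore_cost(table_gb):
    """Estimate the one-off cost in USD of restoring the whole table."""
    return round(table_gb * RESTORE_PER_GB, 2)


if __name__ == "__main__":
    # Hypothetical 500 GB table keeping 3 on-demand backups.
    print(monthly_backup_cost(500, 3))  # 100 for PITR + 150 for backups = 250.0
    print(restore_cost(500))            # 75.0
```

Note that every retained on-demand backup is a full copy, so retention policy drives the on-demand line far more than backup frequency alone.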

The DynamoDB Disaster Recovery Strategy Matrix below is built so you can plug in your target RPO (Recovery Point Objective) and RTO (Recovery Time Objective) and immediately see the matching AWS backup features, automation setup, and trade-offs.

DynamoDB DR Strategy Matrix

Target RPO	Target RTO	Recommended AWS Features	Automation Setup	Cost Level	Pros	Cons	Best For
≤ 1 second	≤ 15 min	Global Tables	- Create active-active Global Table across regions. - Route 53 failover routing. - Health checks for auto-switch.	🔴 High	Zero restore time; instant failover; protects from region outage.	High ongoing cost; complex conflict resolution logic.	Mission-critical, 24/7 low-latency apps.
≤ 1 second	Hours	PITR (Point-in-Time Recovery) (same region)	- Enable PITR per table. - Manual/automated restore to new table. - Use CloudFormation/Lambda to reapply settings the restore does not carry over (auto scaling, alarms, TTL, stream settings).	🟢 Low-Med	Covers accidental deletes/writes; granular restore.	No region outage protection; restore time depends on table size.	In-region logical corruption recovery.
≤ 1 hour	≤ 4 hours	PITR + Hourly On-Demand Backup + Cross-Region Copy	- PITR for in-region safety. - EventBridge hourly backup trigger. - Lambda copies backup to DR region.	🟡 Medium	Combines near-real-time local restore + regional DR.	Higher storage costs; hourly RPO may not meet sub-hour needs.	Balanced cost & DR coverage.
≤ 1 hour	> 4 hours	On-Demand Backups + Cross-Region Copy (every 1h)	- EventBridge to schedule backups. - Lambda to copy to DR region.	🟡 Medium	Simple automation; meets compliance needs.	Slower restore; 1h data loss possible.	DR compliance & cost balance.
≤ 24 hours	> 4 hours	Daily On-Demand Backups (Cross-Region if needed)	- Daily scheduled backups. - Optional DR region copy.	🟢 Low	Cheapest; meets regulatory archiving.	High potential data loss; slow recovery.	Non-critical, audit-focused data.
Custom	Custom	Hybrid (PITR + periodic backups + optional global tables)	- Mix features based on business unit criticality.	Variable	Tailored to workload.	More complex ops.	Multi-tier DR planning.

Automation Architecture Reference

For “PITR + Cross-Region On-Demand” setup:

PITR Enabled → Continuous local protection.
EventBridge Schedule (every X hours):

Trigger Lambda → CreateBackup API.
Lambda → AWS Backup StartCopyJob API (copy to the DR region).

Backup Monitoring:

EventBridge rule on backup completion events → SNS alerts.

Disaster Event:

Restore from latest DR region backup → New table.
Switch endpoints via Route 53 or config update.

Quick Selection Guide

If twtech wants near-zero downtime and near-zero data loss → Global Tables.
If twtech wants minimal cost and protection from bad writes → PITR only.
If twtech wants a balance of cost, RPO ~1h, and region protection → PITR + hourly cross-region backups.
If compliance is the primary goal → Daily On-Demand backups with cross-region copy.

Here’s twtech’s sample DynamoDB DR Decision Tree. It starts with RPO (Recovery Point Objective) and RTO (Recovery Time Objective), then routes to the matching AWS backup/replication setup.

DynamoDB DR Decision Tree

START: twtech’s target RPO & RTO
│
├── RPO ≤ 1 second?
│   │
│   ├── RTO ≤ 15 min → Use GLOBAL TABLES
│   │     - Multi-region, active-active.
│   │     - Route 53 failover.
│   │     - Cost: High.
│   │
│   └── RTO > 15 min → Use PITR (Point-in-Time Recovery)
│         - In-region restore.
│         - No regional outage protection.
│         - Cost: Low-Med.
│
├── RPO ≤ 1 hour?
│   │
│   ├── RTO ≤ 4 hours → Use PITR + HOURLY ON-DEMAND BACKUPS + CROSS-REGION COPY
│   │     - Combines near-real-time local recovery + DR region protection.
│   │     - Cost: Medium.
│   │
│   └── RTO > 4 hours → Use HOURLY ON-DEMAND BACKUPS + CROSS-REGION COPY
│         - Cost: Medium.
│         - Simpler, but longer restore time.
│
└── RPO ≤ 24 hours?
    │
    ├── RTO ≤ 4 hours → Use DAILY ON-DEMAND BACKUPS + CROSS-REGION COPY
    │     - Meets compliance + regional DR.
    │     - Cost: Low.
    │
    └── RTO > 4 hours → Use DAILY ON-DEMAND BACKUPS (optional cross-region copy)
          - Lowest cost.
          - Longest RPO/RTO.
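The decision tree translates directly into a small function. This is a sketch of the tree as written, with the thresholds taken from it; the function name is hypothetical, and sending targets looser than 24 hours to the "Hybrid" row of the matrix is twtech's assumption, since the tree does not cover that case.

```python
def choose_strategy(rpo_seconds, rto_minutes):
    """Map a target RPO (seconds) and RTO (minutes) to a strategy from the tree."""
    if rpo_seconds <= 1:
        if rto_minutes <= 15:
            return "Global Tables"
        return "PITR (in-region)"
    if rpo_seconds <= 3600:  # up to 1 hour
        if rto_minutes <= 240:  # up to 4 hours
            return "PITR + hourly on-demand backups + cross-region copy"
        return "Hourly on-demand backups + cross-region copy"
    if rpo_seconds <= 86400:  # up to 24 hours
        if rto_minutes <= 240:
            return "Daily on-demand backups + cross-region copy"
        return "Daily on-demand backups (optional cross-region copy)"
    # Not covered by the tree; assumption: fall back to the custom/hybrid row.
    return "Hybrid (custom)"


if __name__ == "__main__":
    print(choose_strategy(1, 10))        # Global Tables
    print(choose_strategy(1800, 180))    # PITR + hourly on-demand backups + cross-region copy
    print(choose_strategy(86400, 600))   # Daily on-demand backups (optional cross-region copy)
```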

Color-coded Quick Legend

🔴 Global Tables → Max availability, near-zero data loss, high cost.
🟡 PITR + Cross-Region Backup → Balanced cost, RPO of about an hour for regional DR, DR ready.
🟢 On-Demand Backups → Low cost, compliance-friendly, slower recovery.


Insights: PITR

PITR stands for Point-in-Time Recovery, a data recovery technique used primarily with databases. Here’s a breakdown:
Restoring data to a specific past state: PITR allows administrators to restore a database (or other system) to the exact state it was in at a particular point in the past.
Protection against data loss or corruption: This is particularly valuable for situations like accidental deletion of data, unintended writes that corrupt the database, or application rollouts that cause issues.

Summary of how Point-in-Time Recovery (PITR) works in general:
PITR typically relies on a combination of regular full backups and continuous logging of all changes made to the database (often called transaction logs or write-ahead logs, WAL).
A full backup provides a snapshot of the database at a specific moment.
Transaction logs record every change made after the full backup.
To restore to a specific point in time, the system first restores the full backup and then applies the relevant changes from the transaction logs up to the desired point in time.
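The snapshot-plus-log idea described above can be illustrated with a toy in-memory model. This is a generic sketch of the technique, not a description of DynamoDB internals; the function name, the log format, and the integer timestamps are all made up for illustration.

```python
def restore_to(snapshot, snapshot_time, log, target_time):
    """Rebuild table state at target_time from a full snapshot plus a change log.

    snapshot: dict of key -> value captured at snapshot_time.
    log: list of (timestamp, op, key, value) tuples sorted by timestamp,
         where op is "put" or "delete".
    """
    if target_time < snapshot_time:
        raise ValueError("target_time is earlier than the snapshot")
    table = dict(snapshot)  # start from the full backup
    for timestamp, op, key, value in log:
        if timestamp <= snapshot_time:
            continue  # already reflected in the snapshot
        if timestamp > target_time:
            break  # stop replaying at the requested point in time
        if op == "put":
            table[key] = value
        elif op == "delete":
            table.pop(key, None)
    return table


if __name__ == "__main__":
    snapshot = {"a": 1}
    log = [(1, "put", "b", 2), (2, "delete", "a", None), (3, "put", "b", 9)]
    print(restore_to(snapshot, 0, log, 1))  # {'a': 1, 'b': 2}
    print(restore_to(snapshot, 0, log, 2))  # {'b': 2}
    print(restore_to(snapshot, 0, log, 3))  # {'b': 9}
```

Restoring to time 1 recovers the item that the delete at time 2 removed, which is the accidental-delete case PITR is meant for.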
