DynamoDB Backups For Disaster Recovery (DR) - Overview.
Focus:
Tailored for DevOps/DevSecOps/Cloud/DR Engineers.
Scope:
- Intro,
- DynamoDB Backup Types,
- Point-in-Time Recovery (PITR) Deep Dive,
- On-Demand Backups,
- Cross-Region & Cross-Account DR,
- Options to protect against regional outages,
- Disaster Recovery Patterns,
- Operational Considerations,
- Sample DR Flow,
- Costs to Keep in Mind,
- DynamoDB Disaster Recovery Strategy Matrix,
- Automation Architecture Reference,
- Sample DynamoDB DR Decision Tree,
- Visual Diagram (Flow Style),
- Insights PITR (Point-in-Time Recovery).
Intro:
- DynamoDB provides two primary backup methods for disaster recovery (DR):
- Point-in-Time Recovery (PITR),
- On-demand backups, which can be managed and automated using AWS Backup.
- Together, these features help meet different recovery time objectives (RTO) and recovery point objectives (RPO).
1. DynamoDB Backup Types
- DynamoDB supports two main backup & restore mechanisms, each with different DR characteristics.
| Feature | Point-in-Time Recovery (PITR) | On-Demand Backup |
| Purpose | Continuous protection for accidental deletes/writes | Compliance, archival, cloning tables |
| Granularity | Any second in the last 35 days | Snapshot at request time |
| Retention | Rolling 35 days | Indefinite (until deleted) |
| RPO | ~1 second | As of backup creation |
| RTO | Minutes to hours (table size dependent) | Minutes to hours |
| Cost | Charged per GB-month, based on the size of the PITR-enabled table | Charged for full snapshot storage |
| Best For | Operational recovery from logical errors | Long-term DR, migrations, compliance retention |
2. Point-in-Time Recovery (PITR) Deep Dive
PITR keeps Streams-like change logs behind the scenes, which makes it possible to recover a table to any second in the past 35 days.
- Enablement:
Table by table; PITR is off by default and is not turned on account-wide.
- Use Cases:
- Accidental DELETE or PUT overwrites.
- Bad batch job/data corruption.
- Restore Process:
1. Choose a timestamp within the last 35 days.
2. AWS creates a new table with that point’s data.
3. Swap traffic over once validated.
- DR Characteristics:
- RPO: ~1 second (near real-time).
- RTO: Depends on table size & data transfer speed to the new table.
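The enable-and-restore steps above could be scripted roughly as follows. This is a minimal sketch, assuming `client` is a boto3 DynamoDB client (for example `boto3.client("dynamodb")`); the table names are hypothetical, and the 35-day check is a local guard added for illustration, not the service's own validation.

```python
from datetime import datetime, timedelta, timezone

PITR_WINDOW = timedelta(days=35)  # rolling retention window described above


def enable_pitr(client, table_name):
    """Turn on PITR for a single table (it is configured table by table)."""
    return client.update_continuous_backups(
        TableName=table_name,
        PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
    )


def restore_to_point(client, source_table, target_table, restore_time, now=None):
    """Restore source_table as of restore_time into a NEW table.

    The exact restorable bounds are reported by describe_continuous_backups;
    the check below is only a coarse local guard (an assumption of this sketch).
    """
    now = now or datetime.now(timezone.utc)
    if restore_time > now or now - restore_time > PITR_WINDOW:
        raise ValueError("restore_time must fall within the last 35 days")
    return client.restore_table_to_point_in_time(
        SourceTableName=source_table,
        TargetTableName=target_table,
        RestoreDateTime=restore_time,
    )
```

Passing the client in as an argument keeps the functions easy to exercise against a stub before pointing them at a real account.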
3. On-Demand Backups
A complete snapshot stored in DynamoDB’s backend storage layer.
- Use Cases:
- Regulatory requirements.
- Monthly/quarterly archival.
- Pre-deployment “safety net.”
- Behavior:
- Backups run without impacting read/write performance.
- Restore is always to a new table.
- Can be copied cross-account & cross-region when managed through AWS Backup (more on this below).
- Cost Considerations:
- Backup storage is billed separately from table storage.
- Restores are billed per GB of data restored.
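As an illustration of the snapshot-then-restore behavior described above, here is a small sketch. It assumes `client` is a boto3 DynamoDB client; the naming scheme in `backup_name` and the table names are hypothetical choices for this example.

```python
from datetime import datetime, timezone


def backup_name(table_name, when=None):
    """Build a sortable backup name such as 'orders-20240102T030405Z'."""
    when = when or datetime.now(timezone.utc)
    return f"{table_name}-{when:%Y%m%dT%H%M%SZ}"


def create_on_demand_backup(client, table_name, when=None):
    """Create an on-demand backup and return its ARN."""
    resp = client.create_backup(
        TableName=table_name,
        BackupName=backup_name(table_name, when),
    )
    return resp["BackupDetails"]["BackupArn"]


def restore_backup(client, backup_arn, target_table):
    """Restore a backup; the target is always a new table."""
    return client.restore_table_from_backup(
        TargetTableName=target_table,
        BackupArn=backup_arn,
    )
```

A timestamped name makes it easy to find the most recent backup by sorting, which helps with the pre-deployment "safety net" use case.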
4. Cross-Region & Cross-Account DR
- If twtech's DR plan includes region failure scenarios, PITR alone isn’t enough, because PITR data stays in-region.
Option A – Backup Copy
- Process:
- Create an on-demand backup in Region A through AWS Backup.
- Copy it to Region B with AWS Backup (a cross-Region copy rule in a backup plan, or the StartCopyJob API).
- Pros:
Data survives a region outage; loss is limited to writes made since the last copied backup.
- Cons:
Increased RPO (depends on backup frequency).
Option B – Global Tables
- Process:
- Set up DynamoDB Global Tables for active-active replication.
- Pros:
RPO ~0; no restore needed for failover.
- Cons:
More expensive; not strictly “backup,” but a replication strategy.
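The copy step in Option A could look roughly like this. It is a sketch under stated assumptions: `backup_client` is a boto3 AWS Backup client (`boto3.client("backup")`), the recovery point was created through AWS Backup, and the vault names, ARNs, and IAM role are placeholders. The Region guard is an extra check added for this example.

```python
def arn_region(arn):
    """Return the Region field of an ARN (arn:partition:service:region:account:resource)."""
    return arn.split(":")[3]


def copy_to_dr_region(backup_client, recovery_point_arn, source_vault,
                      dest_vault_arn, role_arn):
    """Start an AWS Backup copy job into a vault in another Region."""
    # Guard (assumption of this sketch): a DR copy should leave the source Region.
    if arn_region(dest_vault_arn) == arn_region(recovery_point_arn):
        raise ValueError("destination vault is in the source Region")
    resp = backup_client.start_copy_job(
        RecoveryPointArn=recovery_point_arn,
        SourceBackupVaultName=source_vault,
        DestinationBackupVaultArn=dest_vault_arn,
        IamRoleArn=role_arn,
    )
    return resp["CopyJobId"]
```

The same copy can be configured declaratively as a copy rule in an AWS Backup plan, which avoids custom code entirely.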
5. Disaster Recovery Patterns
| DR Strategy | RPO | RTO | Cost | Notes |
| In-Region PITR | Seconds | Hours | Low-Med | Covers logical corruption; not region outage |
| Scheduled On-Demand + Cross-Region Copy | Hours (based on schedule) | Hours | Med-High | Protects from region outage |
| Global Tables | ~0 | Minutes | High | Failover without restore |
| Hybrid | Seconds–Hours | Minutes–Hours | High | PITR + periodic region copy |
6. Operational Considerations
- Automation:
- Use EventBridge to trigger periodic on-demand backups.
- Use Lambda to copy to DR region.
- Monitoring:
- Backup job completion events (AWS Backup publishes "Backup Job State Change" events to EventBridge).
- Alarms on PITR enabled status.
- Testing:
- Periodically restore backups to a staging environment.
- Validate data integrity and application compatibility.
- Security:
- Encrypt backups with KMS CMKs.
- Ensure IAM least privilege for backup/restore APIs.
- Large Table Restores:
- Restores run in parallel across partitions, so restore time depends on table size, partition count, and data distribution rather than on size alone.
- Restores are bulk load operations, not live streaming.
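The "PITR enabled status" check under Monitoring could be automated along these lines. This is a sketch, assuming `client` is a boto3 DynamoDB client; the function name is hypothetical, and wiring the result into an alarm is left out.

```python
def tables_without_pitr(client):
    """Return the names of tables whose PITR status is not ENABLED."""
    missing, start = [], None
    while True:
        # list_tables is paginated via LastEvaluatedTableName.
        kwargs = {"ExclusiveStartTableName": start} if start else {}
        page = client.list_tables(**kwargs)
        for name in page.get("TableNames", []):
            desc = client.describe_continuous_backups(TableName=name)
            status = (
                desc["ContinuousBackupsDescription"]
                .get("PointInTimeRecoveryDescription", {})
                .get("PointInTimeRecoveryStatus")
            )
            if status != "ENABLED":
                missing.append(name)
        start = page.get("LastEvaluatedTableName")
        if not start:
            return missing
```

Run on a schedule, a non-empty result is a reasonable trigger for an alert, since a table without PITR has no in-region protection against bad writes.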
7. Sample DR Flow
Scenario: Regional outage in us-east-2.
Goal: Recover in us-west-1 with < 4h RTO and < 1h RPO.
- Every hour, an on-demand backup is created in us-east-2.
- Each backup is copied to us-west-1 as soon as it completes.
- During outage:
- Trigger a restore from the latest backup copy in us-west-1 to a new table.
- Repoint the app to the new table after functional tests.
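Picking the backup to restore during the outage, and checking it against the RPO target, might be sketched as below. The input is assumed to be a list of dicts shaped like the `RecoveryPoints` entries returned by AWS Backup's `list_recovery_points_by_backup_vault` (keys `RecoveryPointArn`, `CreationDate`, `Status`); the function name and the one-hour default are illustrative.

```python
from datetime import datetime, timedelta, timezone


def latest_usable_recovery_point(recovery_points, now, max_age=timedelta(hours=1)):
    """Return (newest completed recovery point, whether it meets the RPO).

    Returns (None, False) when no completed recovery point exists.
    """
    completed = [rp for rp in recovery_points if rp.get("Status") == "COMPLETED"]
    if not completed:
        return None, False
    newest = max(completed, key=lambda rp: rp["CreationDate"])
    within_rpo = (now - newest["CreationDate"]) <= max_age
    return newest, within_rpo
```

Note that with hourly backups the newest copy in the DR region can be up to an hour old plus the copy time, so a strict sub-hour RPO needs a shorter schedule.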
8. Costs to Keep in Mind
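- PITR: ~$0.20 per GB-month, billed on the size of the PITR-enabled table.
- On-Demand: ~$0.10 per GB-month.
- Cross-Region Copy: Additional storage in the destination region + inter-region data transfer.
- Restores: ~$0.15 per GB restored.
- Rates are approximate and vary by region; check current AWS pricing before budgeting.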
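To make the figures above concrete, here is a small back-of-the-envelope estimator. It uses the approximate rates quoted in this section; the function names and the example sizes are hypothetical, and cross-region transfer is deliberately left out because no rate is given for it here.

```python
# Approximate rates from the list above (USD); real prices vary by region.
RATES = {
    "pitr_gb_month": 0.20,
    "on_demand_gb_month": 0.10,
    "restore_gb": 0.15,
}


def monthly_backup_cost(table_gb, retained_backup_gb=0.0, use_pitr=True):
    """Estimate monthly PITR plus on-demand backup storage cost."""
    pitr = table_gb * RATES["pitr_gb_month"] if use_pitr else 0.0
    on_demand = retained_backup_gb * RATES["on_demand_gb_month"]
    return round(pitr + on_demand, 2)


def restore_cost(restored_gb):
    """Estimate the one-off cost of restoring restored_gb of data."""
    return round(restored_gb * RATES["restore_gb"], 2)
```

For example, a 100 GB table with PITR and 300 GB of retained on-demand backups works out to about 100 × 0.20 + 300 × 0.10 = $50 per month, and restoring the 100 GB table once costs about $15.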
DynamoDB Disaster Recovery Strategy Matrix
- Built so twtech can plug in its target RPO (Recovery Point Objective) and RTO (Recovery Time Objective) and immediately see the matching AWS backup features, automation setup, and trade-offs.
DynamoDB DR Strategy Matrix
| Target RPO | Target RTO | Recommended AWS Features | Automation Setup | Cost Level | Pros | Cons | Best For |
| ≤ 1 second | ≤ 15 min | Global Tables | Create active-active Global Table across regions. | 🔴 High | Zero restore time; instant failover; protects from region outage. | High ongoing cost; complex conflict resolution logic. | Mission-critical, 24/7 low-latency apps. |
| ≤ 1 second | Hours | PITR (Point-in-Time Recovery) (same region) | Enable PITR per table. | 🟢 Low-Med | Covers accidental deletes/writes; granular restore. | No region outage protection; restore time depends on table size. | In-region logical corruption recovery. |
| ≤ 1 hour | ≤ 4 hours | PITR + Hourly On-Demand Backup + Cross-Region Copy | PITR for in-region safety. | 🟡 Medium | Combines near-real-time local restore + regional DR. | Higher storage costs; hourly RPO may not meet sub-hour needs. | Balanced cost & DR coverage. |
| ≤ 1 hour | > 4 hours | On-Demand Backups + Cross-Region Copy (every 1h) | EventBridge to schedule backups. | 🟡 Medium | Simple automation; meets compliance needs. | Slower restore; 1h data loss possible. | DR compliance & cost balance. |
| ≤ 24 hours | > 4 hours | Daily On-Demand Backups (Cross-Region if needed) | Daily scheduled backups. | 🟢 Low | Cheapest; meets regulatory archiving. | High potential data loss; slow recovery. | Non-critical, audit-focused data. |
| Custom | Custom | Hybrid (PITR + periodic backups + optional global tables) | Mix features based on business unit criticality. | Variable | Tailored to workload. | More complex ops. | Multi-tier DR planning. |
For the “PITR + Cross-Region On-Demand” setup:
- PITR Enabled → Continuous local protection.
- EventBridge Schedule (every X hours):
- Trigger Lambda → AWS Backup StartBackupJob API.
- Lambda → AWS Backup StartCopyJob API (copy to the DR region).
- Backup Monitoring:
- EventBridge rule on backup job state changes → SNS alerts.
- Disaster Event:
- Restore from latest DR region backup → New table.
- Switch endpoints via Route 53 or config update.
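The monitoring leg of this flow could be handled by a small Lambda-style function like the one below. It is a sketch under assumptions: the rule matches AWS Backup "Backup Job State Change" events, the job state is read from `detail["state"]` (treat that field name as an assumption to verify against a real event), `sns_client` is a boto3 SNS client, and the topic ARN is a placeholder.

```python
import json

# Event pattern for the EventBridge rule (AWS Backup job state changes).
BACKUP_EVENT_PATTERN = {
    "source": ["aws.backup"],
    "detail-type": ["Backup Job State Change"],
}

# Terminal states worth notifying on; adjust to taste.
NOTIFY_STATES = {"COMPLETED", "FAILED", "ABORTED", "EXPIRED"}


def handle_backup_event(event, sns_client, topic_arn, notify_states=NOTIFY_STATES):
    """Publish an SNS message for terminal backup job states.

    Returns True when a notification was sent, False otherwise.
    """
    detail = event.get("detail", {})
    state = detail.get("state")  # assumed field name, see lead-in
    if state not in notify_states:
        return False
    sns_client.publish(
        TopicArn=topic_arn,
        Subject=f"Backup job {state}",
        Message=json.dumps(detail, default=str),
    )
    return True
```

Restricting `notify_states` to the failure states turns this from a completion feed into an alert channel, which is usually less noisy for hourly schedules.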
Quick Selection Guide
- If twtech wants near-zero downtime and near-zero data loss → Global Tables.
- If twtech wants minimal cost and protection from bad writes → PITR only.
- If twtech wants a balance of cost, RPO ~1h, and region protection → PITR + hourly cross-region backups.
- If compliance is the primary goal → Daily On-Demand backups with cross-region copy.
Sample DynamoDB DR Decision Tree
- Starts with RPO (Recovery Point Objective) and RTO (Recovery Time Objective), then routes to the right AWS backup/replication setup.
Color-coded Quick Legend
- 🔴 Global Tables → Max availability, zero data loss, high cost.
- 🟡 PITR + Cross-Region Backup → Balanced cost, sub-hour RPO, DR ready.
- 🟢 On-Demand Backups → Low cost, compliance-friendly, slower recovery.
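A decision tree of this kind can be written as a short function. The sketch below follows the thresholds in the strategy matrix (1 second, 1 hour, 24 hours for RPO; 15 minutes and 4 hours for RTO); the function name, units, and returned labels are choices made for this example.

```python
def choose_strategy(rpo_seconds, rto_minutes):
    """Map target RPO/RTO to a strategy, following the matrix thresholds."""
    if rpo_seconds <= 1:
        if rto_minutes <= 15:
            return "Global Tables"
        # Near-zero RPO with a relaxed RTO: in-region only, no outage cover.
        return "PITR (same region)"
    if rpo_seconds <= 3600:
        if rto_minutes <= 240:
            return "PITR + hourly on-demand backup + cross-region copy"
        return "Hourly on-demand backup + cross-region copy"
    if rpo_seconds <= 86400:
        return "Daily on-demand backup"
    return "Hybrid (custom)"
```

For instance, `choose_strategy(3600, 240)` routes to the balanced "PITR + hourly on-demand backup + cross-region copy" option, while `choose_strategy(1, 10)` routes to Global Tables.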
Visual Diagram (Flow Style)
Insights PITR (Point-in-Time Recovery)
PITR is a recovery technique used primarily with databases: it restores data to its state at a chosen moment in time.