Wednesday, August 13, 2025

AWS DynamoDB | Backups For Disaster Recovery (DR).

 

A deep dive on DynamoDB Backups for Disaster Recovery (DR).

View:

  • Architecture
  • Recovery Point Objectives (RPO)
  • Recovery Time Objectives (RTO)
  • Costs
  • Operational considerations as a DevOps/DevSecOps/Cloud/DR Engineer

1. DynamoDB Backup Types

DynamoDB supports two main backup & restore mechanisms, each with different DR characteristics.

| Feature | Point-in-Time Recovery (PITR) | On-Demand Backup |
|---|---|---|
| Purpose | Continuous protection for accidental deletes/writes | Compliance, archival, cloning tables |
| Granularity | Any second in the last 35 days | Snapshot at request time |
| Retention | Rolling 35 days | Indefinite (until deleted) |
| RPO | ~1 second | As of backup creation |
| RTO | Minutes to hours (table size dependent) | Minutes to hours |
| Cost | Charged per GB-month based on the size of the table | Charged for full snapshot storage |
| Best For | Operational recovery from logical errors | Long-term DR, migrations, compliance retention |

2. Point-in-Time Recovery (PITR) Deep Dive

PITR uses DynamoDB Streams–like change logs behind the scenes to enable recovery to any second in the past 35 days.

  • Enablement: Table-by-table (not global by default).
  • Use Cases:
    • Accidental DELETE or PUT overwrites.
    • Bad batch job/data corruption.
  • How the restore process works:
    1. Choose a timestamp within the last 35 days.
    2. AWS creates a new table with the data as of that point.
    3. Swap traffic over once validated.

  • DR Characteristics:
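    • RPO: ~1 second granularity (near real-time); note that the latest restorable time typically trails the current time by about five minutes.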
    • RTO: Depends on table size & data transfer speed to the new table.
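
The enable-and-restore steps above can be sketched with boto3. This is a minimal sketch, not a definitive implementation: it assumes `client` is a DynamoDB client created elsewhere (for example `boto3.client("dynamodb")`), and the helper names `enable_pitr` and `restore_to_timestamp` are hypothetical.

```python
from datetime import datetime, timezone


def enable_pitr(client, table_name):
    """Turn on point-in-time recovery for a single table."""
    return client.update_continuous_backups(
        TableName=table_name,
        PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
    )


def restore_to_timestamp(client, source_table, target_table, restore_at):
    """Restore source_table into a NEW table as of restore_at."""
    # Require an explicit timezone so the restore point is unambiguous.
    if restore_at.tzinfo is None:
        raise ValueError("restore_at must be timezone-aware")
    return client.restore_table_to_point_in_time(
        SourceTableName=source_table,
        TargetTableName=target_table,
        RestoreDateTime=restore_at,
    )


# Example call (table names are placeholders):
# restore_to_timestamp(client, "orders", "orders-restored",
#                      datetime(2025, 8, 13, 12, 0, tzinfo=timezone.utc))
```

The restore always lands in a new table, so the application still has to be repointed after validation.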

3. On-Demand Backups

An on-demand backup is a complete snapshot of the table, stored in DynamoDB’s backend storage layer.

  • Use Cases:
    • Regulatory requirements.
    • Monthly/quarterly archival.
    • Pre-deployment “safety net.”
  • Behavior:
    • Backups run without impacting read/write performance.
    • Restore is always to a new table.
    • Can be copied cross-account & cross-region when the backups are managed through AWS Backup (more on this below).
  • Cost Considerations:
    • Backup storage is separate from table storage.
    • Restores are billed per GB of data restored.
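
Creating an on-demand backup and restoring it to a new table might look like the sketch below. It assumes `client` is a boto3 DynamoDB client created elsewhere; the helper names and the timestamped naming scheme are illustrative choices, not something DynamoDB requires.

```python
from datetime import datetime, timezone


def create_named_backup(client, table_name, now=None):
    """Create an on-demand backup and return its ARN."""
    now = now or datetime.now(timezone.utc)
    # Timestamped name (hypothetical convention) so backups sort naturally.
    backup_name = f"{table_name}-{now:%Y%m%dT%H%M%SZ}"
    response = client.create_backup(TableName=table_name, BackupName=backup_name)
    return response["BackupDetails"]["BackupArn"]


def restore_backup(client, backup_arn, target_table):
    """Restore a backup; DynamoDB always restores into a new table."""
    return client.restore_table_from_backup(
        TargetTableName=target_table,
        BackupArn=backup_arn,
    )
```

A pre-deployment "safety net" would call `create_named_backup` just before the release and keep the returned ARN with the deployment record.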

4. Cross-Region & Cross-Account DR

If twtech's DR plan includes region failure scenarios, PITR alone isn’t enough, because PITR recovery data stays in the source region.
To protect against regional outages:

Option A – Backup Copy

  • Process:
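    • Create on-demand backup in Region A through AWS Backup.
    • Use the AWS Backup StartCopyJob API (or a copy rule in the backup plan) to copy the recovery point to Region B.
  • Pros: Data loss from a region outage is limited to changes made since the last copied backup.
  • Cons: Increased RPO (depends on backup frequency).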
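
One way to perform the cross-region copy is AWS Backup's `StartCopyJob`. The sketch below assumes the table is already backed up through AWS Backup, that `backup_client` is a boto3 `backup` client created elsewhere, and that the vaults and IAM role exist; `copy_recovery_point` is a hypothetical helper name.

```python
def copy_recovery_point(backup_client, recovery_point_arn, source_vault,
                        dest_vault_arn, role_arn):
    """Start copying one recovery point to a vault in the DR region.

    Returns the copy job ID; the copy itself runs asynchronously.
    """
    response = backup_client.start_copy_job(
        RecoveryPointArn=recovery_point_arn,
        SourceBackupVaultName=source_vault,
        DestinationBackupVaultArn=dest_vault_arn,
        IamRoleArn=role_arn,
    )
    return response["CopyJobId"]
```

Because the copy is asynchronous, the effective RPO in the DR region is the age of the last copy that finished, not the last backup that started.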

Option B – Global Tables

  • Process:
    • Set up DynamoDB Global Tables for active-active replication.
  • Pros: RPO ~0; no restore needed for failover.
  • Cons: More expensive; not strictly “backup,” but a replication strategy.

5. Disaster Recovery Patterns

| DR Strategy | RPO | RTO | Cost | Notes |
|---|---|---|---|---|
| In-Region PITR | Seconds | Hours | Low-Med | Covers logical corruption; not region outage |
| Scheduled On-Demand + Cross-Region Copy | Hours (based on schedule) | Hours | Med-High | Protects from region outage |
| Global Tables | ~0 | Minutes | High | Failover without restore |
| Hybrid | Seconds–Hours | Minutes–Hours | High | PITR + periodic region copy |

6. Operational Considerations

  • Automation:
    • Use EventBridge to trigger periodic on-demand backups.
    • Use Lambda to copy to DR region.
  • Monitoring:
    • Backup completion events in EventBridge (for AWS Backup, "Backup Job State Change" events).
    • PITR enabled status alarms.
  • Testing:
    • Periodically restore backups to a staging environment.
    • Validate data integrity and application compatibility.
  • Security:
    • Encrypt backups with customer managed KMS keys.
    • Ensure IAM least privilege for backup/restore APIs.
  • Large Table Restores:
    • Restore time depends on table size, the number of underlying partitions, and related factors, so measure it in DR tests instead of assuming it scales with provisioned capacity.
    • Restores are bulk load operations, not live streaming.
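
The "PITR enabled" check in the monitoring list can be sketched as a small audit function. It assumes `client` is a boto3 DynamoDB client created elsewhere and that the caller supplies the table names to audit; `tables_without_pitr` is a hypothetical name, and wiring the result to an alarm is left out.

```python
def tables_without_pitr(client, table_names):
    """Return the tables whose point-in-time recovery is not ENABLED."""
    missing = []
    for name in table_names:
        desc = client.describe_continuous_backups(TableName=name)
        status = (
            desc["ContinuousBackupsDescription"]
            .get("PointInTimeRecoveryDescription", {})
            .get("PointInTimeRecoveryStatus", "DISABLED")
        )
        if status != "ENABLED":
            missing.append(name)
    return missing
```

Running this on a schedule and alerting on a non-empty result is one way to catch tables created without PITR.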

7. Example DR Flow

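Scenario: Regional outage in us-east-2
Goal: Recover in us-west-2 with < 4h RTO and < 1h RPO.

  1. Every hour, an on-demand backup is created in us-east-2.
  2. Each backup is copied to us-west-2 as soon as it completes.
  3. During outage:
    • Trigger restore from latest backup copy in us-west-2 to a new table.
    • Repoint app to new endpoint after functional tests.

Note: hourly backups give a worst-case RPO of roughly one hour plus copy time, so shorten the interval if the RPO must stay strictly under one hour.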
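
During the outage, the first decision is which backup copy to restore and how much data loss it implies. The sketch below is plain Python over an assumed record shape (`arn`, `created_at`, `copied_at`); those field names and the function name are hypothetical, not an AWS response format.

```python
from datetime import datetime, timedelta, timezone


def pick_restore_source(backups, outage_start, rpo_target):
    """Pick the newest backup whose DR-region copy finished before the outage.

    Returns (backup, data_loss_window, meets_rpo).
    """
    usable = [
        b for b in backups
        if b["copied_at"] is not None and b["copied_at"] <= outage_start
    ]
    if not usable:
        raise LookupError("no backup copy completed before the outage")
    latest = max(usable, key=lambda b: b["created_at"])
    # Everything written after the backup was taken is lost.
    data_loss = outage_start - latest["created_at"]
    return latest, data_loss, data_loss <= rpo_target
```

For example, with backups taken on the hour and an outage at 10:50, the 10:00 backup is chosen and the data-loss window is 50 minutes, provided its copy finished in time.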

8. Costs to Keep in Mind

  • PITR: ~$0.20 per GB-month, billed on the size of the table rather than only on its change logs.
  • On-Demand: ~$0.10 per GB-month.
  • Cross-Region Copy: Additional storage + transfer.
  • Restores: ~$0.15 per GB restored.
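
A rough estimate from the approximate rates above can be computed as follows. This is a sketch under simplifying assumptions: each retained on-demand backup is treated as a full copy of the table, cross-region transfer is excluded, and actual prices vary by region.

```python
# Approximate rates quoted above (USD); check current pricing for your region.
PITR_PER_GB_MONTH = 0.20
ON_DEMAND_PER_GB_MONTH = 0.10
RESTORE_PER_GB = 0.15


def monthly_backup_cost(table_gb, retained_backups, pitr_enabled=True):
    """Monthly storage cost for PITR plus retained on-demand backups."""
    pitr = table_gb * PITR_PER_GB_MONTH if pitr_enabled else 0.0
    on_demand = table_gb * retained_backups * ON_DEMAND_PER_GB_MONTH
    return round(pitr + on_demand, 2)


def restore_cost(table_gb):
    """One-off cost of restoring a table of the given size."""
    return round(table_gb * RESTORE_PER_GB, 2)
```

For a 500 GB table with 24 retained backups, this gives about $1,300 per month in storage and about $75 per restore, which shows why retention count dominates the bill.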

The DynamoDB Disaster Recovery Strategy Matrix below lets you plug in your target RPO (Recovery Point Objective) and RTO (Recovery Time Objective) and see the matching AWS backup features, automation setup, and trade-offs.

DynamoDB DR Strategy Matrix

| Target RPO | Target RTO | Recommended AWS Features | Automation Setup | Cost Level | Pros | Cons | Best For |
|---|---|---|---|---|---|---|---|
| ≤ 1 second | ≤ 15 min | Global Tables | Create active-active Global Table across regions; Route 53 failover routing; health checks for auto-switch | 🔴 High | Zero restore time; instant failover; protects from region outage. | High ongoing cost; complex conflict resolution logic. | Mission-critical, 24/7 low-latency apps. |
| ≤ 1 second | Hours | PITR (Point-in-Time Recovery) (same region) | Enable PITR per table; manual/automated restore to new table; use CloudFormation/Lambda to rebuild indexes/infra | 🟢 Low-Med | Covers accidental deletes/writes; granular restore. | No region outage protection; restore time depends on table size. | In-region logical corruption recovery. |
| ≤ 1 hour | ≤ 4 hours | PITR + Hourly On-Demand Backup + Cross-Region Copy | PITR for in-region safety; EventBridge hourly backup trigger; Lambda copies backup to DR region | 🟡 Medium | Combines near-real-time local restore + regional DR. | Higher storage costs; hourly RPO may not meet sub-hour needs. | Balanced cost & DR coverage. |
| ≤ 1 hour | > 4 hours | On-Demand Backups + Cross-Region Copy (every 1h) | EventBridge to schedule backups; Lambda to copy to DR region | 🟡 Medium | Simple automation; meets compliance needs. | Slower restore; 1h data loss possible. | DR compliance & cost balance. |
| ≤ 24 hours | > 4 hours | Daily On-Demand Backups (Cross-Region if needed) | Daily scheduled backups; optional DR region copy | 🟢 Low | Cheapest; meets regulatory archiving. | High potential data loss; slow recovery. | Non-critical, audit-focused data. |
| Custom | Custom | Hybrid (PITR + periodic backups + optional global tables) | Mix features based on business unit criticality | Variable | Tailored to workload. | More complex ops. | Multi-tier DR planning. |



Automation Architecture Reference

For “PITR + Cross-Region On-Demand” setup:

  1. PITR Enabled → Continuous local protection.
  2. EventBridge Schedule (every X hours):
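    • Trigger Lambda → AWS Backup StartBackupJob API.
    • Lambda → AWS Backup StartCopyJob API (copy to DR region) once the backup completes.
  3. Backup Monitoring:
    • EventBridge "Backup Job State Change" and "Copy Job State Change" events → SNS alerts.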
  4. Disaster Event:
    • Restore from latest DR region backup → New table.
    • Switch endpoints via Route 53 or config update.
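
The scheduled step in this flow could be implemented as a small Lambda handler. The sketch below assumes the backup runs through AWS Backup, with `backup_client` a boto3 `backup` client and the table ARN, vault name, and role ARN supplied by the deployment; `make_handler` is a hypothetical factory used here so the handler can be exercised without AWS.

```python
def make_handler(backup_client, table_arn, vault_name, role_arn):
    """Build a Lambda handler that starts one backup job per invocation."""

    def handler(event, context):
        # The EventBridge schedule payload is not needed; each tick starts a job.
        response = backup_client.start_backup_job(
            BackupVaultName=vault_name,
            ResourceArn=table_arn,
            IamRoleArn=role_arn,
        )
        return {"backup_job_id": response["BackupJobId"]}

    return handler


# In a real Lambda module (values are placeholders):
# import boto3
# handler = make_handler(boto3.client("backup"), TABLE_ARN, VAULT_NAME, ROLE_ARN)
```

The copy to the DR region is deliberately not started here, because the recovery point only becomes copyable after the backup job completes.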

Quick Selection Guide

  • If twtech wants near-zero downtime and near-zero data loss → Global Tables.
  • If twtech wants minimal cost & protection from bad writes → PITR only.
  • If twtech wants a balance of cost, RPO ~1h, and region protection → PITR + hourly cross-region backups.
  • If compliance is the primary goal → Daily On-Demand backups with cross-region copy.

Here is twtech's sample DynamoDB DR decision tree. It starts with the target RPO (Recovery Point Objective) and RTO (Recovery Time Objective), then routes to the matching AWS backup/replication setup.

DynamoDB DR Decision Tree

START: twtech target RPO & RTO
  │
  ├── RPO ≤ 1 second?
  │       ├── RTO ≤ 15 min → Use GLOBAL TABLES
  │       │       - Multi-region, active-active.
  │       │       - Route 53 failover.
  │       │       - Cost: High.
  │       └── RTO > 15 min → Use PITR (Point-in-Time Recovery)
  │               - In-region restore.
  │               - No regional outage protection.
  │               - Cost: Low-Med.
  │
  ├── RPO ≤ 1 hour?
  │       ├── RTO ≤ 4 hours → Use PITR + HOURLY ON-DEMAND BACKUPS + CROSS-REGION COPY
  │       │       - Combines near-real-time local recovery + DR region protection.
  │       │       - Cost: Medium.
  │       └── RTO > 4 hours → Use HOURLY ON-DEMAND BACKUPS + CROSS-REGION COPY
  │               - Cost: Medium.
  │               - Simpler, but longer restore time.
  │
  └── RPO ≤ 24 hours?
          ├── RTO ≤ 4 hours → Use DAILY ON-DEMAND BACKUPS + CROSS-REGION COPY
          │       - Meets compliance + regional DR.
          │       - Cost: Low.
          └── RTO > 4 hours → DAILY ON-DEMAND BACKUPS (optional cross-region copy)
                  - Lowest cost.
                  - Longest RPO/RTO.
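
The decision tree can also be written as a small function, which is handy for tagging tables by tier in automation. This is a sketch: the thresholds come from the tree, while the fallback for RPO targets above 24 hours is an assumption borrowed from the matrix's "Custom" row, and `recommend_strategy` is a hypothetical name.

```python
def recommend_strategy(rpo_seconds, rto_minutes):
    """Map target RPO/RTO to the strategy named in the decision tree."""
    if rpo_seconds <= 1:
        if rto_minutes <= 15:
            return "Global Tables"
        return "PITR (in-region)"
    if rpo_seconds <= 3600:
        if rto_minutes <= 240:
            return "PITR + hourly on-demand backups + cross-region copy"
        return "Hourly on-demand backups + cross-region copy"
    if rpo_seconds <= 86400:
        if rto_minutes <= 240:
            return "Daily on-demand backups + cross-region copy"
        return "Daily on-demand backups (optional cross-region copy)"
    # Not covered by the tree; assumed to fall under the matrix's Custom row.
    return "Hybrid (custom)"
```

For example, a one-hour RPO with a four-hour RTO maps to the PITR plus hourly cross-region backup tier.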

Color-coded Quick Legend

  • 🔴 Global Tables → Max availability, near-zero data loss, high cost.
  • 🟡 PITR + Cross-Region Backup → Balanced cost, RPO of about 1h, DR ready.
  • 🟢 On-Demand Backups → Low cost, compliance-friendly, slower recovery.


Insights: PITR

PITR stands for Point-in-Time Recovery, a data recovery technique used primarily with databases.
  • Restoring data to a specific past state: PITR lets administrators restore a database (or other system) to the exact state it was in at a chosen point in the past.
  • Protection against data loss or corruption: this is particularly valuable after accidental deletion of data, unintended writes that corrupt the database, or application rollouts that cause issues.
Summary of how Point-in-Time Recovery (PITR) works:
  • PITR typically relies on a combination of regular full backups and continuous logging of all changes made to the database (often called transaction logs or write-ahead logs, WAL).
  • A full backup provides a snapshot of the database at a specific moment.
  • Transaction logs record every change made after the full backup.
  • To restore to a specific point in time, the system first restores the full backup and then applies the changes from the transaction logs up to the desired point in time.
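
The full-backup-plus-log replay described above can be illustrated with a toy in-memory model. This is a generic sketch of the technique, not how DynamoDB implements PITR internally; the snapshot and log shapes are assumptions made for the example, and the log is assumed to be sorted by timestamp.

```python
def restore_to_point(snapshot, log, target_time):
    """Rebuild state at target_time from a snapshot plus a change log.

    snapshot: {"taken_at": ts, "data": {key: value}}
    log: list of (ts, op, key, value) tuples sorted by ts; op is "put" or "delete"
    """
    state = dict(snapshot["data"])  # start from the full backup
    for ts, op, key, value in log:
        if ts <= snapshot["taken_at"]:
            continue  # already reflected in the snapshot
        if ts > target_time:
            break  # stop replay at the requested point in time
        if op == "put":
            state[key] = value
        elif op == "delete":
            state.pop(key, None)
    return state
```

Restoring to a time just before an accidental delete returns the state with the item still present, which is exactly the logical-error recovery case PITR targets.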
