A deep dive into Amazon DynamoDB Global Tables.
Think of this as an “architect’s + operator’s” view:
- How they work.
- Why they exist.
- Subtle pitfalls to avoid.
- Mitigation.
- Recovery.
- Monitoring & Observability.
1. Concept Overview
A DynamoDB Global Table is an active-active, fully managed, multi-region database table that automatically replicates data across AWS Regions.
- Active-Active: Reads/writes can happen in any replica region.
- Fully Managed: AWS handles replication, conflict detection, and consistency guarantees.
- Multi-Master: Every replica is a full DynamoDB table that can accept writes.
Use Cases:
- Low-latency access for global users.
- Disaster recovery (DR) & multi-region resilience.
- Compliance with data residency regulations.
- Multi-region serverless / microservices backends.
2. How It Works
A Global Table is essentially N regional DynamoDB tables (one in each participating region) linked together via DynamoDB Streams.
Replication Flow:
- Write in Region A → DynamoDB stores it in the local table.
- Stream Event → The change is captured by DynamoDB Streams.
- Cross-Region Replication → The stream processor pushes the change to Region B, C, etc.
- Apply Update → The remote regions apply the update to their replicas.
Latency: Usually under 1 second for replication, but not guaranteed real-time (depends on region distances and load).
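A quick way to see which regions are attached and whether they are healthy (this assumes the twtech-GameScores table and replica created in the CLI example later in this post):
# bash
# List each replica region and its current status for the global table.
aws dynamodb describe-table \
  --table-name twtech-GameScores \
  --query 'Table.Replicas[].{Region:RegionName,Status:ReplicaStatus}' \
  --region us-east-2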
3. Versions: v1 vs v2
- v1 (2017): Required you to create identical tables in each region manually and then enable global replication.
- v2 (2019): Much simpler — create the table in one region and add replicas from the console/CLI.
- v2 supports on-demand capacity, PITR, TTL, encryption, and auto-scaling across regions.
If twtech is starting today (August 2025), it would use v2.
4. Consistency Model
- Within a single region → twtech gets the same options:
  - Eventually Consistent Reads (default)
  - Strongly Consistent Reads (only within the same region)
- Across regions → Always eventually consistent (you can’t have strong consistency cross-region because of network physics).
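A minimal sketch of the two read modes inside one region (the key value g-100 is illustrative); strong consistency only applies to the replica in the region you call:
# bash
# Default: eventually consistent read in the local region.
aws dynamodb get-item \
  --table-name twtech-GameScores \
  --key '{"twtech-GameId": {"S": "g-100"}}' \
  --region us-east-2

# Strongly consistent read, still scoped to this region's replica only.
aws dynamodb get-item \
  --table-name twtech-GameScores \
  --key '{"twtech-GameId": {"S": "g-100"}}' \
  --consistent-read \
  --region us-east-2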
5. Conflict Resolution
- Rule: Last writer wins (based on a per-item LastUpdatedTimestamp that AWS maintains internally, not your attribute).
- Implication:
  - Simultaneous writes to the same item in different regions → whichever arrives later (in terms of timestamp) overwrites the other.
  - No merge logic — if you need conflict-aware merging, you have to implement it in your application.
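A purely illustrative way to see last-writer-wins in action, using the example table and made-up values: write the same item in two regions at nearly the same time and watch one version win everywhere once replication settles.
# bash
# Same item key, two regions, nearly simultaneous writes.
aws dynamodb put-item --table-name twtech-GameScores --region us-east-2 \
  --item '{"twtech-GameId": {"S": "g-100"}, "score": {"N": "50"}}' &
aws dynamodb put-item --table-name twtech-GameScores --region us-west-2 \
  --item '{"twtech-GameId": {"S": "g-100"}, "score": {"N": "75"}}' &
wait
# After replication settles, every region holds whichever write DynamoDB
# timestamped later; the other value is overwritten with no merge.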
6. Capacity & Billing
- Per-region capacity: Each replica has its own read/write capacity (R/W units or on-demand).
- Charges in every region:
  - Storage per region.
  - Read/write requests per region.
  - Cross-region replication traffic (DynamoDB Streams charges + inter-region data transfer).
7. Operational Considerations
A. Region Addition & Removal
- Adding a new region → DynamoDB backfills all existing items into it.
- Removing a region → No impact on others, but you must delete it from the Global Table config.
B. Backups & PITR
- Each replica manages its own backups and point-in-time recovery — PITR is per-region.
C. TTL
- Time-to-live deletes replicate to all regions — so expiring an item in one region removes it globally.
D. Streams
- Each replica has its own stream; if twtech-app consumes changes, choose where to process from.
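For example, PITR and TTL could be switched on from the CLI like this (table name from the example in section 12; the TTL attribute name expires_at is an assumption):
# bash
# Enable PITR on the replica in this region (PITR is per-region).
aws dynamodb update-continuous-backups \
  --table-name twtech-GameScores \
  --point-in-time-recovery-specification PointInTimeRecoveryEnabled=true \
  --region us-east-2

# Enable TTL; expirations replicate as deletes to every replica.
aws dynamodb update-time-to-live \
  --table-name twtech-GameScores \
  --time-to-live-specification Enabled=true,AttributeName=expires_at \
  --region us-east-2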
8. Performance Tuning
- Hot Partitions: Still a concern — hot keys can cause throttling in every region.
- Write Sharding: If you have heavy writes on few keys, distribute them to avoid single-partition bottlenecks.
- Batch Writes: Avoid single massive writes that hit the replication pipeline all at once.
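A minimal write-sharding sketch (the leaderboard key layout and shard count of 10 are assumptions, not part of the example schema): append a shard suffix so one hot logical key spreads across several partition keys.
# bash
# Pick a shard suffix for this write (0-9).
SHARD=$((RANDOM % 10))
aws dynamodb put-item \
  --table-name twtech-GameScores \
  --item "{\"twtech-GameId\": {\"S\": \"leaderboard#shard-${SHARD}\"}, \"score\": {\"N\": \"42\"}}" \
  --region us-east-2
# Readers query all 10 shard keys and merge the results client-side.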
9. Security
- Encryption: Always encrypted at rest using AWS KMS (can use customer-managed CMKs).
- IAM Controls: Each replica can have different IAM policies if needed.
- Inter-Region Traffic: Handled securely via AWS backbone (no public internet).
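If twtech wants a customer-managed key per replica, a sketch along these lines should work, assuming the replica Create action's KMSMasterKeyId field; the key ARN is a placeholder:
# bash
# Add a replica that encrypts with its own customer-managed key in the target region.
aws dynamodb update-table \
  --table-name twtech-GameScores \
  --replica-updates '[{"Create": {"RegionName": "us-west-2", "KMSMasterKeyId": "arn:aws:kms:us-west-2:111122223333:key/REPLACE-ME"}}]' \
  --region us-east-2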
10. Limitations & Gotchas (watch out for)
- Conflict resolution is simplistic → twtech must handle application-level merge logic if needed.
- Cost multiplies by region count → A 3-region setup costs ~3× a single-region table.
- No cross-region transactions → Transactions are region-scoped.
- Strong consistency is local-only → twtech can’t get strong reads from another region.
- Schema changes → twtech must ensure changes are compatible across all replicas.
11. Best Practices
- Design for Conflict Minimization
  - Region-aware partition keys (e.g., user_id#region).
- Keep Items Small
  - Large items make replication slower.
- Use On-Demand in Low-Traffic Regions
  - Keeps costs predictable.
- Monitor with CloudWatch Metrics
  - Look at ReplicationLatency and UserErrors.
- Test Failover Scenarios
  - Simulate a region outage and verify app behavior.
12. Example: CLI Creation
# bash
# Create the table in the first region with streams enabled (required for replication).
aws dynamodb create-table \
  --table-name twtech-GameScores \
  --attribute-definitions AttributeName=twtech-GameId,AttributeType=S \
  --key-schema AttributeName=twtech-GameId,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST \
  --stream-specification StreamEnabled=true,StreamViewType=NEW_AND_OLD_IMAGES \
  --region us-east-2

# Add a replica region; run this against the region where the table was created.
aws dynamodb update-table \
  --table-name twtech-GameScores \
  --replica-updates '[{"Create": {"RegionName": "us-west-2"}}]' \
  --region us-east-2
DynamoDB Global Tables failure modes:
- What actually happens when things go sideways,
- How they heal,
- How to design so that twtech doesn’t lose data (or sleep).
1) What if a region goes offline?
A. Single-region outage (power/network/AZ-constrained)
- Other regions: stay fully read/write active.
- The failed region: its table/replicas are unavailable to your app; writes there fail fast or on timeout depending on the failure.
- Replication behavior: writes in healthy regions are queued durably by DynamoDB for the offline region. You don’t manage the backlog; DynamoDB does.
- Conflict risk: unchanged—because the offline region isn’t accepting writes, no new multi-master conflicts arise from that region during the outage.
B. Network partition between regions (split-brain risk)
- All regions still accept writes.
- Conflicts can occur if the same item key is written in two regions during the partition.
- Resolution when connectivity returns is last writer wins (LWW) based on DynamoDB’s per-item system timestamp. Earlier concurrent updates can be overwritten.
C. Partial service impairment (e.g., throttling, elevated errors)
- The affected region may be “flappy”: some writes succeed, some fail, replication lags increase.
- You can (and should) brown-out the region by routing traffic away before it turns into a data quality problem.
2) When and how does it catch up?
A. Region recovers
- DynamoDB automatically replays the backlog of changes into the recovered region.
- Ordering: per-item order is preserved; cross-item ordering is not guaranteed (normal DynamoDB behavior).
- At-least-once apply: updates can be observed more than once downstream (important for stream consumers), but table state converges correctly.
- Secondary indexes: GSIs/LSIs are updated as the backlog applies; expect index catch-up lag before queries look fully consistent in that region.
B. What you see operationally
- Replication latency spikes in the healthy regions (outgoing) and the recovered region (incoming) until the backlog drains.
- Write throughput surges on the recovered region (applying backfill). If you use provisioned capacity, this can throttle catch-up.
3) Data correctness: the sharp edges
A. Last-Writer-Wins (LWW) implications
- Lost increments problem: counters updated in two regions may “lose” one side’s increments.
- Field-level merges don’t exist: the whole item version wins; per-attribute merges are on you.
Mitigations
- Single-writer per key (“home region” pattern): route writes for a given item/user/tenant to one region; others are read-mostly. Flip the home region only during DR.
- Sharded counters / CRDT-ish approach: maintain per-region counters and sum them when reading; periodic compaction if needed.
- App-level vector/version attributes: store a version or (region, logical_ts) and implement idempotent upserts + merge logic for complex items.
B. Transactions
- DynamoDB transactions are region-scoped. When replicated, they arrive as independent item updates—no cross-region atomicity.
Design tip: keep transactional boundaries within a single partition key in a single home region whenever possible.
C. Conditional writes
- Conditions evaluate locally. Under partition, two regions can each pass their local condition, and later one side’s write will be overwritten by LWW.
Design tip: combine conditions with the home region rule, or encode a write-fence attribute you control during failover (see the patterns below).
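A minimal sketch of such a fenced write (the write_epoch attribute and the values are assumptions): the writer asserts the epoch it believes is current, so a stale writer fails fast instead of silently losing to LWW later.
# bash
# Conditional update gated on the write-fence epoch.
aws dynamodb update-item \
  --table-name twtech-GameScores \
  --key '{"twtech-GameId": {"S": "g-100"}}' \
  --update-expression "SET score = :s" \
  --condition-expression "attribute_not_exists(write_epoch) OR write_epoch = :epoch" \
  --expression-attribute-values '{":s": {"N": "90"}, ":epoch": {"N": "3"}}' \
  --region us-east-2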
4) Traffic management during failover
Detect
- Health checks on your read/write paths (not just EC2/LB).
- A canary that does a small conditional write + read per region to validate end-to-end.
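A hypothetical canary along those lines (table, key names, and regions are assumptions):
# bash
# One conditional write plus one read per region, so the probe exercises the
# real data path, not just the load balancer.
for REGION in us-east-2 us-west-2; do
  NOW=$(date +%s)
  aws dynamodb put-item \
    --table-name twtech-GameScores \
    --item "{\"twtech-GameId\": {\"S\": \"canary#${REGION}\"}, \"canary_ts\": {\"N\": \"${NOW}\"}}" \
    --condition-expression "attribute_not_exists(canary_ts) OR canary_ts <= :now" \
    --expression-attribute-values "{\":now\": {\"N\": \"${NOW}\"}}" \
    --region "$REGION" || echo "WRITE check failed in ${REGION}"
  aws dynamodb get-item \
    --table-name twtech-GameScores \
    --key "{\"twtech-GameId\": {\"S\": \"canary#${REGION}\"}}" \
    --region "$REGION" > /dev/null || echo "READ check failed in ${REGION}"
done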
Shift
- Use Route 53 failover/latency routing or your GSLB to drain writes from the unhealthy region.
- Keep reads local where safe; promote nearby-region reads if latency is acceptable.
Guard
- Circuit breaker in your data access layer: when the error budget blows, stop writing to the suspect region.
- Idempotency keys (request tokens) for writes so retries across regions don’t create duplicates.
5) Capacity & cost during recovery
- Catch-up can be write-heavy. Prefer On-Demand in low/variable-traffic regions, or temporarily pre-scale provisioned WCU/RCU before you flip traffic back.
- Watch for throttling in the recovering region—it elongates replication lag and extends the DR window.
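A sketch of the pre-scale step (placeholder numbers; only relevant if the table runs in provisioned mode, so it would not apply to the on-demand example table earlier in this post):
# bash
# Temporarily raise provisioned capacity on the recovering replica before
# flipping traffic back; revert once replication latency normalizes.
aws dynamodb update-table \
  --table-name twtech-GameScores \
  --provisioned-throughput ReadCapacityUnits=500,WriteCapacityUnits=2000 \
  --region us-west-2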
6) Monitoring & alarming (CloudWatch)
- ReplicationLatency (per replica): alert on sustained elevation.
- ThrottledRequests / SystemErrors: both on tables and GSIs.
- SuccessfulRequestLatency Pxx: spikes indicate brown-outs.
- Incoming/OutgoingReplicationBytes: confirms backlog drain progress.
- ConsumedWriteCapacityUnits on the recovering region: expect a ramp.
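A sketch of a ReplicationLatency alarm (the threshold, account ID, and SNS topic are placeholders; the alarm lives in the source region and watches lag into the receiving region):
# bash
# Alarm when replication into us-west-2 stays elevated for 5 minutes.
aws cloudwatch put-metric-alarm \
  --alarm-name twtech-GameScores-replication-lag-us-west-2 \
  --namespace AWS/DynamoDB \
  --metric-name ReplicationLatency \
  --dimensions Name=TableName,Value=twtech-GameScores Name=ReceivingRegion,Value=us-west-2 \
  --statistic Average \
  --period 60 \
  --evaluation-periods 5 \
  --threshold 3000 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:sns:us-east-2:111122223333:twtech-alerts \
  --region us-east-2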
7) Patterns that work in production
Pattern A — Home Region per key (recommended default)
- Partition users/tenants by geography or hash → store home_region on the item.
- Writes only in home_region. Reads anywhere.
- On DR promote: update a single control record (tenant → new_home_region) and route new writes there.
- Optional: a write-fence attribute (e.g., write_epoch) that you bump when moving the home; all writers must include ConditionExpression write_epoch = :current.
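A sketch of the DR flip for one tenant (the twtech-TenantControl table and its attributes are assumptions for illustration):
# bash
# Move the tenant's home region and bump the write fence in one update,
# issued against the new home region.
aws dynamodb update-item \
  --table-name twtech-TenantControl \
  --key '{"tenant_id": {"S": "tenant-42"}}' \
  --update-expression "SET home_region = :r ADD write_epoch :one" \
  --expression-attribute-values '{":r": {"S": "us-west-2"}, ":one": {"N": "1"}}' \
  --region us-west-2
# Writers that still carry the old epoch now fail their ConditionExpression
# instead of producing conflicting writes in the old home region.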
Pattern B — Merge-able state (CRDT-light)
- For counters/sets/logs that must be multi-master:
  - Counters: per-region count_<region> attributes; reads sum them.
  - Sets: maintain adds_<region> / removes_<region> tombstone sets; reads compute the effective set.
  - Periodically compact to a canonical form from a single region.
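A sketch of the per-region counter idea (attribute names are assumptions): each region only ever increments its own attribute, and readers sum the count_* attributes client-side.
# bash
# In us-east-2, increment only this region's counter attribute.
aws dynamodb update-item \
  --table-name twtech-GameScores \
  --key '{"twtech-GameId": {"S": "g-100"}}' \
  --update-expression "ADD count_us_east_2 :one" \
  --expression-attribute-values '{":one": {"N": "1"}}' \
  --region us-east-2
# Readers GetItem from any region and sum count_us_east_2 + count_us_west_2 + ...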
Pattern C — Idempotent upserts with request tokens
- Each write carries request_id. The item stores last_request_id and last_applied_ts.
- Writers use a conditional: “apply if request_id is new OR the same as last_request_id”.
- Safe against retries and at-least-once replication.
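A sketch of the conditional guard (attribute names follow this pattern; the values are illustrative):
# bash
# Apply the write only if this request_id is not the one already recorded;
# a retry of the same request fails the condition, which the caller can
# safely treat as "already applied".
aws dynamodb update-item \
  --table-name twtech-GameScores \
  --key '{"twtech-GameId": {"S": "g-100"}}' \
  --update-expression "SET score = :s, last_request_id = :rid, last_applied_ts = :ts" \
  --condition-expression "attribute_not_exists(last_request_id) OR last_request_id <> :rid" \
  --expression-attribute-values '{":s": {"N": "120"}, ":rid": {"S": "req-8f2c"}, ":ts": {"N": "1735689600"}}' \
  --region us-east-2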
Pattern D — Write-through queue for side effects
- If you fan out to SQS/Lambda/ES after DynamoDB, consume Streams with exactly-once semantics at the consumer (dedupe on eventID or request_id), because after recovery you may see reordered/duplicate events.
8) Operational runbook (copy/paste)
1. Detect: health checks fail for region R; ReplicationLatency rising.
2. Decide: 5–10 min SLO breach → brown-out writes to R (circuit breaker flips).
3. Reroute: Route 53 → failover writes to nearest healthy region(s). Ensure idempotency tokens are live.
4. Scale: If provisioned, pre-scale WCU on healthy regions by +X% to absorb extra load.
5. Communicate: mark R as read-only (best-effort) or remove it from the read pool if latency spikes.
6. Observe: Watch backlog metrics; confirm GSIs are catching up in healthy regions.
7. Recover: When R returns, monitor backlog drain; don’t immediately flip writes back—wait for replication latency to normalize.
8. Repatriate: Re-enable writes to R (or keep the new home if you changed it); scale back capacity; post-mortem on conflicts that occurred during the window.
9) Edge cases & FAQs
- What about TTL deletes? TTL expirations replicate as deletes. If a region was offline at expiration time, the delete will apply on catch-up.
- Do I lose the backlog if the outage is long? DynamoDB Global Tables manages backlog durability for replication; twtech doesn’t need to build its own spool. twtech Streams consumers still only have 24h retention—plan their recovery separately.
- Can I get strong reads across regions during DR? No—cross-region is eventually consistent only. If you require strong reads, route to the current writer/home region.
- Will GSI queries look weird during recovery? They can, temporarily. Build in read-time retries or a “data may be stale” banner for the recovering region.
10) Quick design checklist
- Choose home-region per key, or prove you need true multi-master + merges.
- Put idempotency tokens on all mutating requests.
- Add a write-fence/epoch to simplify DR flips.
- Separate immutable from mutable fields to minimize conflict scope.
- Prefer On-Demand (esp. satellite regions) for bursty catch-up.
- Instrument replication & throttling alarms; rehearse the runbook quarterly.
- Make stream consumers idempotent (reprocessing the same event has no additional effect) and tolerant of reordering.