Amazon DynamoDB - Deep Dive.
Scope:
- The concept: DynamoDB,
- Core Architecture,
- Data Model,
- Storage & Scaling,
- Consistency,
- Replication,
- Performance Model,
- Advanced Features,
- Best Practices,
- Real-World Use Cases,
- Sample – High-Traffic Production Architecture,
- Creating Amazon DynamoDB tables & integrating with other AWS services (UI / CLI),
- Integrate DynamoDB with Other AWS Services,
- Sample policy for Lambda,
- When to Use DynamoDB & when not to use DynamoDB.
- Amazon DynamoDB is a fully managed, serverless NoSQL, key-value and document database service provided by Amazon Web Services (AWS).
- Amazon DynamoDB is designed for single-digit millisecond performance at any scale and offers built-in security, continuous backups, automated multi-Region replication, and in-memory caching.
1. The concept: DynamoDB
- Amazon DynamoDB is a fully managed, serverless NoSQL database service designed for high-performance applications at scale.
It provides:
- Single-digit millisecond latency at any scale.
- Automatic scaling of throughput & storage.
- Multi-Region replication for global applications.
- Event-driven integration (via DynamoDB Streams + Lambda).
- Pay-per-request or provisioned pricing models.
It’s based on Amazon’s internal "Dynamo" paper (2007), which powered Amazon.com’s shopping cart and inspired many modern distributed databases.
2. Core Architecture
- Tables → Items → Attributes
- Items are schemaless (can vary by row).
- Each item is identified by a Primary Key:
- Partition Key (PK) → Determines item’s placement.
- Partition Key + Sort Key (Composite Key) → Allows range queries.
- Data is partitioned across multiple servers.
- Partitioning is based on a hash of the partition key.
- Auto-scaling adjusts partitions when storage or throughput grows.
- Supports eventual consistency (default) and strong consistency (optional, per-request).
- Data is replicated 3x within a Region (across AZs).
- Global Tables replicate across Regions for multi-region apps.
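The hash-based placement described above can be sketched in a few lines of Python. This is a simplified model: the real service uses its own internal hashing and partition maps, so the choice of MD5 and the `num_partitions` parameter here are illustrative assumptions, not DynamoDB internals.

```python
import hashlib

def assign_partition(partition_key: str, num_partitions: int) -> int:
    """Map a partition key to one of num_partitions storage partitions
    by hashing the key (simplified model of DynamoDB's placement)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions

# Items with the same partition key always land on the same partition;
# distinct keys spread (roughly) evenly across partitions.
print(assign_partition("twtech123", 4))
print(assign_partition("twtech123", 4))  # same key -> same partition
```

This is also why a well-distributed partition key matters: every item sharing one key value maps to the same partition, concentrating load there.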
3. Performance Model
- Single-digit millisecond response time.
- Performance depends on Read/Write Capacity Units (RCUs/WCUs):
- 1 RCU = 1 strongly consistent read/sec (up to 4KB item).
- 1 WCU = 1 write/sec (up to 1KB item).
- On-Demand mode (pay-per-request, no capacity planning).
- Provisioned mode (reserve throughput with optional auto-scaling).
- DAX (DynamoDB Accelerator) → In-memory caching for microsecond latency.
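The RCU/WCU arithmetic above can be captured in a small helper, useful when sizing a provisioned-mode table. This is a sketch of the documented rounding rules (reads round up to 4 KB units, writes to 1 KB units, and eventually consistent reads cost half an RCU):

```python
import math

def rcus_needed(item_size_kb: float, reads_per_sec: int,
                strongly_consistent: bool = True) -> int:
    """One RCU = one strongly consistent read/sec of up to 4 KB.
    Eventually consistent reads cost half as much."""
    units_per_read = math.ceil(item_size_kb / 4)
    total = units_per_read * reads_per_sec
    return total if strongly_consistent else math.ceil(total / 2)

def wcus_needed(item_size_kb: float, writes_per_sec: int) -> int:
    """One WCU = one write/sec of up to 1 KB."""
    return math.ceil(item_size_kb) * writes_per_sec

print(rcus_needed(6, 10))    # 6 KB items, 10 strong reads/sec -> 20
print(wcus_needed(2.5, 10))  # 2.5 KB items, 10 writes/sec -> 30
```

In On-Demand mode none of this planning is needed; the same unit sizes still drive the per-request price.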
4. Advanced Features
- DynamoDB Streams → Change data capture (CDC) for real-time triggers (integrates with Lambda, Kinesis).
- TTL (Time-to-Live) → Auto-expire items.
- Transactions → ACID across multiple items & tables.
- Indexes
- Global Secondary Index (GSI): Alternate queryable key across partitions.
- Local Secondary Index (LSI): Query on different sort key (must share same partition key).
- Global Tables → Multi-region active-active replication.
- Point-in-time recovery (PITR) → Continuous backups (35 days).
- Fine-grained security with IAM + encryption at rest.
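TTL works by storing a Unix epoch timestamp in a designated attribute; DynamoDB deletes the item some time after that timestamp passes. A minimal sketch of computing that expiry value (the attribute name `expireAt` is an assumption here — it is whatever name you configure on the table):

```python
import time

def item_with_ttl(user_id: str, ttl_seconds: int) -> dict:
    """Build an item carrying a TTL attribute set ttl_seconds in the
    future. DynamoDB expects the TTL attribute to be a Number holding
    a Unix epoch timestamp in seconds."""
    return {
        "UserID": user_id,
        "expireAt": int(time.time()) + ttl_seconds,
    }

session = item_with_ttl("twtech123", 3600)  # expire roughly an hour out
```

Expired items are removed in the background, typically within a few days of expiry, so queries should still filter on the TTL attribute if exact cutoffs matter.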
5. Best Practices
- Choose a good partition key to avoid “hot partitions.”
- Example: Instead of userId, use userId#timestamp to spread writes.
- Use GSIs for flexible querying, but design them carefully (indexes cost capacity).
- Enable on-demand mode if workload is unpredictable.
- Leverage Streams + Lambda for event-driven microservices.
- Use DAX if twtech needs sub-millisecond response times.
- Avoid large item sizes (limit: 400KB per item).
- BatchWriteItem / BatchGetItem to optimize throughput.
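BatchWriteItem accepts at most 25 put/delete requests per call, so callers chunk their items before sending. A sketch of that chunking step (boto3's `Table.batch_writer()` handles this, plus retrying unprocessed items, automatically):

```python
def chunk_for_batch_write(items: list, batch_size: int = 25) -> list:
    """Split items into groups no larger than the BatchWriteItem
    limit of 25 requests per call."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

batches = chunk_for_batch_write([{"UserID": f"u{i}"} for i in range(60)])
print(len(batches))      # 60 items -> 3 batches (25 + 25 + 10)
print(len(batches[-1]))  # 10
```

Each batch then becomes one `batch_write_item` call, cutting round trips versus 60 individual PutItem requests.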
6. Real-World Use Cases
- E-commerce → Shopping carts, product catalogs, inventory.
- Gaming → Player session state, leaderboards.
- IoT → Device telemetry ingestion at scale.
- Finance → Fraud detection, transaction logs.
- Social Media → User profiles, activity feeds.
- Serverless apps → Backend for Lambda + API Gateway.
7. Sample – High-Traffic Production Architecture
Imagine a global ride-sharing app:
- DynamoDB (Global Tables) → Stores user profiles, trips, and real-time driver availability.
- DAX → Accelerates location queries (sub-ms).
- Streams + Lambda → Update trip status, trigger billing workflows.
- API Gateway + Lambda → Expose REST API to mobile apps.
- Kinesis Firehose + S3 → Archive trip history for analytics.
- CloudWatch + Auto-scaling → Manage costs & performance.
This architecture ensures:
- Low-latency lookups (who’s the nearest driver?).
- Global consistency (multi-region replication).
- Event-driven workflows (trip completion triggers invoice generation).
8. Creating Amazon DynamoDB & Integrating with other AWS services
NB:
- It is a common requirement in cloud applications for building scalable, serverless, or event-driven architectures.
Here’s twtech’s detailed step-by-step guide:
1. Create a DynamoDB Table
- Via AWS Management Console
- Go to DynamoDB in the AWS console.
- Click Create table.
- Provide:
- Table name (e.g., twtech-Users)
- Primary key: Partition key (PK) and optional sort key (SK)
- Optionally configure:
- Read/write capacity (Provisioned or On-demand)
- Encryption
- TTL (Time-to-Live) for automatic record deletion
- Click Create table.
- Via AWS CLI
# bash
aws dynamodb create-table \
  --table-name twtech-Users \
  --attribute-definitions AttributeName=twtech-UserID,AttributeType=S \
  --key-schema AttributeName=twtech-UserID,KeyType=HASH \
  --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5
- Via AWS SDK (Python Example with boto3)
import boto3

dynamodb = boto3.resource('dynamodb')

table = dynamodb.create_table(
    TableName='twtech-Users',
    KeySchema=[
        {'AttributeName': 'UserID', 'KeyType': 'HASH'}
    ],
    AttributeDefinitions=[
        {'AttributeName': 'UserID', 'AttributeType': 'S'}
    ],
    BillingMode='PAY_PER_REQUEST'
)

table.wait_until_exists()
print("twtech Table created successfully")
2. Integrate DynamoDB with Other AWS Services
A. Lambda (Serverless Function)
- Use Lambda to read/write data in DynamoDB.
- Example: Trigger a Lambda function when an API Gateway request comes in.
Steps:
- Create a Lambda function in the AWS console.
- Attach IAM permissions allowing dynamodb:GetItem, PutItem, Scan, etc.
- Use SDK in twtech Lambda code:
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('twtech-Users')

def lambda_handler(event, context):
    response = table.put_item(
        Item={
            'UserID': 'twtech123',
            'Name': 'twtechuser-pat'
        }
    )
    return response
B. API Gateway
- Create REST or HTTP APIs that interact with DynamoDB through Lambda.
- Use Lambda Proxy Integration:
- API Gateway receives HTTP request.
- Triggers Lambda function.
- Lambda performs DynamoDB operation and returns response.
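With Lambda proxy integration, the function must hand back a response object in the exact shape API Gateway expects: a `statusCode`, optional `headers`, and a `body` that is a JSON string rather than a dict. A minimal sketch of that shaping step:

```python
import json

def proxy_response(status: int, payload: dict) -> dict:
    """Wrap a payload in the Lambda proxy integration response format.
    API Gateway requires body to be a string, not a dict."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }

resp = proxy_response(200, {"UserID": "twtech123", "Name": "twtechuser-pat"})
```

Returning a raw dict without this wrapper is a common cause of API Gateway 502 "malformed Lambda proxy response" errors.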
C. S3 (Event-driven Data Ingestion)
- twtech can stream new S3 objects to DynamoDB via Lambda:
- Configure S3 Event Notifications to trigger a Lambda function.
- Lambda reads the file content and stores relevant data in DynamoDB.
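The Lambda in this flow first pulls the bucket and key out of the S3 event notification before fetching the object. A sketch of that parsing step (the event shape follows the documented S3 notification format; the bucket name `twtech-uploads` below is an assumed example):

```python
def objects_from_s3_event(event: dict) -> list:
    """Extract (bucket, key) pairs from an S3 event notification payload."""
    return [
        (r["s3"]["bucket"]["name"], r["s3"]["object"]["key"])
        for r in event.get("Records", [])
    ]

sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "twtech-uploads"},
                "object": {"key": "users/new.json"}}}
    ]
}
print(objects_from_s3_event(sample_event))  # [('twtech-uploads', 'users/new.json')]
```

With bucket and key in hand, the handler would call `s3.get_object`, parse the content, and `put_item` the relevant fields into DynamoDB.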
D. DynamoDB Streams
- DynamoDB Streams capture data changes (insert, modify, delete).
- twtech can integrate streams with:
- Lambda – to process changes in real-time.
- Kinesis Data Firehose – to deliver DynamoDB changes to S3, Redshift, or Elasticsearch.
Sample with Lambda Trigger:
def lambda_handler(event, context):
    for record in event['Records']:
        if record['eventName'] == 'INSERT':
            new_data = record['dynamodb']['NewImage']
            print("New record added:", new_data)
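Note that `NewImage` arrives in DynamoDB's typed JSON (`{'S': ...}`, `{'N': ...}`), so handlers usually flatten it before use. A minimal sketch covering the common scalar types — boto3's `TypeDeserializer` does this more completely:

```python
def flatten_image(image: dict) -> dict:
    """Convert DynamoDB typed JSON (e.g. {'S': 'x'}, {'N': '42'})
    into a plain Python dict. Handles S, N, and BOOL only."""
    out = {}
    for name, typed in image.items():
        (dynamo_type, value), = typed.items()
        if dynamo_type == "N":  # numbers arrive as strings
            value = float(value) if "." in value else int(value)
        out[name] = value
    return out

image = {"UserID": {"S": "twtech123"}, "Visits": {"N": "42"},
         "Active": {"BOOL": True}}
print(flatten_image(image))  # {'UserID': 'twtech123', 'Visits': 42, 'Active': True}
```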
E. Step Functions
- Use AWS Step Functions for orchestrating workflows with DynamoDB:
- Example: Validate order → Store in DynamoDB → Send notification.
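That validate → store → notify flow can be expressed in Amazon States Language using Step Functions' direct DynamoDB and SNS integrations. A hedged sketch — the state names, validator Lambda ARN, SNS topic, and `$.userId` input path are placeholders, not values from this setup:

```json
{
  "StartAt": "ValidateOrder",
  "States": {
    "ValidateOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-2:accountID:function:validate-order",
      "Next": "StoreOrder"
    },
    "StoreOrder": {
      "Type": "Task",
      "Resource": "arn:aws:states:::dynamodb:putItem",
      "Parameters": {
        "TableName": "twtech-Users",
        "Item": {"UserID": {"S.$": "$.userId"}}
      },
      "Next": "SendNotification"
    },
    "SendNotification": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sns:publish",
      "Parameters": {
        "TopicArn": "arn:aws:sns:us-east-2:accountID:order-events",
        "Message.$": "$.userId"
      },
      "End": true
    }
  }
}
```

Note the `StoreOrder` state writes to DynamoDB directly via the optimized service integration, with no Lambda in between.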
F. CloudWatch
- Monitor DynamoDB metrics (Read/Write capacity, Throttled requests).
- Use CloudWatch Alarms to trigger Lambda or SNS notifications on thresholds.
G. EventBridge
- Stream DynamoDB Streams to EventBridge.
- Useful for decoupled event-driven architectures.
H. AWS AppSync
- Integrate DynamoDB as a backend for GraphQL APIs.
- AppSync handles queries, mutations, and subscriptions automatically.
I. Glue / Redshift / Athena
- For analytics:
- Use Glue ETL to extract DynamoDB data.
- Query DynamoDB via Athena using the DynamoDB connector.
- Load data to Redshift for advanced analytics.
3. IAM Permissions
- Ensure the service accessing DynamoDB has the right IAM role.
- Sample policy for Lambda:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:GetItem",
        "dynamodb:PutItem",
        "dynamodb:Scan",
        "dynamodb:UpdateItem"
      ],
      "Resource": "arn:aws:dynamodb:us-east-2:accountID:table/twtech-Users"
    }
  ]
}
Key takeaway:
Create table → Write/Read with Lambda → Trigger via API Gateway, S3, or Streams → Orchestrate with Step Functions → Monitor with CloudWatch.
NB:- DynamoDB integrates seamlessly with:
- Serverless,
- Event-driven,
- Analytics pipelines across AWS.
9. When to Use DynamoDB & When Not to Use DynamoDB
When to Use DynamoDB:
- Need millisecond performance at scale.
- Unpredictable workloads (spiky traffic).
- Event-driven apps (serverless).
- Require global distribution.
When Not to Use DynamoDB:
- Need complex queries (joins, aggregations → better in RDS or Redshift).
- Data fits strict relational models.
- Need large item sizes (>400KB).