A deep dive into AWS DMS (Database Migration Service) Sources & Targets.
Scope:
- Supported engines,
- Architectural behavior,
- Operational notes,
- Replication modes,
- Performance considerations,
- Real-world patterns.
Breakdown:
- Core Architecture Refresher,
- Supported AWS DMS Source Endpoints,
- Supported AWS DMS Target Endpoints,
- Source → Target Compatibility Matrix (Summary),
- DMS Migration Modes and Source Behavior,
- Important Considerations Per Source,
- Common Migration Patterns,
- Performance & Tuning,
- Monitoring & Troubleshooting,
- Best Practices Summary.
Intro:
AWS Database Migration Service (AWS DMS) is designed to migrate databases quickly and securely into AWS with minimal downtime. It supports homogeneous and heterogeneous migrations across on-premises, cloud, and hybrid database environments.
1. Core Architecture Refresher
An AWS DMS migration involves three components:
1. Source Endpoint
- A live database or data store.
- AWS DMS reads from it in two ways:
  - Full load (bulk copy)
  - CDC (Change Data Capture) stream
2. DMS Replication Instance
- Runs the migration engine.
- Performs:
  - Data extraction
  - Transformation (lightweight)
  - Data loading
  - CDC buffering & commit
3. Target Endpoint
- Destination database or data store.
- Receives data via bulk load or continuous replication.
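The three components above come together in a replication task. As a minimal sketch, the helper below assembles keyword arguments shaped like those of boto3's `create_replication_task` call. The function name `build_task_request`, the task name, and the ARNs are placeholders invented for illustration; nothing here contacts AWS.

```python
import json


def build_task_request(task_id, source_arn, target_arn, instance_arn,
                       migration_type="full-load-and-cdc", schema="%"):
    """Assemble arguments shaped like boto3's create_replication_task call."""
    # One selection rule that includes every table in the chosen schema.
    table_mappings = {
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-all",
            "object-locator": {"schema-name": schema, "table-name": "%"},
            "rule-action": "include",
        }]
    }
    return {
        "ReplicationTaskIdentifier": task_id,
        "SourceEndpointArn": source_arn,            # source endpoint
        "TargetEndpointArn": target_arn,            # target endpoint
        "ReplicationInstanceArn": instance_arn,     # replication instance
        "MigrationType": migration_type,
        "TableMappings": json.dumps(table_mappings),  # the API expects a JSON string
    }


# Placeholder ARNs, for illustration only.
request = build_task_request(
    "demo-task",
    "arn:aws:dms:us-east-1:111122223333:endpoint:SOURCE-EXAMPLE",
    "arn:aws:dms:us-east-1:111122223333:endpoint:TARGET-EXAMPLE",
    "arn:aws:dms:us-east-1:111122223333:rep:INSTANCE-EXAMPLE",
)
print(json.dumps(request, indent=2))
```

With real ARNs, a dictionary like this could be passed as `boto3.client("dms").create_replication_task(**request)`.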
2. Supported AWS DMS Source Endpoints
- AWS DMS supports a wide range of relational, NoSQL, streaming, and analytics sources.
- Below is a detailed breakdown:
2.1 Relational Database Sources
Amazon RDS
- Aurora MySQL / PostgreSQL
- RDS MySQL, PostgreSQL
- RDS MariaDB
- Oracle on RDS
- SQL Server on RDS
🔹 Supports CDC and full-load
🔹 Ideal for cloud-native migrations or cross-region replication
On-Premises Relational Databases
- Oracle (versions 10g → 19c)
- Microsoft SQL Server (2005 → 2019)
- MySQL, PostgreSQL
- IBM Db2 LUW
- SAP ASE / Sybase
- Informix
🔹 Uses log-based CDC (redo logs, binlog, WAL, transaction logs)
🔹 May require additional permissions or supplemental logging to be enabled
2.2 NoSQL Sources
MongoDB (Self-managed & Atlas)
- Supports full load + CDC
- Uses oplog for CDC
- Requires replica set or sharded cluster with proper oplog sizing
Amazon DynamoDB (when used as a source)
- Full load only
- DMS does not support CDC from DynamoDB; change capture is available outside DMS through DynamoDB Streams or Kinesis Data Streams for DynamoDB.
2.3 Data Warehouse / Analytical Sources
Amazon Redshift
- Supported as source for unloading data
- Uses UNLOAD from Redshift to S3 under the hood
Greenplum, Netezza, Vertica
- Full load only
- Primarily used for BI/Analytics offloading or warehouse migration
2.4 Other Sources
S3 (CSV, Parquet, etc.)
- Full load, plus CDC when change files are supplied under a separate CDC path in the bucket
- DMS reads objects from S3 as source tables
Kafka / Kinesis
- Acts as source when ingesting event streams
- DMS consumes messages and loads into targets
3. Supported AWS DMS Target Endpoints
- AWS DMS supports more targets than sources. Targets include OLTP databases, data lakes, event streams, search systems, and BI tools.
3.1 Relational Database Targets
Amazon RDS (all engines)
- MySQL, PostgreSQL, MariaDB
- Oracle, SQL Server
- Aurora MySQL/PostgreSQL
🔹 Supports full load + CDC
🔹 Common use case: lift-and-shift to RDS
On-Premises Relational Databases
- Oracle
- SQL Server
- PostgreSQL
- MySQL
- Db2 LUW
🔹 Full load + CDC depending on DB type
🔹 Used for hybrid architectures, multi-site DR, or repatriation
3.2 NoSQL & Document Stores
Amazon DynamoDB
- Full load + CDC supported
- Creates tables, maps source keys to partition keys, and works with DynamoDB's adaptive throughput features
Amazon DocumentDB
- Full load + CDC supported
MongoDB
- Full load + CDC as target
3.3 Data Lake & Object Storage Targets
Amazon S3
- CSV
- Parquet
- JSON
- Apache Iceberg (via DMS 3.5+)
🔹 Most commonly used target
🔹 CDC events can be batched into S3 folder partitions
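To make the folder-partitioning point concrete, here is a small sketch of S3 target endpoint settings with date-based partitioning turned on. The keys follow the `S3Settings` structure as I understand it from the DMS API; the helper name `s3_target_settings`, the bucket, the folder, and the role ARN are hypothetical.

```python
def s3_target_settings(bucket, folder, role_arn, data_format="parquet"):
    """Return a dict shaped like the S3Settings of a DMS target endpoint."""
    return {
        "ServiceAccessRoleArn": role_arn,   # IAM role DMS assumes to write to S3
        "BucketName": bucket,
        "BucketFolder": folder,
        "DataFormat": data_format,
        # Write change files into date-based folders.
        "DatePartitionEnabled": True,
        "DatePartitionSequence": "YYYYMMDD",
        "DatePartitionDelimiter": "SLASH",
        # Add a column recording when each row was committed or loaded.
        "TimestampColumnName": "dms_commit_ts",
    }


settings = s3_target_settings(
    bucket="example-data-lake",                                # hypothetical bucket
    folder="dms/orders",                                       # hypothetical prefix
    role_arn="arn:aws:iam::111122223333:role/example-dms-s3",  # placeholder
)
print(settings)
```

Partitioning by date keeps each day's change files under their own prefix, which tends to make downstream queries and lifecycle rules simpler.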
Amazon Redshift
- Target via Redshift COPY from S3
- Supports:
  - Full load
  - CDC for warehouse streaming (micro-batches)
3.4 Event & Streaming Targets
Amazon Kinesis Data Streams
- CDC events streamed in JSON format
- Supports near-real-time ingestion pipelines
Kafka (Amazon MSK or self-managed)
- Targets support:
  - Avro
  - JSON
  - Debezium-like CDC format
🔹 Commonly used for event-driven architectures
3.5 Search & Analytics Targets
Amazon OpenSearch Service
- Full load + CDC
- Great for search, indexing, analytics
- DMS automatically maps relational schemas to JSON docs
4. Source → Target Compatibility Matrix (Summary)
| Source Type | Full Load | CDC | Common Targets |
|---|---|---|---|
| Oracle | ✔ | ✔ | RDS Oracle, Aurora PG, Redshift, S3 |
| SQL Server | ✔ | ✔ | RDS SQL Server, MySQL, S3, Kafka |
| MySQL | ✔ | ✔ | Aurora, RDS, S3 |
| PostgreSQL | ✔ | ✔ | Aurora, RDS, Redshift |
| MongoDB | ✔ | ✔ | DocumentDB, DynamoDB, S3 |
| DynamoDB | ✔ | ❌ | S3, Redshift |
| Redshift | ✔ | ❌ | S3, RDS engines |
| S3 | ✔ | ✔ (from change files) | RDS, Redshift, DynamoDB |
| Kafka | ✔ | ✔ | S3, RDS, DynamoDB |
5. DMS Migration Modes and Source Behavior
AWS DMS supports three migration modes:
5.1 Full Load Only
Used when:
- Database is offline or small enough to migrate quickly
- No CDC logs available
- Target is a data lake or analytics DB
5.2 CDC Only
Used when:
- Schema/data already pre-loaded
- Active/active replication
- Change streaming to Kafka/Kinesis
Relies on:
- Oracle redo logs
- SQL Server transaction logs
- MySQL binlog
- PostgreSQL WAL
- MongoDB oplog
5.3 Full Load + CDC
Used for most production workloads. Ensures:
- Minimal downtime
- Live synchronization
- Cutover with little to no impact
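The three modes correspond to the `MigrationType` values accepted by the DMS API (`full-load`, `cdc`, `full-load-and-cdc`). The helper below is a hypothetical sketch of how a deployment script might choose between them; the function name and its two flags are illustrative.

```python
def choose_migration_type(copy_existing_data: bool, replicate_changes: bool) -> str:
    """Map two yes/no questions onto a DMS MigrationType value."""
    if copy_existing_data and replicate_changes:
        return "full-load-and-cdc"   # section 5.3
    if copy_existing_data:
        return "full-load"           # section 5.1
    if replicate_changes:
        return "cdc"                 # section 5.2
    raise ValueError("a task must copy existing data, replicate changes, or both")


# A pre-loaded target that only needs ongoing changes:
print(choose_migration_type(copy_existing_data=False, replicate_changes=True))
```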
6. Important Considerations Per Source
6.1 Oracle
- Requires supplemental logging
- Best CDC performance with ASM storage
- Supports LOB migration modes:
  - Full
  - Inline
  - Limited size
6.2 SQL Server
- Requires:
  - CDC enabled on DB + tables
  - SQL Server Agent (SQLSERVERAGENT) running
6.3 MySQL / MariaDB
- Requires binlog in ROW format
6.4 PostgreSQL
- Requires logical replication (wal_level = logical) and a replication slot
6.5 MongoDB
- Requires high oplog retention
- Sharded clusters support CDC
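These prerequisites can be checked before a task is created. The sketch below pairs each relational engine with a query that reports the relevant setting and the values that would satisfy DMS. The queries are standard catalog lookups for each engine, but the helper itself, its name, and its structure are assumptions made for illustration. It only evaluates a value you have already fetched; it does not connect to any database. MongoDB is omitted because its oplog window is not exposed through SQL.

```python
# engine -> (query that reports the setting, values that satisfy the prerequisite)
PREREQ_CHECKS = {
    "oracle": ("SELECT supplemental_log_data_min FROM v$database",
               {"YES", "IMPLICIT"}),
    "sqlserver": ("SELECT is_cdc_enabled FROM sys.databases WHERE name = DB_NAME()",
                  {"1", "TRUE"}),
    "mysql": ("SELECT @@binlog_format", {"ROW"}),
    "postgresql": ("SHOW wal_level", {"LOGICAL"}),
}


def prerequisite_met(engine: str, observed_value) -> bool:
    """Return True if the value returned by the engine's check query is acceptable."""
    _query, accepted = PREREQ_CHECKS[engine]
    return str(observed_value).strip().upper() in accepted


for engine, (query, _accepted) in PREREQ_CHECKS.items():
    print(f"{engine}: {query}")

print(prerequisite_met("postgresql", "replica"))  # replica-level WAL is not enough
```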
7. Common Migration Patterns
Pattern 1 — On-Prem Oracle → Amazon Aurora PostgreSQL
- Use DMS + SCT (Schema Conversion Tool)
- Full load + CDC
- Change datatype mappings
- Cutover with minutes of downtime
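For Pattern 1, part of the mapping work can be expressed as DMS table-mapping rules. The sketch below builds a selection rule, lowercase conversions (Oracle identifiers are usually uppercase, PostgreSQL's are conventionally lowercase), and one data-type override. The schema `HR`, the table `ORDERS`, the column `ORDER_ID`, and the helper name are hypothetical; the rule keys follow the table-mapping JSON format as I understand it.

```python
import json


def oracle_to_pg_mappings(schema: str) -> dict:
    """Build table-mapping rules for an Oracle schema landing in PostgreSQL."""
    rules = [
        {   # Select every table in the schema.
            "rule-type": "selection", "rule-id": "1", "rule-name": "select-schema",
            "object-locator": {"schema-name": schema, "table-name": "%"},
            "rule-action": "include",
        },
        {   # Lowercase the schema name on the target.
            "rule-type": "transformation", "rule-id": "2", "rule-name": "lower-schema",
            "rule-target": "schema",
            "object-locator": {"schema-name": schema},
            "rule-action": "convert-lowercase",
        },
        {   # Lowercase table names.
            "rule-type": "transformation", "rule-id": "3", "rule-name": "lower-tables",
            "rule-target": "table",
            "object-locator": {"schema-name": schema, "table-name": "%"},
            "rule-action": "convert-lowercase",
        },
        {   # Lowercase column names.
            "rule-type": "transformation", "rule-id": "4", "rule-name": "lower-columns",
            "rule-target": "column",
            "object-locator": {"schema-name": schema, "table-name": "%",
                               "column-name": "%"},
            "rule-action": "convert-lowercase",
        },
        {   # Override the data type of one (hypothetical) column.
            "rule-type": "transformation", "rule-id": "5", "rule-name": "order-id-int8",
            "rule-target": "column",
            "object-locator": {"schema-name": schema, "table-name": "ORDERS",
                               "column-name": "ORDER_ID"},
            "rule-action": "change-data-type",
            "data-type": {"type": "int8"},
        },
    ]
    return {"rules": rules}


mappings = oracle_to_pg_mappings("HR")
print(json.dumps(mappings, indent=2))
```

Schema objects such as indexes, constraints, and stored code still come from SCT; rules like these only shape how DMS names and types the data it moves.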
Pattern 2 — SQL Server → S3 Data Lake
- Scale-out extraction
- Parallel table loading
- Convert to Parquet via DMS 3.5
Pattern 3 — MySQL → Kafka / MSK
- Low-latency event streaming
- Use JSON schema format
- Downstream microservices consume CDC updates
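On the consuming side, a microservice typically parses each message and applies it to its own state. The sketch below assumes the JSON envelope DMS writes to streaming targets, with a `data` object holding the row and a `metadata` object holding fields such as `record-type`, `operation`, `schema-name`, and `table-name`. The sample messages, the `shop.orders` table, and the `id` key column are invented for illustration, and the Kafka client wiring is left out.

```python
import json


def apply_change(state: dict, raw_message: str, key_column: str = "id") -> dict:
    """Apply one DMS-style change message to an in-memory copy of the tables."""
    message = json.loads(raw_message)
    meta = message.get("metadata", {})
    if meta.get("record-type") != "data":
        return state  # skip control records (for example, table creation events)
    row = message["data"]
    table = state.setdefault((meta["schema-name"], meta["table-name"]), {})
    operation = meta["operation"]
    if operation == "delete":
        table.pop(row[key_column], None)
    elif operation in ("load", "insert", "update"):
        table[row[key_column]] = row
    return state


def sample(operation: str, row: dict) -> str:
    """Build an illustrative message for the hypothetical shop.orders table."""
    return json.dumps({
        "data": row,
        "metadata": {"record-type": "data", "operation": operation,
                     "schema-name": "shop", "table-name": "orders"},
    })


state = {}
apply_change(state, sample("insert", {"id": 1, "status": "new"}))
apply_change(state, sample("update", {"id": 1, "status": "paid"}))
print(state)
```

Keeping the handler idempotent (an upsert keyed on the primary key) means a redelivered message does not corrupt the local copy.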
Pattern 4 — MongoDB → DynamoDB
- Common for NoSQL modernization
- DMS handles key mapping
- Good for large collections with sub-second oplog propagation
8. Performance & Tuning
Increase Replication Instance class
- More vCPU = faster CDC
- RAM improves table caching
Use Parallel Load for Full Load
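- Up to 49 tables in parallel (the MaxFullLoadSubTasks task setting; the default is 8)
- "Batch Apply" mode groups CDC changes into batches before applying them, which suits targets such as S3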
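Both knobs live in the task settings JSON. The sketch below builds the relevant fragment, assuming the `FullLoadSettings` and `TargetMetadata` sections of the DMS task settings document. The cap of 49 parallel subtasks reflects my reading of that documentation and is worth verifying for your DMS version; the helper name and defaults are illustrative.

```python
import json

MAX_FULL_LOAD_SUBTASKS = 49  # assumed upper bound; check the DMS docs for your version


def full_load_settings(parallel_tables: int = 8, commit_rate: int = 10000,
                       batch_apply: bool = False) -> dict:
    """Return a task-settings fragment controlling load parallelism and batch apply."""
    if not 1 <= parallel_tables <= MAX_FULL_LOAD_SUBTASKS:
        raise ValueError(
            f"parallel_tables must be between 1 and {MAX_FULL_LOAD_SUBTASKS}")
    return {
        "FullLoadSettings": {
            "TargetTablePrepMode": "DROP_AND_CREATE",
            "MaxFullLoadSubTasks": parallel_tables,  # tables loaded concurrently
            "CommitRate": commit_rate,               # rows transferred per batch
        },
        "TargetMetadata": {
            "BatchApplyEnabled": batch_apply,        # group CDC changes before applying
        },
    }


task_settings = full_load_settings(parallel_tables=16, batch_apply=True)
print(json.dumps(task_settings, indent=2))
```

The JSON string produced by `json.dumps(task_settings)` is the kind of value the `ReplicationTaskSettings` parameter expects.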
Ensure Disk Throughput Supports CDC
- Use GP3 with proper IOPS provisioning
Optimize LOB settings
- “Limited LOB” significantly increases speed
- Treat large objects separately
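The LOB modes are also selected through task settings. As a rough sketch, the helper below returns the `TargetMetadata` fields I understand to control each mode; sizes are in kilobytes, and the defaults shown (32 KB limit, 64 KB chunks) are assumptions to confirm against the documentation. In limited mode, LOBs larger than the limit are truncated, so the limit should be chosen from the largest LOB actually present.

```python
def lob_settings(mode: str, max_kb: int = 32, chunk_kb: int = 64) -> dict:
    """Return TargetMetadata LOB fields for 'limited', 'full', or 'none' mode."""
    if mode == "limited":
        # Fastest option: each LOB is fetched inline, up to max_kb.
        return {"SupportLobs": True, "FullLobMode": False,
                "LimitedSizeLobMode": True, "LobMaxSize": max_kb}
    if mode == "full":
        # Migrates LOBs of any size, piece by piece, at a cost in throughput.
        return {"SupportLobs": True, "FullLobMode": True,
                "LimitedSizeLobMode": False, "LobChunkSize": chunk_kb}
    if mode == "none":
        # LOB columns are left out of the migration.
        return {"SupportLobs": False, "FullLobMode": False,
                "LimitedSizeLobMode": False}
    raise ValueError(f"unknown LOB mode: {mode!r}")


print(lob_settings("limited", max_kb=128))
```

One way to treat large objects separately is to run the tables that carry them in their own task with full LOB mode, and keep the remaining tables in limited mode.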
Avoid network bottlenecks
- Use AWS Direct Connect for on-prem sources
9. Monitoring & Troubleshooting
Key metrics:
- CDCLatencySource (lag on source logs)
- CDCLatencyTarget (lag applying changes)
- FreeableMemory (throttles full load if too low)
- DiskQueueDepth (indicates storage saturation)
- CPUUtilization
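Reading the two latency metrics together usually shows which side of the pipeline is behind. The function below is a simplified sketch of that reasoning; the 60-second threshold and the function name are arbitrary choices for illustration, and the inputs are values you would already have pulled from CloudWatch.

```python
def diagnose_cdc_lag(source_latency_s: float, target_latency_s: float,
                     threshold_s: float = 60.0) -> str:
    """Classify CDC lag from CDCLatencySource and CDCLatencyTarget (in seconds)."""
    if source_latency_s < threshold_s and target_latency_s < threshold_s:
        return "healthy"
    if source_latency_s >= threshold_s:
        # DMS is behind in reading the source logs; target lag follows from that.
        return "source-capture"
    # Changes are captured promptly but applied slowly on the target.
    return "target-apply"


print(diagnose_cdc_lag(source_latency_s=5, target_latency_s=420))
```

A `source-capture` result points toward log access, network, or instance CPU; a `target-apply` result points toward target indexes, triggers, write capacity, or batch-apply settings.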
Key logs:
- Task logs
- Replication instance logs
- Source/target endpoint logs
10. Best Practices Summary
Do:
- Enable proper logging (WAL/binlog/redo)
- Size replication instance based on workload
- Test CDC retention window
- Use AWS SCT for heterogeneous migrations
- Use Multi-AZ if required for fault tolerance
Avoid:
- Migrating huge LOBs without planning
- Under-provisioned storage
- Expecting CDC from full-load-only sources such as DynamoDB or Redshift