Sunday, November 23, 2025

AWS DMS (Database Migration Service) Sources & Targets | Deep Dive.

A deep dive into AWS DMS (Database Migration Service) Sources & Targets.

Scope:

  •        Supported engines,
  •        Architectural behavior,
  •        Operational notes,
  •        Replication modes,
  •        Performance considerations,
  •        Real-world patterns.

Breakdown:

  •        Core Architecture Refresher,
  •        Supported AWS DMS Source Endpoints,
  •        Supported AWS DMS Target Endpoints,
  •        Source → Target Compatibility Matrix (Summary),
  •        DMS Migration Modes and Source Behavior,
  •        Important Considerations Per Source,
  •        Common Migration Patterns,
  •        Performance & Tuning,
  •        Monitoring & Troubleshooting,
  •        Best Practices Summary.

Intro:

AWS Database Migration Service (AWS DMS) is designed to migrate databases quickly and securely into AWS with minimal downtime.

AWS Database Migration Service supports homogeneous, heterogeneous, on-premises, cloud, and hybrid database environments.

1. Core Architecture Refresher

An AWS DMS migration involves:

1. Source Endpoint

  •         A live database or data store.
  •         AWS DMS fetches:
    •    Full load (bulk copy)
    •    CDC (Change Data Capture) stream

2. DMS Replication Instance

  •         Runs the migration engine.
  •         Performs:
    •    Data extraction
    •    Transformation (lightweight)
    •    Data loading
    •    CDC buffering & commit

3. Target Endpoint

  •         Destination database or data store.
  •         Receives data via bulk load or continuous replication.
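
The lightweight transformation step is driven by a table-mapping JSON document attached to the task. The following is a minimal sketch rather than a definitive configuration: it assumes a hypothetical source schema called HR that should land as hr on the target, and the rule names and ids are illustrative.

```python
import json


def build_table_mappings(source_schema: str, target_schema: str) -> str:
    """Return a DMS table-mapping document as a JSON string.

    Rule 1 selects every table in the source schema.
    Rule 2 renames the schema on the target (a lightweight transformation).
    """
    rules = [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-all-tables",
            "object-locator": {"schema-name": source_schema, "table-name": "%"},
            "rule-action": "include",
        },
        {
            "rule-type": "transformation",
            "rule-id": "2",
            "rule-name": "rename-schema",
            "rule-target": "schema",
            "object-locator": {"schema-name": source_schema},
            "rule-action": "rename",
            "value": target_schema,
        },
    ]
    return json.dumps({"rules": rules}, indent=2)


if __name__ == "__main__":
    # "HR" and "hr" are hypothetical schema names used only for illustration.
    print(build_table_mappings("HR", "hr"))
```

Anything heavier than renames, filters, or simple type changes is usually better handled before or after DMS rather than inside the task.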

2.  Supported AWS DMS Source Endpoints

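  • AWS DMS sources are mainly relational engines, plus MongoDB / Amazon DocumentDB and Amazon S3. Warehouse and streaming services are generally supported as targets rather than as sources.
  • Below is a detailed breakdown: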

2.1 Relational Database Sources

 Amazon RDS

  •         Aurora MySQL / PostgreSQL
  •         RDS MySQL, PostgreSQL
  •         RDS MariaDB
  •         Oracle on RDS
  •         SQL Server on RDS

🔹 Supports CDC and full-load
🔹 Ideal for cloud-native migrations or cross-region replication

 On-Premises Relational Databases

  •         Oracle (versions 10g → 19c)
  •         Microsoft SQL Server (2005 → 2019)
  •         MySQL, PostgreSQL
  •         IBM Db2 LUW
  •         SAP ASE / Sybase
  •         Informix (not listed among DMS source endpoints in the AWS documentation; verify before planning)

🔹 Uses log-based CDC (redo logs, binlog, WAL, transaction logs)
🔹 May require additional permissions or supplemental logging to be enabled

2.2 NoSQL Sources

 MongoDB (Self-managed & Atlas)

  •         Supports full load + CDC
  •         Uses oplog for CDC
  •         Requires replica set or sharded cluster with proper oplog sizing
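
"Proper oplog sizing" can be made concrete with simple arithmetic: the oplog has to retain changes for at least as long as the full load runs, or CDC cannot resume from where the load started. A rough sketch, assuming a steady write rate; the safety factor of 2 is an illustrative assumption, not a DMS or MongoDB recommendation.

```python
def oplog_window_hours(oplog_size_gb: float, write_rate_gb_per_hour: float) -> float:
    """Approximate how many hours of changes the oplog can hold."""
    if write_rate_gb_per_hour <= 0:
        raise ValueError("write rate must be positive")
    return oplog_size_gb / write_rate_gb_per_hour


def oplog_is_sufficient(
    oplog_size_gb: float,
    write_rate_gb_per_hour: float,
    full_load_hours: float,
    safety_factor: float = 2.0,  # assumed headroom, tune for your workload
) -> bool:
    """True if the oplog window covers the full load with the chosen headroom."""
    window = oplog_window_hours(oplog_size_gb, write_rate_gb_per_hour)
    return window >= full_load_hours * safety_factor


if __name__ == "__main__":
    # Hypothetical numbers: 50 GB oplog, 2 GB/hour of writes, 10-hour full load.
    print(oplog_window_hours(50, 2))
    print(oplog_is_sufficient(50, 2, 10))
```

Write rates are rarely steady, so measuring the actual oplog window at peak load is a better input than an average.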

 Amazon DynamoDB (when used as a source)

  •         The AWS documentation lists DynamoDB as a DMS target, not as a source endpoint
    • For change capture from DynamoDB, use DynamoDB Streams or Kinesis Data Streams rather than DMS.

2.3 Data Warehouse / Analytical Sources

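 Amazon Redshift

  •         Listed in the AWS documentation as a DMS target, not as a source endpoint
  •         To move data out of Redshift, use UNLOAD from Redshift to S3

 Greenplum, Netezza, Vertica

  •         Extracted with AWS SCT data extraction agents rather than through DMS source endpoints
  •         Primarily used for BI/Analytics offloading or warehouse migration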

2.4 Other Sources

 S3 (CSV)

  •         Full load, plus CDC when change files are supplied under a configured CDC path
  •         DMS reads CSV objects from S3 as source tables, described by an external table definition

 Kafka / Kinesis

  •         Supported by DMS as targets, not as source endpoints
  •         To load event streams into databases, use a stream consumer (for example Kafka Connect or a Kinesis consumer) rather than DMS

3.  Supported AWS DMS Target Endpoints

  • AWS DMS targets include OLTP databases, data warehouses, data lakes, event streams, and search systems.

3.1 Relational Database Targets

 Amazon RDS (all engines)

  •         MySQL, PostgreSQL, MariaDB
  •         Oracle, SQL Server
  •         Aurora MySQL/PostgreSQL

🔹 Supports full load + CDC
🔹 Common use case: lift-and-shift to RDS

 On-Premises Relational Databases

  •         Oracle
  •         SQL Server
  •         PostgreSQL
  •         MySQL
  •         Db2 LUW

🔹 Full load + CDC depending on DB type
🔹 Used for hybrid architectures, multi-site DR, or repatriation

3.2 NoSQL & Document Stores

 Amazon DynamoDB

  •         Full load + CDC supported
  •         Can create the target table; partition and sort keys are defined through object-mapping rules
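
Key handling for a DynamoDB target is expressed as an object-mapping rule inside the table mappings. The sketch below builds one such rule; the schema, table, and column names are hypothetical, and the field names follow the object-mapping format as described in the DMS documentation, so check them against the current user guide before relying on them.

```python
import json


def dynamodb_object_mapping(schema: str, table: str, target_table: str,
                            partition_key_column: str) -> dict:
    """Build table mappings that map one relational table to a DynamoDB table."""
    return {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "select-table",
                "object-locator": {"schema-name": schema, "table-name": table},
                "rule-action": "include",
            },
            {
                "rule-type": "object-mapping",
                "rule-id": "2",
                "rule-name": "map-to-dynamodb",
                "rule-action": "map-record-to-record",
                "object-locator": {"schema-name": schema, "table-name": table},
                "target-table-name": target_table,
                "mapping-parameters": {
                    "partition-key-name": partition_key_column,
                    "attribute-mappings": [
                        {
                            # The partition key is taken from the source column value.
                            "target-attribute-name": partition_key_column,
                            "attribute-type": "scalar",
                            "attribute-sub-type": "string",
                            "value": "${" + partition_key_column + "}",
                        }
                    ],
                },
            },
        ]
    }


if __name__ == "__main__":
    # "sales", "customer", "customer_t", "customer_id" are illustrative names.
    mappings = dynamodb_object_mapping("sales", "customer", "customer_t", "customer_id")
    print(json.dumps(mappings, indent=2))
```

Choosing a partition key with high cardinality matters more for DynamoDB throughput than anything DMS does during the load.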

 Amazon DocumentDB

  •         Full load + CDC supported

 MongoDB

  •         Not listed as a DMS target in the AWS documentation; Amazon DocumentDB is the MongoDB-compatible target

3.3 Data Lake & Object Storage Targets

 Amazon S3

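  •         CSV
  •         Parquet
  •         JSON and Apache Iceberg are not values of the S3 endpoint's DataFormat setting (csv, parquet); verify support against the current DMS release notes

🔹 Commonly used as a data lake landing zone
🔹 CDC files can be written into date-based S3 folder partitions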
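
Batching and folder partitioning on an S3 target are controlled by endpoint settings. A minimal sketch of such settings as a plain dictionary; the bucket, folder, and role ARN are hypothetical, and the interval and file-size values shown are illustrative rather than tuned recommendations.

```python
def s3_target_settings(bucket: str, folder: str, role_arn: str) -> dict:
    """Return S3 target endpoint settings that write Parquet into date partitions."""
    return {
        "BucketName": bucket,
        "BucketFolder": folder,
        "ServiceAccessRoleArn": role_arn,
        "DataFormat": "parquet",           # "csv" is the other supported value
        "DatePartitionEnabled": True,      # write files under date-based folders
        "DatePartitionSequence": "YYYYMMDD",
        "DatePartitionDelimiter": "SLASH",
        "CdcMaxBatchInterval": 60,         # seconds to wait before flushing a CDC file
        "CdcMinFileSize": 32000,           # kilobytes that trigger a flush earlier
    }


if __name__ == "__main__":
    settings = s3_target_settings(
        "example-data-lake",                                   # hypothetical bucket
        "dms/cdc",                                             # hypothetical folder
        "arn:aws:iam::123456789012:role/example-dms-s3-role",  # hypothetical role
    )
    for key, value in settings.items():
        print(f"{key} = {value}")
```

A dictionary of this shape is what the `S3Settings` argument of the boto3 DMS `create_endpoint` call expects; no AWS call is made in the sketch itself.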

 Amazon Redshift

  •         Target via Redshift COPY from S3
  •         Supports:
    •    Full load
    •    CDC for warehouse streaming (micro-batches)

3.4 Event & Streaming Targets

 Amazon Kinesis Data Streams

  •         CDC events streamed in JSON format
  •         Supports near-real-time ingestion pipelines

 Kafka (Amazon MSK or self-managed)

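  •         Targets support:
    •    JSON (formatted or unformatted) messages
    •    Avro and Debezium-style envelopes are not documented DMS output formats; convert downstream if needed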

🔹 Commonly used for event-driven architectures

3.5 Search & Analytics Targets

 Amazon OpenSearch Service

  •         Full load + CDC
  •         Great for search, indexing, analytics
  •         DMS automatically maps relational schemas to JSON docs

4.  Source → Target Compatibility Matrix (Summary)

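Source Type | Full Load | CDC | Common Targets
----------- | --------- | --- | --------------
Oracle | Yes | Yes | RDS Oracle, Aurora PG, Redshift, S3
SQL Server | Yes | Yes | RDS SQL Server, MySQL, S3, Kafka
MySQL | Yes | Yes | Aurora, RDS, S3
PostgreSQL | Yes | Yes | Aurora, RDS, Redshift
MongoDB | Yes | Yes | DocumentDB, DynamoDB, S3
DynamoDB | n/a | n/a | Not a DMS source (target only)
Redshift | n/a | n/a | Not a DMS source (target only)
S3 | Yes | Yes (from CDC files) | RDS, Redshift, DynamoDB
Kafka | n/a | n/a | Not a DMS source (target only)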

5.  DMS Migration Modes and Source Behavior

AWS DMS supports 3 migration modes:

5.1 Full Load Only

Used when:

  •         Database is offline or small enough to migrate quickly
  •         No CDC logs available
  •         Target is a data lake or analytics DB

5.2 CDC Only

Used when:

  •         Schema/data already pre-loaded
  •         Bidirectional replication (limited: DMS does not resolve write conflicts)
  •         Change streaming to Kafka/Kinesis

Relies on:

  •         Oracle redo logs
  •         SQL Server transaction logs
  •         MySQL binlog
  •         PostgreSQL WAL
  •         MongoDB oplog

5.3 Full Load + CDC

Used for most production workloads. Ensures:

  •         Minimal downtime
  •         Live synchronization
  •         Cutover with little to no impact
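
The three modes correspond to the `MigrationType` values `full-load`, `cdc`, and `full-load-and-cdc` on a replication task. The sketch below picks a value and assembles the arguments for a task; every ARN and identifier passed in is a placeholder, and no AWS call is made.

```python
def choose_migration_type(needs_initial_copy: bool, needs_ongoing_changes: bool) -> str:
    """Map the three migration modes to the MigrationType value DMS accepts."""
    if needs_initial_copy and needs_ongoing_changes:
        return "full-load-and-cdc"
    if needs_initial_copy:
        return "full-load"
    if needs_ongoing_changes:
        return "cdc"
    raise ValueError("a task must copy existing data, capture changes, or both")


def replication_task_request(task_id: str, source_arn: str, target_arn: str,
                             instance_arn: str, migration_type: str,
                             table_mappings_json: str) -> dict:
    """Assemble keyword arguments for a create_replication_task call."""
    return {
        "ReplicationTaskIdentifier": task_id,
        "SourceEndpointArn": source_arn,
        "TargetEndpointArn": target_arn,
        "ReplicationInstanceArn": instance_arn,
        "MigrationType": migration_type,
        "TableMappings": table_mappings_json,
    }


if __name__ == "__main__":
    request = replication_task_request(
        "example-task",                      # hypothetical identifier
        "arn:aws:dms:...:endpoint:SOURCE",   # placeholder ARN
        "arn:aws:dms:...:endpoint:TARGET",   # placeholder ARN
        "arn:aws:dms:...:rep:INSTANCE",      # placeholder ARN
        choose_migration_type(True, True),
        '{"rules": []}',
    )
    print(request["MigrationType"])
```

The returned dictionary is shaped so it could be passed as `boto3.client("dms").create_replication_task(**request)` once real ARNs and table mappings are supplied.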

6.  Important Considerations Per Source

6.1 Oracle

  •         Requires supplemental logging
  •         CDC reads redo via LogMiner (default) or Binary Reader; reading redo on ASM storage needs extra Binary Reader configuration
  •         Supports LOB migration modes:
    •    Full
    •    Inline
    •    Limited size

6.2 SQL Server

  •         Requires:
    •    MS-CDC enabled on DB + tables (self-managed servers can use MS-Replication for tables with primary keys)
    •    SQL Server Agent (SQLSERVERAGENT) running

6.3 MySQL / MariaDB

  •         Requires binlog in ROW format

6.4 PostgreSQL

  •         Requires logical replication (wal_level = logical) and a free replication slot

6.5 MongoDB

  •         Requires high oplog retention
  •         Sharded clusters support CDC
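
Some of these prerequisites can be checked before a task is created. The sketch below compares server settings, supplied as a dictionary (for example from `SHOW VARIABLES` on MySQL or `SHOW wal_level` on PostgreSQL), against the values CDC needs. The `log_bin` and `binlog_row_image = FULL` checks go beyond the list above and are included as commonly documented MySQL requirements; the function name is hypothetical.

```python
REQUIRED_SETTINGS = {
    "mysql": {"log_bin": "ON", "binlog_format": "ROW", "binlog_row_image": "FULL"},
    "postgresql": {"wal_level": "logical"},
}


def missing_cdc_prerequisites(engine: str, current: dict) -> list:
    """Return human-readable problems; an empty list means the checks passed."""
    if engine not in REQUIRED_SETTINGS:
        raise ValueError(f"no checks defined for engine {engine!r}")
    problems = []
    for name, expected in REQUIRED_SETTINGS[engine].items():
        actual = current.get(name)
        # Compare case-insensitively because servers report values in mixed case.
        if str(actual).upper() != expected.upper():
            problems.append(f"{name}: expected {expected}, found {actual!r}")
    return problems


if __name__ == "__main__":
    # Hypothetical values as they might be read from a MySQL server.
    mysql_vars = {"log_bin": "ON", "binlog_format": "MIXED", "binlog_row_image": "FULL"}
    for problem in missing_cdc_prerequisites("mysql", mysql_vars):
        print(problem)
```

Oracle supplemental logging, SQL Server CDC, and MongoDB oplog retention need engine-specific queries and are not covered by this sketch.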

7.  Common Migration Patterns

Pattern 1 — On-Prem Oracle → Amazon Aurora PostgreSQL

  •         Use DMS + SCT (Schema Conversion Tool)
  •         Full load + CDC
  •         Change datatype mappings
  •         Cutover with minutes of downtime

Pattern 2 — SQL Server → S3 Data Lake

  •         Scale-out extraction
  •         Parallel table loading
  •         Write Parquet by setting the S3 target's DataFormat to parquet

Pattern 3 — MySQL → Kafka / MSK

  •         Low-latency event streaming
  •         Use the JSON message format
  •         Downstream microservices consume CDC updates
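
On the consuming side, each message DMS writes to Kafka or Kinesis is a JSON envelope with a `data` section (the row) and a `metadata` section (operation, schema, table). A small sketch of how a downstream service might unpack one; the sample payload is invented for illustration, and only the fields read here are assumed to be present.

```python
import json
from typing import Optional


def summarize_change(raw_message: bytes) -> Optional[dict]:
    """Extract table, operation, and row from a DMS-style JSON message.

    Returns None for control records (anything whose record-type is not "data").
    """
    envelope = json.loads(raw_message)
    metadata = envelope.get("metadata", {})
    if metadata.get("record-type") != "data":
        return None
    return {
        "table": f'{metadata.get("schema-name")}.{metadata.get("table-name")}',
        "operation": metadata.get("operation"),
        "row": envelope.get("data", {}),
    }


if __name__ == "__main__":
    # Invented sample message; real payloads carry additional metadata fields.
    sample = json.dumps({
        "data": {"id": 7, "status": "shipped"},
        "metadata": {
            "record-type": "data",
            "operation": "update",
            "schema-name": "shop",
            "table-name": "orders",
        },
    }).encode("utf-8")
    print(summarize_change(sample))
```

Consumers should treat messages as at-least-once deliveries and make their handling idempotent, for example by keying on the primary key in `row`.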

Pattern 4 — MongoDB → DynamoDB

  •         Common for NoSQL modernization
  •         DMS handles key mapping
  •         Good for large collections with sub-second oplog propagation

8.  Performance & Tuning

 Increase Replication Instance class

  •         More vCPU helps when many tables load or apply in parallel
  •         More RAM keeps in-flight transactions in memory instead of spilling to disk

 Use Parallel Load for Full Load

  •         Up to 49 tables in parallel (MaxFullLoadSubTasks, default 8)
  •         "Batch Apply" (BatchApplyEnabled) is a separate CDC-apply optimization, not a full-load setting
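
Full-load parallelism is set in two places: the task setting `MaxFullLoadSubTasks` (how many tables load at once) and an optional per-table `parallel-load` rule (how one large table is split). The sketch below assumes the documented ceiling of 49 for `MaxFullLoadSubTasks` with a default of 8; the schema and table names are hypothetical, and `partitions-auto` only applies to sources and tables that are actually partitioned.

```python
MAX_FULL_LOAD_SUBTASKS_LIMIT = 49  # assumed documented ceiling; the default is 8


def full_load_task_settings(parallel_tables: int) -> dict:
    """Return task settings with the table parallelism clamped to the valid range."""
    clamped = max(1, min(parallel_tables, MAX_FULL_LOAD_SUBTASKS_LIMIT))
    return {"FullLoadSettings": {"MaxFullLoadSubTasks": clamped}}


def parallel_load_rule(schema: str, table: str, rule_id: str = "10") -> dict:
    """Return a table-settings rule that loads one table by its partitions."""
    return {
        "rule-type": "table-settings",
        "rule-id": rule_id,
        "rule-name": f"parallel-{table}",
        "object-locator": {"schema-name": schema, "table-name": table},
        "parallel-load": {"type": "partitions-auto"},
    }


if __name__ == "__main__":
    print(full_load_task_settings(100))
    print(parallel_load_rule("sales", "order_lines"))  # hypothetical table
```

Raising parallelism shifts load onto the source database, so it is worth increasing in steps while watching source CPU and I/O.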

 Ensure Disk Throughput Supports CDC

  •         Use GP3 with proper IOPS provisioning

 Optimize LOB settings

  •         “Limited LOB” mode is much faster than Full LOB mode, but truncates LOBs larger than the configured maximum size
  •         Treat large objects separately
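
LOB handling lives in the `TargetMetadata` section of the task settings. The following sketch returns settings for the common choices; the helper name and mode labels are hypothetical, and the 32 KB and 64 KB figures are used as illustrative values that should be sized from the largest LOB actually present in the source.

```python
def lob_settings(mode: str, max_lob_kb: int = 32) -> dict:
    """Return TargetMetadata task settings for a given LOB handling mode.

    mode: "limited" (truncate at max_lob_kb), "full" (migrate whole LOBs
    in chunks, slower), or "none" (skip LOB columns).
    """
    if mode == "limited":
        metadata = {
            "SupportLobs": True,
            "FullLobMode": False,
            "LimitedSizeLobMode": True,
            "LobMaxSize": max_lob_kb,  # kilobytes; larger LOBs are truncated
        }
    elif mode == "full":
        metadata = {
            "SupportLobs": True,
            "FullLobMode": True,
            "LimitedSizeLobMode": False,
            "LobChunkSize": 64,        # kilobytes per chunk
        }
    elif mode == "none":
        metadata = {"SupportLobs": False}
    else:
        raise ValueError(f"unknown LOB mode: {mode!r}")
    return {"TargetMetadata": metadata}


if __name__ == "__main__":
    print(lob_settings("limited", max_lob_kb=128))
```

One way to treat large objects separately is to run the bulk of the tables in a task using limited mode and move the few tables with very large LOBs into their own task using full mode.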

 Avoid network bottlenecks

  •         Use AWS Direct Connect for on-prem sources

9.  Monitoring & Troubleshooting

Key metrics:

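  •         CDCLatencySource (seconds between the last change captured from the source and now)
  •         CDCLatencyTarget (seconds between the oldest change waiting to commit on the target and now)
  •         FreeableMemory (low values signal memory pressure and swapping)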
  •         DiskQueueDepth (indicates storage saturation)
  •         CPUUtilization
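
The two latency metrics are most useful when read together: target latency includes source latency, so a high source value points at log capture, while a high target value with a low source value points at the apply side. A rough triage sketch; the 60-second threshold is an arbitrary assumption, and the metric values would come from CloudWatch in practice.

```python
def diagnose_cdc_lag(source_latency_s: float, target_latency_s: float,
                     threshold_s: float = 60.0) -> str:
    """Classify where CDC lag originates from the two DMS latency metrics.

    Returns "source" when capture from the source logs is behind,
    "target" when capture keeps up but applying changes is behind,
    and "healthy" when both are under the threshold.
    """
    if source_latency_s > threshold_s:
        return "source"
    if target_latency_s > threshold_s:
        return "target"
    return "healthy"


if __name__ == "__main__":
    # Hypothetical readings of CDCLatencySource and CDCLatencyTarget in seconds.
    print(diagnose_cdc_lag(5, 900))
```

A "target" result usually sends you to target indexes, batch apply settings, or replication instance size; a "source" result sends you to log retention, network throughput, or source load.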

Key logs:

  •         Task logs
  •         Replication instance logs
  •         Source/target endpoint logs

10.  Best Practices Summary

Do:

  •         Enable proper logging (WAL/binlog/redo)
  •         Size replication instance based on workload
  •         Test CDC retention window
  •         Use AWS SCT for heterogeneous migrations
  •         Use Multi-AZ if required for fault tolerance

Avoid:

  •         Migrating huge LOBs without planning
  •         Under-provisioned storage
  •         Assuming CDC is available on every source (verify per endpoint before planning cutover)
