An Overview of AWS Schema
Conversion Tool (SCT).
Scope:
- Tailored for DevOps/Cloud/DevSecOps/SRE/Platform focus Engineers,
- Architecture,
- Workflows,
- Edge
cases,
- Performance
optimizations,
- Operational
best practices.
Breakdown:
- SCT – Internal Architecture Overview,
- SCT Workflow – End-to-End,
- SCT Conversion Internals,
- SCT Agent,
- Failure Modes,
- Resiliency & Operational Best Practices,
- Top SCT Optimization Tips (Production Grade),
- SCT Cheat Sheet – Quick Reference.
Intro:
- AWS Schema Conversion Tool (SCT) helps transform database schemas, objects, SQL, and sometimes ETL code from one engine to another.
- AWS Schema Conversion Tool (SCT) helps transform most commonly commercial → open-source/AWS-native migrations such as:
|
Source |
Target |
|
Oracle |
Aurora
PostgreSQL / RDS PostgreSQL |
|
SQL
Server |
Aurora
MySQL/PostgreSQL |
|
Teradata |
Amazon
Redshift |
|
Netezza |
Amazon
Redshift |
|
On-prem
ETL |
AWS Glue |
|
Oracle
PL/SQL packages |
PostgreSQL
PL/pgSQL functions |
NB:
- Schema Conversion Tool (SCT) performs automated
conversion where possible and flags manual
actions via detailed reports.
1. SCT – Internal Architecture Overview
A. Logical Components:
1.
Metadata Collector
- Introspects source DB
- Reads tables, columns, views, constraints, sequences, stored
procedures, triggers, user-defined types
- Builds an internal intermediate representation (IR)
2.
Conversion Engine
- Pattern-matching engine
- PL/SQL → PL/pgSQL converters
- SQL dialect transformers
- Procedural logic rewriter
- Function rewriting (ex:
Oracle VARCHAR2 → TEXT/VARCHAR)
- Dependency graph resolution
- Unsupported feature annotator
3.
Assessment Engine
- Generates “complexity scores”
- Flags incompatible constructs
- Produces “estimated manual effort” hours
- Creates detailed assessment reports (HTML/PDF)
4.
Data Extractors (optional)
- For homogeneous or heterogeneous migrations
- Direct integration with AWS DMS for data movement
- Supports schema copy, seed data, and slow-changing data extraction
5.
ETL Conversion Module (Optional)
- Converts Informatica, SSIS, Oracle ODI, DataStage into Glue
scripts
- Creates PySpark/Glue ETL jobs
6.
SCT Agent (for large
workloads)
- Runs conversions at scale
- Offloads extraction from the UI client
- Supports parallel extract
- Runs in VPC or on-prem
- EBS-backed storage for extraction
2. SCT Workflow – End-to-End
1. Environment Setup
- Install SCT client (Mac/Windows/Linux)
- Install SCT Agent (optional)
- Configure DB connectivity (on-prem
→ VPN/Direct Connect)
2. Run Schema Assessment
Generates:
- DB feature parity report
- Complexity score
- Incompatibility lists
- Workload breakdown (automated
vs manual conversion)
3. Perform Schema Conversion
- Auto-create converted schema
- Manually fix complex code
- Validate dependency resolution
- Export SQL and apply to target
4. Apply Converted Schema to Target
- Push converted DDL to:
- Aurora PG/MySQL
- RDS PostgreSQL/MySQL
- Redshift
5. Connect SCT to AWS DMS
- DMS migrates actual data
- SCT validates schema compatibility for DMS tasks
- Post-migration: SCT may run SQL-based data validation scripts
6. Post-Conversion Verification
- Object comparison
- SQL validation
- Application integration tests
3. SCT Conversion Internals
SQL Dialect Translation
Examples:
|
Oracle |
PostgreSQL |
|
|
|
|
|
|
|
|
recursive
CTE |
|
|
|
|
|
|
NB:
Triggers, sequences, synonyms, and specific PL/SQL
constructs often require manual rewriting.
B. Stored Procedures & Functions
SCT rewrites:
- Variable declarations
- Cursor operations
- Exception handling
- Package structures
- LOOP and EXIT logic
- FORALL → batch INSERT
- BULK COLLECT → arrays
- Oracle collections → PostgreSQL
arrays
- DBMS_OUTPUT → RAISE NOTICE
Limitations:
- Oracle packages don’t map 1:1 to PostgreSQL packages
- Autonomous transactions → require
redesign
- Oracle REF CURSOR types → require
rewrite
C. Schema Differences / Object Mappings
SCT handles:
- Primary Key differences
- Foreign key cascades
- Constraint naming issues
- Collation/encoding differences
- Partitioning (Oracle
“Range/Hash” → PostgreSQL native partitioning)
D. Performance Optimization Conversions
- Oracle hints → removed or replaced
- Oracle
indexes → mapped to PostgreSQL/B-tree
- Oracle
function-based indexes → often require
materialized columns
- Oracle
partitioning → PG native or sharding
- Temp table
behavior differences handled explicitly
4. SCT Agent
Purpose
- Used when the dataset is >1TB, or when migrating from on-prem
databases that are slow to extract.
Capabilities
- Parallel schema extract
- Parallel data unload
- WAN-friendly extraction (throttling,
resume, compression)
- Writes data in chunked Parquet files
- Transfers to S3 bucket
- Integrates with AWS DMS (full
load)
Flow
Source DB → SCT Agent → (Parquet Extracts) → S3 → DMS → Target DBAgent Tuning Tips
- Increase worker threads based on CPU
- Tune fetch size for Oracle
- Prefer dedicated network path (DX/VPN)
- Use ephemeral SSD to speed up extract
- Do not place agent behind multiple NAT layers
5. Failure Modes
1. Unsupported DB Features
Common problematic objects:
- Autonomous transactions
- Custom PL/SQL packages
- Complex views with analytic functions
- Materialized views with refresh logic
- Oracle XMLTYPE / JSON handling
- Oracle Spatial and Graph
- Oracle AQ / Streams
- SQL Server CLR objects
- Advanced partitioning templates
2. Dependency Errors
Occurs when
objects reference unconverted items.
Fix:
- Dependency graph validation
- Reorder or manually script the DDL
3. Large Object Failures (LOB/BLOB/CLOB)
Causes:
- Incompatible data types
- Incorrect character set mappings
- Corrupt blobs
- Timeout during extract
4. Network/Connectivity & Agent Failures
- SCT Agent pauses mid-transfer
- SCT UI freezes (common on
Windows)
- Extract stalls on specific table due to locking
5. DMS + SCT Integration Failures
- Schema mismatch
- Missing transformations
- Unsupported key types
- Large tables requiring chunking
6. Resiliency & Operational Best Practices
A. Before Conversion
- Enable binary logging on target
- Enable row-based replication if using PostgreSQL
- Pre-size target storage with growth expectations
- Validate source DB for corruption
- Ensure source DB stats are updated (Oracle ANALYZE/DBMS_STATS)
B. During Conversion
- Use SCT Agent for datasets >1TB
- Break schema migration into slices (per schema, per module)
- Exclude unused legacy schema objects
- Pin SCT to a dedicated VM for stability
- Increase SCT JVM heap memory for large conversions (command-line flag)
C. During Data Migration (DMS
Phase)
Recommended:
- Full load + CDC strategy
- Parallel table loading
- Disable foreign keys during load
- Disable heavy-index tables during load
- Enable retries with exponential backoff
D. After Conversion
- Run SCT data validation queries
- Compare row counts
- Compare checksums
- Test stored procedures using unit tests
- Run application read/write tests
- Validate concurrency patterns (especially
Oracle → PostgreSQL)
7. Top SCT Optimization Tips (Production
Grade)
Increase
SCT Performance
- Run SCT on a Linux workstation (Windows is slower)
- Use SCT Agent for large extractions
- Avoid network latency (run
closer to DB)
- Increase conversion memory (JVM
flags:
-Xms4G -Xmx16G)
Reduce
Manual Conversion Work
- Prioritize code-based objects (functions/packages)
first
- Use automated pre-check rules
- Convert materialized views into regular views temporarily
- Pre-rewrite Oracle CONNECT BY to recursive CTEs
- Migrate sequences carefully and map ownership properly
Improve
Migration Accuracy
- Freeze source schema before extraction
- Always version-control converted SQL
- Run automated regression tests on PL/pgSQL code
- Validate numeric precision and date/time semantics
8. SCT Cheat Sheet – Quick Reference
Supported Conversion Targets
- Aurora PostgreSQL / MySQL
- RDS PostgreSQL / MySQL
- Amazon Redshift
- AWS Glue (ETL)
Best Use Cases
- Commercial DB → Open-source
- Oracle to PostgreSQL (most
common)
- SQL Server to MySQL/PostgreSQL
- Netezza/Teradata → Redshift
Not a Good Fit When
- Heavy Oracle PL/SQL packages with advanced features
- Complex proprietary DB extensions
- OLTP systems relying on vendor-specific features
- Legacy applications using Oracle AQ, Spatial, InterMedia, etc.
No comments:
Post a Comment