Monday, July 14, 2025

AWS DataSync | Overview.

 

Here’s twtech's overview of AWS DataSync, including its 2025 enhancements and core use cases:

The Concept: AWS DataSync.

AWS DataSync is a high-speed, secure, fully managed data transfer service for moving files and objects between on‑premises systems, edge storage, other cloud providers, and AWS storage services like Amazon S3, Amazon EFS, and Amazon FSx.

It automates and accelerates data migration, replication, archiving, and hybrid cloud workflows, while preserving metadata, supporting incremental transfer, scheduling, throttling, encryption, and data integrity validation.

 Noteworthy Updates in 2025

• Kerberos Authentication for SMB (Jan 28, 2025)

SMB source locations now support Kerberos authentication (in addition to NTLM) when connecting to self-managed SMB servers, such as those in Active Directory environments that use Kerberos v5.
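As a rough illustration of the Kerberos option, the sketch below builds the keyword arguments for a boto3 `create_location_smb` call with `AuthenticationType` set to `KERBEROS`. The hostname, share path, agent ARN, principal, and DNS address in the usage lines are hypothetical placeholders, and the helper names are illustrative only. Check the parameter requirements against the current API reference before relying on it.

```python
from typing import Any, Dict, List


def smb_kerberos_location_params(
    server_hostname: str,
    subdirectory: str,
    agent_arn: str,
    principal: str,
    keytab: bytes,
    krb5_conf: bytes,
    dns_ip_addresses: List[str],
) -> Dict[str, Any]:
    """Build keyword arguments for DataSync's CreateLocationSmb call (Kerberos)."""
    return {
        "ServerHostname": server_hostname,
        "Subdirectory": subdirectory,
        "AgentArns": [agent_arn],
        # Kerberos replaces the User/Password/Domain fields used with NTLM.
        "AuthenticationType": "KERBEROS",
        "KerberosPrincipal": principal,
        "KerberosKeytab": keytab,
        "KerberosKrb5Conf": krb5_conf,
        "DnsIpAddresses": dns_ip_addresses,
    }


def create_smb_location(params: Dict[str, Any]) -> str:
    """Send the request; needs boto3 and AWS credentials at call time."""
    import boto3  # third-party, imported lazily so the builder runs without it

    client = boto3.client("datasync")
    return client.create_location_smb(**params)["LocationArn"]


# Hypothetical values, for illustration only.
params = smb_kerberos_location_params(
    server_hostname="fileserver.corp.example.com",
    subdirectory="/share",
    agent_arn="arn:aws:datasync:us-east-1:111122223333:agent/agent-EXAMPLE",
    principal="HOST/svc-datasync@CORP.EXAMPLE.COM",
    keytab=b"<keytab bytes>",
    krb5_conf=b"<krb5.conf bytes>",
    dns_ip_addresses=["10.0.0.2"],
)
```

Keeping the parameter builder separate from the API call makes it easy to review or log the request before anything is created.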

• Enhanced Mode Cross‑Cloud Transfers (May 29, 2025)

DataSync can now copy data between AWS and storage services in other clouds (like Google Cloud Storage, Azure Blob and Azure Files, Oracle OCI, Wasabi, Cloudflare R2, Backblaze, DigitalOcean Spaces, and more)—without deploying a DataSync agent. Enhanced mode provides higher throughput, scalability, and simplified setup.
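A minimal sketch of what an agentless cross-cloud setup might look like with boto3: an object storage location defined without `AgentArns`, and a task created with `TaskMode` set to `ENHANCED`. The endpoint, bucket name, keys, and ARNs are hypothetical placeholders, and the helper names are illustrative rather than part of any AWS SDK.

```python
from typing import Any, Dict


def object_storage_location_params(
    server_hostname: str,
    bucket_name: str,
    access_key: str,
    secret_key: str,
    subdirectory: str = "/",
) -> Dict[str, Any]:
    """Keyword arguments for CreateLocationObjectStorage without an agent."""
    return {
        "ServerHostname": server_hostname,
        "BucketName": bucket_name,
        "ServerProtocol": "HTTPS",
        "Subdirectory": subdirectory,
        "AccessKey": access_key,
        "SecretKey": secret_key,
        # No "AgentArns" key: the agentless path described above is assumed.
    }


def enhanced_task_params(
    source_location_arn: str, destination_location_arn: str, name: str
) -> Dict[str, Any]:
    """Keyword arguments for CreateTask using Enhanced mode."""
    return {
        "SourceLocationArn": source_location_arn,
        "DestinationLocationArn": destination_location_arn,
        "Name": name,
        "TaskMode": "ENHANCED",
    }


def create_enhanced_task(location: Dict[str, Any], destination_arn: str, name: str) -> str:
    """Create the source location and the task; needs boto3 and credentials."""
    import boto3  # third-party, imported lazily

    client = boto3.client("datasync")
    source_arn = client.create_location_object_storage(**location)["LocationArn"]
    task = client.create_task(**enhanced_task_params(source_arn, destination_arn, name))
    return task["TaskArn"]


# Hypothetical values, for illustration only.
location = object_storage_location_params(
    server_hostname="storage.googleapis.com",
    bucket_name="example-source-bucket",
    access_key="EXAMPLE_HMAC_ACCESS_KEY",
    secret_key="EXAMPLE_HMAC_SECRET",
)
task = enhanced_task_params(
    "arn:aws:datasync:us-east-1:111122223333:location/loc-SOURCE",
    "arn:aws:datasync:us-east-1:111122223333:location/loc-DEST",
    "gcs-to-s3-example",
)
```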

• Agentless Cross‑Region Transfers (Jul 2024 but still relevant)

Agentless transfers between any AWS regions—including opt‑in regions—are now possible, enabling you to replicate or move data across AWS Storage services without deploying or managing agents.

• DataSync Availability in AWS Secret Region (May 8, 2025)

DataSync is now supported in the AWS Secret Region, allowing secure transfers between on‑prem systems and AWS Storage services in the secret cloud.

• End of DataSync Discovery (May 20, 2025)

The Discovery feature (agent-based storage scanning) was officially deprecated and is no longer available as of May 20, 2025.

 Core Use Cases & Benefits

  • Hybrid and Cloud Migration
    Migrate files/objects from on-prem NFS, SMB, HDFS, or object storage, and sync to Amazon S3, EFS, FSx, or AWS-managed storage layers with encryption and integrity validation.
  • Multicloud Data Pipelines
    Perform cross-cloud transfers between AWS and other cloud storage providers—no agent required—with enhanced mode’s performance and scalability.
  • Cross-Region Replication
    Copy data between AWS Regions, including opt-in Regions, without an agent. This is useful for disaster recovery, backup, or data locality optimization.
  • Secure File Access
    Kerberos support for SMB ensures compatibility with enterprise AD environments, enhancing secure authentication flows.
  • Adoption in High Compliance Environments
    With availability in the Secret Region and support for secure network transfer controls, DataSync fits highly regulated workloads.

 Performance Insights

  • Users report that DataSync can utilize available network capacity effectively—achieving up to ~10 Gbps transfers when the infrastructure supports it.
  • For lighter or container-managed workloads, users suggest that running the AWS CLI or rclone sync from within containers may offer more flexibility or lower cost.
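To put the ~10 Gbps figure in perspective, here is a back-of-the-envelope estimate of transfer time. It assumes the link is the only bottleneck and ignores protocol overhead, small-file effects, and storage limits, so real transfers will usually take longer. The function name and the `utilization` parameter are illustrative.

```python
def estimate_transfer_hours(data_tib: float, link_gbps: float, utilization: float = 1.0) -> float:
    """Estimate hours needed to move data_tib tebibytes over a link_gbps link.

    utilization is the fraction of the link the transfer actually achieves (0-1].
    """
    if data_tib < 0 or link_gbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("data_tib must be >= 0, link_gbps > 0, utilization in (0, 1]")
    total_bits = data_tib * 1024**4 * 8              # TiB -> bytes -> bits
    bits_per_second = link_gbps * 1e9 * utilization  # Gbps -> bits per second
    return total_bits / bits_per_second / 3600       # seconds -> hours


full_rate = estimate_transfer_hours(100, 10)       # 100 TiB on a saturated 10 Gbps link
half_rate = estimate_transfer_hours(100, 10, 0.5)  # same data at 50% utilization
```

Under these assumptions, 100 TiB takes roughly 24.4 hours at a full 10 Gbps and about twice that at 50% utilization.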

 Quick Start Guide

  1. Define Source and Destination Locations
    • Choose SMB/NFS/HDFS/object/edge storage or other clouds as source.
    • Choose AWS storage targets like S3, EFS, or FSx as destination.
  2. Select Authentication & Mode
    • Use Kerberos or NTLM for SMB.
    • For cross-cloud or cross-region transfers, use Enhanced Mode or agentless options.
  3. Configure a Task
    • Set up filters (include/exclude), bandwidth throttling, scheduling, and metadata preservation.
  4. Run, Monitor & Validate
    • Use CloudWatch metrics, task reports (JSON logs), and CloudTrail audit logs to monitor and verify transfer integrity.
  5. Post-Transfer Automation
    • Trigger Lambda, SNS, or AWS Glue/Athena workflows based on task outcomes or stored reports.
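Step 3 above can be sketched as a small helper that assembles the filter, throttling, verification, and schedule settings accepted by boto3's `create_task`. The helper name, default cron expression, and example patterns are assumptions chosen for illustration; the key names follow the DataSync API as far as I know, but verify them against the current reference.

```python
from typing import Any, Dict, List, Optional


def task_settings(
    exclude_patterns: List[str],
    bandwidth_mbps: Optional[float] = None,
    cron: str = "cron(0 2 * * ? *)",  # assumed schedule: daily at 02:00 UTC
) -> Dict[str, Any]:
    """Build Excludes, Options, and Schedule arguments for CreateTask."""
    if bandwidth_mbps is None:
        bytes_per_second = -1  # -1 means no bandwidth limit
    else:
        # Megabits per second -> bytes per second.
        bytes_per_second = int(bandwidth_mbps * 1_000_000 // 8)
    return {
        # Multiple patterns go in one filter value, separated by "|".
        "Excludes": [{"FilterType": "SIMPLE_PATTERN", "Value": "|".join(exclude_patterns)}],
        "Options": {
            "BytesPerSecond": bytes_per_second,
            "VerifyMode": "ONLY_FILES_TRANSFERRED",
            "PreserveDeletedFiles": "PRESERVE",
        },
        "Schedule": {"ScheduleExpression": cron},
    }


def create_scheduled_task(source_arn: str, destination_arn: str, settings: Dict[str, Any]) -> str:
    """Create the task; needs boto3 and AWS credentials at call time."""
    import boto3  # third-party, imported lazily

    client = boto3.client("datasync")
    response = client.create_task(
        SourceLocationArn=source_arn,
        DestinationLocationArn=destination_arn,
        **settings,
    )
    return response["TaskArn"]


throttled = task_settings(["*.tmp", "/cache"], bandwidth_mbps=80)
unthrottled = task_settings(["*.tmp"])
```

Here an 80 Mbps cap becomes 10,000,000 bytes per second, and omitting the cap leaves the task unthrottled.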

Final thoughts:

AWS DataSync is a secure, high-speed, automated solution for moving data across on‑premises, AWS, and multicloud environments.

Key 2025 enhancements include Kerberos support for SMB, agentless cross-cloud transfers, expanded cross-region replication, and support in the AWS Secret Region. The deprecated Discovery feature is no longer available as of May 20, 2025.

Useful link for documentation: https://docs.aws.amazon.com/datasync/latest/userguide/what-is-datasync.html

Data can be synchronized to:

• All Amazon S3 storage classes, including S3 Glacier.

• Amazon EFS.

• Amazon FSx (Windows File Server, Lustre, NetApp ONTAP, OpenZFS).

• File permissions and metadata (NFS POSIX, SMB) are preserved, which makes subsequent synchronization easier.

• A single agent task can use up to 10 Gbps, and a bandwidth limit can be configured.
