Sunday, September 7, 2025

AWS Lake Formation | Overview.

Scope:

  • Intro,
  • The Concept:  AWS Lake Formation,
  • Core Components,
  • Key Features,
  • Architecture View,
  • When to Use Lake Formation.

Intro:

    • AWS Lake Formation is a service that makes it easy to set up, secure, and manage a data lake in days.
    • AWS Lake Formation simplifies the process of creating a secure data lake by centralizing permissions management and providing a fine-grained, relational database-style access control model for data stored in Amazon S3 and other data sources.

 The Concept:  AWS Lake Formation

AWS Lake Formation is a managed service that makes it easier to build, secure, and manage data lakes. 

It automates tasks like:

    • Data ingestion from multiple sources (S3, databases, on-premises systems).
    • Data cataloging with metadata in the AWS Glue Data Catalog.
    • Data security using fine-grained access controls.
    • Data preparation with built-in transformations.

It provides one central governance and security layer for twtech's data lake on Amazon S3.

 Core Components

  1. Data Lake Storage (Amazon S3)
    • twtech's raw and curated data lives in S3 buckets.
    • Lake Formation organizes data into databases and tables in the Glue Data Catalog.
  2. AWS Glue Data Catalog Integration
    • Lake Formation uses the Glue Data Catalog to store schema/metadata.
    • It enforces security policies directly on the catalog objects.
  3. Centralized Security & Permissions
    • Fine-grained access control at the table, column, and row levels.
    • Integrated with IAM, Amazon Athena, Redshift Spectrum, EMR, QuickSight, etc.
  4. Data Ingestion & Blueprinting
    • Built-in blueprints to bring in data from:
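      • Relational databases reachable over JDBC, such as those on RDS or Aurora (full snapshot or incremental loads).
      • Log files, such as AWS CloudTrail and Elastic Load Balancing logs.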
    • Automates ETL workflows via AWS Glue.
  5. Data Sharing
    • Share governed datasets across AWS accounts and organizations without copying data.
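
To make components 1, 3, and 5 more concrete, here is a minimal sketch that builds the request payloads for registering an S3 location, granting column-level SELECT, and sharing a table with another account. The payload shapes follow the boto3 Lake Formation client (register_resource, grant_permissions); the helper names, the bucket, the sales_db database, the orders table, and the ARNs are hypothetical and used only for illustration.

```python
from __future__ import annotations

from typing import Any


def build_register_location(bucket_arn: str) -> dict[str, Any]:
    """Keyword arguments for lakeformation.register_resource."""
    return {"ResourceArn": bucket_arn, "UseServiceLinkedRole": True}


def build_column_grant(
    principal_arn: str, database: str, table: str, columns: list[str]
) -> dict[str, Any]:
    """Keyword arguments for lakeformation.grant_permissions (column-level SELECT)."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


def build_cross_account_grant(
    account_id: str, database: str, table: str
) -> dict[str, Any]:
    """Same call, but the principal is another AWS account ID (no data is copied)."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": account_id},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": ["SELECT"],
    }


if __name__ == "__main__":
    # Hypothetical names; nothing is sent to AWS here.
    print(build_register_location("arn:aws:s3:::twtech-example-lake"))
    print(
        build_column_grant(
            "arn:aws:iam::111122223333:role/Analyst",
            "sales_db",
            "orders",
            ["order_id", "revenue"],
        )
    )
    # With boto3 installed and credentials configured, the payloads would be sent as:
    #   import boto3
    #   lf = boto3.client("lakeformation")
    #   lf.grant_permissions(**build_column_grant(...))
```

Keeping the payloads as plain dictionaries makes them easy to review and test before any permission is actually changed.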

 Key Features

Centralized Policy Enforcement
    • Policies defined in Lake Formation are automatically applied across integrated services (Athena, Redshift Spectrum, EMR, QuickSight).
Row & Column-Level Security
    • Example: A financial analyst may see only Region = US data and only specific columns (e.g., revenue but not PII).
Transaction Support
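    • ACID transactions allow concurrent reads and writes on data lake tables. Lake Formation originally offered this through governed tables, which AWS has since retired; open table formats such as Apache Iceberg now fill that role.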
ML-Powered Data Profiling & Classification
    • Lake Formation can automatically detect PII and data formats, and apply tags.
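
The financial analyst example under Row & Column-Level Security can be sketched with a data cells filter. The payload shapes follow the boto3 Lake Formation client (create_data_cells_filter, then grant_permissions on the filter). The helper names, the filter name, the region column, and the PII column names are assumptions made for illustration, not details from a real table.

```python
from __future__ import annotations

from typing import Any


def build_us_no_pii_filter(
    catalog_id: str, database: str, table: str, pii_columns: list[str]
) -> dict[str, Any]:
    """Keyword arguments for lakeformation.create_data_cells_filter.

    Rows: only those where region = 'US'.
    Columns: everything except the listed PII columns.
    """
    return {
        "TableData": {
            "TableCatalogId": catalog_id,
            "DatabaseName": database,
            "TableName": table,
            "Name": "us_only_no_pii",  # hypothetical filter name
            "RowFilter": {"FilterExpression": "region = 'US'"},
            "ColumnWildcard": {"ExcludedColumnNames": list(pii_columns)},
        }
    }


def build_filter_grant(
    principal_arn: str, filter_request: dict[str, Any]
) -> dict[str, Any]:
    """Keyword arguments for lakeformation.grant_permissions on that filter."""
    data = filter_request["TableData"]
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "DataCellsFilter": {
                "TableCatalogId": data["TableCatalogId"],
                "DatabaseName": data["DatabaseName"],
                "TableName": data["TableName"],
                "Name": data["Name"],
            }
        },
        "Permissions": ["SELECT"],
    }


if __name__ == "__main__":
    # Hypothetical account, database, table, and columns; nothing is sent to AWS.
    request = build_us_no_pii_filter(
        "111122223333", "sales_db", "orders", ["email", "ssn"]
    )
    print(request)
    print(build_filter_grant("arn:aws:iam::111122223333:role/Analyst", request))
```

The analyst is granted SELECT on the filter rather than on the table itself, so queries through integrated services return only the permitted rows and columns.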

 Architecture View

    1. Data Sources: RDS, DynamoDB, S3, streaming (Kinesis/Kafka).
    2. Ingestion via Lake Formation Blueprints / Glue: data is cleaned and transformed.
    3. Storage in Amazon S3: raw, curated, and transformed zones.
    4. Metadata in Glue Data Catalog: unified schema registry.
    5. Security & Governance in Lake Formation: central policy control.
    6. Consumption via Analytics Services: Athena, Redshift Spectrum, EMR, SageMaker, QuickSight.
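
As a rough illustration of the consumption step, the sketch below builds the request for an Athena query against a catalog table. The payload shape follows the boto3 Athena client (start_query_execution); the helper name, the SQL, the sales_db database, and the results bucket are hypothetical. Lake Formation permissions are evaluated when the query runs, so the caller only gets back data it has been granted.

```python
from __future__ import annotations

from typing import Any


def build_athena_query(
    sql: str, database: str, output_s3_uri: str, workgroup: str = "primary"
) -> dict[str, Any]:
    """Keyword arguments for athena.start_query_execution."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_s3_uri},
        "WorkGroup": workgroup,
    }


if __name__ == "__main__":
    # Hypothetical table and bucket; nothing is sent to AWS here.
    request = build_athena_query(
        "SELECT order_id, revenue FROM orders LIMIT 10",
        "sales_db",
        "s3://twtech-example-athena-results/",
    )
    print(request)
    # With boto3 installed and credentials configured:
    #   import boto3
    #   boto3.client("athena").start_query_execution(**request)
```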

When to Use Lake Formation

  • Build a governed data lake from multiple data sources.
  • Enforce centralized, fine-grained access control across analytics tools.
  • Share data across accounts without duplication.
  • Simplify ingestion & cataloging with automation.

