Think - with -Tech: AWS Lake Formation | Overview.

Sunday, September 7, 2025

Scope:

Intro,
The Concept: AWS Lake Formation,
Core Components,
Key Features,
Architecture View,
When to Use Lake Formation.

Intro:

AWS Lake Formation is a service that makes it easy to set up, secure, and manage a data lake in days.
AWS Lake Formation simplifies the process of creating a secure data lake by centralizing permissions management and providing a fine-grained, relational database-style access control model for data stored in Amazon S3 and other data sources.

The Concept: AWS Lake Formation

As a managed service, Lake Formation takes over much of the manual work of building, securing, and managing a data lake.
It automates tasks such as:

Data ingestion from multiple sources (S3, databases, on-premises).
Data cataloging with metadata in the AWS Glue Data Catalog.
Data security using fine-grained access controls.
Data preparation with built-in transformations.

In short, it provides a single governance and security layer for the twtech data lake on Amazon S3.

Core Components

Data Lake Storage (Amazon S3)

twtech's raw and curated data lives in S3 buckets.
Lake Formation organizes data into databases and tables in the Glue Data Catalog.
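
Before Lake Formation can govern data in a bucket, the S3 location has to be registered with it. The following is a minimal sketch of that step, assuming Python with boto3. The bucket name twtech-data-lake and the helper build_register_resource_args are hypothetical, and the code only builds the request arguments, so it runs without AWS credentials.

```python
from typing import Optional


def build_register_resource_args(bucket: str, prefix: str = "",
                                 role_arn: Optional[str] = None) -> dict:
    """Build keyword arguments for Lake Formation's RegisterResource call."""
    arn = f"arn:aws:s3:::{bucket}"
    cleaned = prefix.strip("/")
    if cleaned:
        arn = f"{arn}/{cleaned}"
    args = {"ResourceArn": arn}
    if role_arn:
        # A custom IAM role that Lake Formation assumes to access the location.
        args["RoleArn"] = role_arn
    else:
        # Otherwise rely on the Lake Formation service-linked role.
        args["UseServiceLinkedRole"] = True
    return args


# Hypothetical bucket and prefix for the raw zone.
raw_zone = build_register_resource_args("twtech-data-lake", "raw/")
print(raw_zone)

# With credentials configured, the arguments could be sent like this:
#   import boto3
#   boto3.client("lakeformation").register_resource(**raw_zone)
```

Keeping the request as a plain dictionary makes it easy to review or unit-test before anything is sent to AWS.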

AWS Glue Data Catalog Integration

Lake Formation uses the Glue Data Catalog to store schema/metadata.
It enforces security policies directly on the catalog objects.

Centralized Security & Permissions

Fine-grained access control at the table, column, and row levels.
Integrated with IAM, Amazon Athena, Redshift Spectrum, EMR, QuickSight, etc.
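
A table-level or column-level permission can be expressed as a single grant. The sketch below assumes Python with boto3 and builds the arguments for the Lake Formation GrantPermissions call. The role ARN, the database sales_db, the table orders, and the column names are hypothetical placeholders.

```python
def build_select_grant(principal_arn, database, table, excluded_columns=()):
    """Build GrantPermissions arguments for SELECT on a table.

    If excluded_columns is given, the grant covers every column except those.
    """
    table_ref = {"DatabaseName": database, "Name": table}
    if excluded_columns:
        resource = {
            "TableWithColumns": {
                **table_ref,
                "ColumnWildcard": {"ExcludedColumnNames": list(excluded_columns)},
            }
        }
    else:
        resource = {"Table": table_ref}
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": resource,
        "Permissions": ["SELECT"],
    }


# Hypothetical analyst role that may read everything except two PII columns.
grant = build_select_grant(
    "arn:aws:iam::111122223333:role/AnalystRole",
    "sales_db",
    "orders",
    excluded_columns=["customer_email", "customer_phone"],
)
print(grant)

# With credentials configured:
#   import boto3
#   boto3.client("lakeformation").grant_permissions(**grant)
```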

Data Ingestion & Blueprinting

Built-in blueprints to bring in data from:

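Relational databases reached over JDBC, such as Amazon RDS and Aurora (MySQL, PostgreSQL, Oracle, SQL Server).
Log sources such as AWS CloudTrail, Elastic Load Balancing, and Application Load Balancer logs.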

Automates ETL workflows via AWS Glue.

Data Sharing

Share governed datasets across AWS accounts and organizations without copying data.
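
Cross-account sharing uses the same grant call, with the target AWS account ID as the principal. The sketch below assumes Python with boto3. The account IDs, database, and table names are hypothetical, and the recipient account would typically still need to accept the share and create a resource link before querying.

```python
import re


def build_cross_account_grant(owner_account_id, target_account_id,
                              database, table, allow_regrant=False):
    """Build GrantPermissions arguments that share a table with another account."""
    for account_id in (owner_account_id, target_account_id):
        if not re.fullmatch(r"\d{12}", account_id):
            raise ValueError(f"not a 12-digit AWS account ID: {account_id!r}")
    args = {
        # For cross-account grants the principal is the target account ID.
        "Principal": {"DataLakePrincipalIdentifier": target_account_id},
        "Resource": {
            "Table": {
                "CatalogId": owner_account_id,
                "DatabaseName": database,
                "Name": table,
            }
        },
        "Permissions": ["SELECT", "DESCRIBE"],
    }
    if allow_regrant:
        # Lets the recipient's data lake admin pass the permissions on.
        args["PermissionsWithGrantOption"] = ["SELECT", "DESCRIBE"]
    return args


share = build_cross_account_grant("111122223333", "444455556666",
                                  "sales_db", "orders", allow_regrant=True)
print(share)

# With credentials configured in the owner account:
#   import boto3
#   boto3.client("lakeformation").grant_permissions(**share)
```

No data is copied in this flow: the grant only refers to catalog objects, and the files stay in the owner's S3 bucket.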

Key Features

Centralized Policy Enforcement

Policies defined once in Lake Formation are enforced automatically by the integrated services (Athena, Redshift Spectrum, EMR, QuickSight), so permissions do not have to be redefined per tool.

Row & Column-Level Security

Example: A financial analyst may see only Region = US data and only specific columns (e.g., revenue but not PII).
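
The analyst example above can be sketched as a data cells filter, which combines a row filter expression with a column list. This assumes Python with boto3. The account ID, database, table, filter name, and column names are hypothetical.

```python
def build_data_cells_filter(catalog_id, database, table, name,
                            row_expression, columns):
    """Build CreateDataCellsFilter arguments (row filter plus column list)."""
    return {
        "TableData": {
            "TableCatalogId": catalog_id,
            "DatabaseName": database,
            "TableName": table,
            "Name": name,
            "RowFilter": {"FilterExpression": row_expression},
            "ColumnNames": list(columns),
        }
    }


def build_filter_grant(principal_arn, filter_args):
    """Build GrantPermissions arguments that give SELECT through the filter."""
    data = filter_args["TableData"]
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "DataCellsFilter": {
                "TableCatalogId": data["TableCatalogId"],
                "DatabaseName": data["DatabaseName"],
                "TableName": data["TableName"],
                "Name": data["Name"],
            }
        },
        "Permissions": ["SELECT"],
    }


# Only US rows, and only non-PII columns, for a hypothetical analyst role.
us_filter = build_data_cells_filter(
    "111122223333", "sales_db", "orders", "us_revenue_only",
    "region = 'US'", ["order_id", "region", "revenue"],
)
analyst_grant = build_filter_grant(
    "arn:aws:iam::111122223333:role/FinancialAnalyst", us_filter)
print(us_filter)
print(analyst_grant)

# With credentials configured:
#   import boto3
#   lf = boto3.client("lakeformation")
#   lf.create_data_cells_filter(**us_filter)
#   lf.grant_permissions(**analyst_grant)
```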

Transaction Support

Supports ACID transactions on data lake tables, for example through transactional table formats such as Apache Iceberg, enabling concurrent reads and writes.

ML-Powered Data Profiling & Classification

Together with AWS Glue, Lake Formation can help detect PII and data formats, and tags can then be applied to the affected tables and columns.
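
The tagging side of this feature can be sketched with LF-Tags, assuming Python with boto3. The detection step itself is not shown. The tag key classification, its values, and the database, table, and role names are hypothetical.

```python
def build_lf_tag(key, values):
    """Build CreateLFTag arguments that define a tag and its allowed values."""
    return {"TagKey": key, "TagValues": list(values)}


def build_tag_assignment(database, table, key, value):
    """Build AddLFTagsToResource arguments that attach one tag value to a table."""
    return {
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "LFTags": [{"TagKey": key, "TagValues": [value]}],
    }


def build_tag_policy_grant(principal_arn, key, allowed_values):
    """Build GrantPermissions arguments for all tables matching a tag expression."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "LFTagPolicy": {
                "ResourceType": "TABLE",
                "Expression": [{"TagKey": key, "TagValues": list(allowed_values)}],
            }
        },
        "Permissions": ["SELECT", "DESCRIBE"],
    }


tag = build_lf_tag("classification", ["public", "pii"])
assignment = build_tag_assignment("sales_db", "customers", "classification", "pii")
policy = build_tag_policy_grant(
    "arn:aws:iam::111122223333:role/AnalystRole", "classification", ["public"])
print(tag, assignment, policy, sep="\n")

# With credentials configured:
#   import boto3
#   lf = boto3.client("lakeformation")
#   lf.create_lf_tag(**tag)
#   lf.add_lf_tags_to_resource(**assignment)
#   lf.grant_permissions(**policy)
```

In this sketch the analyst role can read any table tagged classification=public, while the customers table tagged pii stays out of reach without a per-table rule.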

Architecture View

Data Sources → RDS, DynamoDB, S3, streaming (Kinesis/Kafka).
Ingestion via Lake Formation Blueprints / Glue → Data is cleaned and transformed.
Storage in Amazon S3 → Raw, curated, transformed zones.
Metadata in Glue Data Catalog → Unified schema registry.
Security & Governance in Lake Formation → Central policy control.
Consumption via Analytics Services → Athena, Redshift Spectrum, EMR, SageMaker, QuickSight.
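
The last step of the flow above, consumption, can be sketched as an Athena query against a governed table. This assumes Python with boto3. The database, query, and results location are hypothetical, and Lake Formation permissions would be checked by Athena when the query runs.

```python
def build_athena_query_args(database, sql, output_location, workgroup="primary"):
    """Build StartQueryExecution arguments for an Athena query."""
    if not output_location.startswith("s3://"):
        raise ValueError("output_location must be an s3:// URI")
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
        "WorkGroup": workgroup,
    }


query = build_athena_query_args(
    "sales_db",
    "SELECT region, SUM(revenue) AS total_revenue FROM orders GROUP BY region",
    "s3://twtech-athena-results/queries/",
)
print(query)

# With credentials configured:
#   import boto3
#   boto3.client("athena").start_query_execution(**query)
```

A principal without a matching Lake Formation grant would see the query fail or return only the rows and columns it is allowed to read, which is the point of central policy control.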

When to Use Lake Formation

Build a governed data lake from multiple data sources.
Enforce centralized, fine-grained access control across analytics tools.
Share data across accounts without duplication.
Simplify ingestion & cataloging with automation.
