AWS Lake Formation - Overview.
Scope:
- Intro,
- The Concept: AWS Lake Formation,
- Core Components,
- Key Features,
- Architecture View,
- When to Use Lake Formation.
Intro:
- AWS Lake Formation is a service that makes it easy to set up, secure, and manage a data lake in days.
- AWS Lake Formation simplifies the process of creating a secure data lake by centralizing permissions management and providing a fine-grained, relational database-style access control model for data stored in Amazon S3 and other data sources.
The Concept: AWS Lake Formation
AWS Lake Formation is a managed service that makes it easier to build, secure, and manage data lakes.
It automates tasks like:
- Data ingestion from multiple sources (S3, databases, on-premises).
- Data cataloging with metadata in the AWS Glue Data Catalog.
- Data security using fine-grained access controls.
- Data preparation with built-in transformations.
In short, it provides one central governance and security layer for the twtech data lake on Amazon S3.
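To make the "governance layer on S3" idea concrete, here is a minimal sketch of how an S3 location might be placed under Lake Formation control using the boto3 `register_resource` call. The bucket name and prefix are hypothetical placeholders, and relying on the service-linked role is an assumption; a custom role could be passed as `RoleArn` instead.

```python
def build_register_request(bucket: str, prefix: str = "") -> dict:
    """Build keyword arguments for the Lake Formation register_resource call."""
    clean_prefix = prefix.strip("/")
    # S3 locations are addressed by ARN: arn:aws:s3:::bucket[/prefix]
    path = f"{bucket}/{clean_prefix}" if clean_prefix else bucket
    return {
        "ResourceArn": f"arn:aws:s3:::{path}",
        # Assumption: use the Lake Formation service-linked role.
        "UseServiceLinkedRole": True,
    }


def register_location(bucket: str, prefix: str = "") -> None:
    """Send the request; requires boto3 and configured AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    client = boto3.client("lakeformation")
    client.register_resource(**build_register_request(bucket, prefix))


# Hypothetical bucket and prefix, for illustration only.
request = build_register_request("twtech-data-lake", "curated/")
print(request["ResourceArn"])
```

Keeping the request builder separate from the call makes it possible to review or unit-test the arguments before anything touches an AWS account.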
Core Components
- Data Lake Storage (Amazon S3)
  - twtech raw and curated data lives in S3 buckets.
  - Lake Formation organizes data into databases and tables in the Glue Data Catalog.
- AWS Glue Data Catalog Integration
  - Lake Formation uses the Glue Data Catalog to store schema/metadata.
  - It enforces security policies directly on the catalog objects.
- Centralized Security & Permissions
  - Fine-grained access control at the table, column, and row levels.
  - Integrated with IAM, Amazon Athena, Redshift Spectrum, EMR, QuickSight, etc.
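- Data Ingestion & Blueprinting
  - Built-in blueprints to bring in data from:
    - Relational databases reachable over JDBC, such as those hosted on RDS or Aurora.
    - Log sources such as AWS CloudTrail and load balancer logs.
  - Blueprints generate AWS Glue workflows (crawlers and jobs) that automate the ETL.
  - Sources without a blueprint, such as DynamoDB or flat files already in S3, are typically cataloged with Glue crawlers and jobs directly.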
- Data Sharing
  - Share governed datasets across AWS accounts and organizations without copying data.
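The column-level access control listed under Centralized Security & Permissions can be sketched as a request to the boto3 Lake Formation `grant_permissions` call. The role ARN, database, table, and column names below are hypothetical; only the request shape follows the API.

```python
def build_column_grant(principal_arn, database, table, columns):
    """Build keyword arguments granting SELECT on specific columns only."""
    if not columns:
        raise ValueError("at least one column is required")
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                # Only these columns become visible to the principal.
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


# Hypothetical principal and catalog objects, for illustration only.
grant = build_column_grant(
    "arn:aws:iam::111122223333:role/Analyst",
    "sales_db",
    "orders",
    ["order_id", "revenue"],
)

# With boto3 installed and credentials configured, this could be sent as:
#   import boto3
#   boto3.client("lakeformation").grant_permissions(**grant)
print(grant["Resource"]["TableWithColumns"]["ColumnNames"])
```

Because the grant is attached to the catalog table rather than to the S3 objects, the same restriction applies in every integrated query engine.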
Key Features
- Centralized policy enforcement: policies defined in Lake Formation are automatically applied across integrated services (Athena, Redshift, EMR, QuickSight).
- Example of row- and column-level security: a financial analyst may see only rows where Region = US and only specific columns (e.g., revenue but not PII).
- Governed tables support ACID transactions for data lake operations, enabling concurrent reads and writes.
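- Glue crawlers infer data formats and schemas, and LF-Tags can be attached to catalog objects for tag-based access control; PII detection comes from companion services such as AWS Glue sensitive data detection or Amazon Macie.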
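The financial analyst example can be approximated with a Lake Formation data cells filter, which combines a row filter expression with a column list. The sketch below builds the arguments for the boto3 `create_data_cells_filter` and `grant_permissions` calls. The account ID, database, table, filter name, role, and the lowercase column names `region` and `revenue` are all assumptions made for illustration.

```python
def build_cells_filter(catalog_id, database, table, name, row_filter, columns):
    """Keyword arguments for lakeformation.create_data_cells_filter."""
    return {
        "TableData": {
            "TableCatalogId": catalog_id,
            "DatabaseName": database,
            "TableName": table,
            "Name": name,
            # Row-level restriction, written as a SQL-style predicate.
            "RowFilter": {"FilterExpression": row_filter},
            # Column-level restriction: only these columns are exposed.
            "ColumnNames": list(columns),
        }
    }


def build_filter_grant(principal_arn, cells_filter):
    """Keyword arguments for lakeformation.grant_permissions on that filter."""
    data = cells_filter["TableData"]
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "DataCellsFilter": {
                "TableCatalogId": data["TableCatalogId"],
                "DatabaseName": data["DatabaseName"],
                "TableName": data["TableName"],
                "Name": data["Name"],
            }
        },
        "Permissions": ["SELECT"],
    }


# Hypothetical names throughout; PII columns are simply left out of the list.
us_filter = build_cells_filter(
    "111122223333", "finance", "sales", "us_revenue_only",
    "region = 'US'", ["region", "revenue"],
)
analyst_grant = build_filter_grant(
    "arn:aws:iam::111122223333:role/FinancialAnalyst", us_filter
)

# With boto3 installed and credentials configured, these could be sent as:
#   client = boto3.client("lakeformation")
#   client.create_data_cells_filter(**us_filter)
#   client.grant_permissions(**analyst_grant)
print(us_filter["TableData"]["RowFilter"]["FilterExpression"])
```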
Architecture View
- Data Sources → RDS, DynamoDB, S3, streaming (Kinesis/Kafka).
- Ingestion via Lake Formation Blueprints / Glue → Data is cleaned, transformed.
- Storage in Amazon S3 → Raw, curated, transformed zones.
- Metadata in Glue Data Catalog → Unified metadata catalog (databases, tables, schemas).
- Security & Governance in Lake Formation → Central policy control.
- Consumption via Analytics Services → Athena, Redshift Spectrum, EMR, SageMaker, QuickSight.
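The consumption end of this flow can be illustrated with an Athena query request. The sketch below builds arguments for the boto3 Athena `start_query_execution` call; the database, table, and results bucket are hypothetical. The query itself needs nothing Lake Formation specific, since permissions are checked when the engine reads the catalog table.

```python
def build_athena_query(sql, database, output_location):
    """Keyword arguments for athena.start_query_execution."""
    if not output_location.startswith("s3://"):
        raise ValueError("Athena results must be written to an s3:// location")
    return {
        "QueryString": sql,
        # The database is the Glue Data Catalog database governed by Lake Formation.
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# Hypothetical database, table, and results bucket.
query = build_athena_query(
    "SELECT region, SUM(revenue) AS total FROM sales GROUP BY region",
    "finance",
    "s3://twtech-athena-results/",
)

# With boto3 installed and credentials configured, this could be sent as:
#   import boto3
#   boto3.client("athena").start_query_execution(**query)
print(query["QueryExecutionContext"]["Database"])
```

A caller whose grant covers only some rows or columns would run the same query and simply get back the subset they are allowed to see, or an error if a requested column is not permitted.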
When to Use Lake Formation
- Build a governed data lake from multiple data sources.
- Enforce centralized, fine-grained access control across analytics tools.
- Share data across accounts without duplication.
- Simplify ingestion & cataloging with automation.
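For the cross-account sharing case, a hedged sketch: the boto3 Lake Formation `grant_permissions` call accepts a 12-digit AWS account ID as the principal, and the grant option lets the receiving account's data lake administrator pass access on to its own users. The account ID, database, and table below are hypothetical.

```python
def build_cross_account_grant(consumer_account_id, database, table):
    """Keyword arguments sharing one table with another AWS account."""
    valid = (
        len(consumer_account_id) == 12
        and consumer_account_id.isascii()
        and consumer_account_id.isdigit()
    )
    if not valid:
        raise ValueError("AWS account IDs are 12 digits")
    permissions = ["SELECT", "DESCRIBE"]
    return {
        # An external account is identified by its account ID.
        "Principal": {"DataLakePrincipalIdentifier": consumer_account_id},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": permissions,
        # Lets the consumer account re-grant access to its own principals.
        "PermissionsWithGrantOption": list(permissions),
    }


# Hypothetical consumer account and catalog objects.
share = build_cross_account_grant("444455556666", "finance", "sales")

# With boto3 installed and credentials configured, this could be sent as:
#   import boto3
#   boto3.client("lakeformation").grant_permissions(**share)
print(share["Principal"]["DataLakePrincipalIdentifier"])
```

The data stays in the producer account's S3 bucket; only catalog permissions are shared, which is what avoids duplication.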