AWS Lake Formation Centralized Permissions - Deep Dive.
Scope:
- The Concept of AWS Lake Formation Centralized Permissions,
- Key Concepts in Centralized Permissions,
- How Permissions are Enforced,
- Centralized vs. Legacy IAM/S3 Permissions,
- Architecture View,
- Best Practices for Centralized Permissions.
The Concept of AWS Lake Formation Centralized Permissions
- Lake Formation provides a centralized governance model for managing fine-grained access control to data stored in Amazon S3 and cataloged in the AWS Glue Data Catalog.
- Instead of managing IAM policies and S3 bucket policies directly, twtech defines table-, column-, and row-level permissions in Lake Formation and applies them consistently across analytics services like:
- Athena,
- Redshift Spectrum,
- EMR,
- Glue ETL.
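To make "define permissions in Lake Formation instead of IAM/S3 policies" concrete, here is a minimal sketch of a table-level SELECT grant. It builds keyword arguments shaped like the boto3 Lake Formation `grant_permissions` call. The sketch never calls AWS, so it runs without credentials. The account ID, role ARN, database, and table names are hypothetical placeholders.

```python
# Minimal sketch: build a GrantPermissions request for table-level SELECT.
# Account ID, role ARN, database and table names are hypothetical placeholders.
# The dict could be passed as boto3.client("lakeformation").grant_permissions(**request);
# it is not sent here, so the sketch runs offline.


def table_select_grant(account_id: str, principal_arn: str,
                       database: str, table: str) -> dict:
    """Return keyword arguments for a Lake Formation GrantPermissions call."""
    return {
        "CatalogId": account_id,  # the Glue Data Catalog that holds the table
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "Table": {
                "CatalogId": account_id,
                "DatabaseName": database,
                "Name": table,
            }
        },
        "Permissions": ["SELECT"],
        # Empty: the analyst cannot re-grant SELECT to anyone else.
        "PermissionsWithGrantOption": [],
    }


request = table_select_grant(
    account_id="111122223333",
    principal_arn="arn:aws:iam::111122223333:role/AnalystRole",
    database="sales_db",
    table="orders",
)
print(request["Permissions"])  # ['SELECT']
```

The same grant is honored by Athena, Redshift Spectrum, EMR, and Glue. There is no separate bucket policy to keep in sync.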
Key Concepts in Centralized Permissions
1. Data Catalog as the Policy Store
- Lake Formation permissions are stored in the Glue Data Catalog.
- The catalog acts as a single source of truth for schema + access policies.
- Services (Athena, Redshift Spectrum, EMR) consult Lake Formation at query time.
2. Principals
- IAM users, IAM roles, or SAML identities can be granted access.
- Principals are tied to Lake Formation–defined policies, not just IAM policies.
3. Resource Types
- Databases
- Tables
- Table columns (column-level access)
- Data locations (S3 buckets or prefixes)
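The principals and resource types above map onto the shapes that Lake Formation grant requests use. The sketch below assumes the `Resource` and `Principal` structures of the boto3 `grant_permissions` API. It builds one resource dict per type and shows that IAM users, IAM roles, and SAML identities all go through the single `DataLakePrincipalIdentifier` field. All names and the SAML ARN layout are illustrative assumptions.

```python
# Sketch of Lake Formation resource and principal shapes (boto3 grant_permissions style).
# ACCOUNT, database/table names, bucket, and identities are hypothetical.

ACCOUNT = "111122223333"
DB = "sales_db"
TABLE = "orders"


def database_resource(name):
    return {"Database": {"CatalogId": ACCOUNT, "Name": name}}


def table_resource(db, table):
    return {"Table": {"CatalogId": ACCOUNT, "DatabaseName": db, "Name": table}}


def columns_resource(db, table, columns):
    # Column-level access: the grant covers only the listed columns.
    return {"TableWithColumns": {"CatalogId": ACCOUNT, "DatabaseName": db,
                                 "Name": table, "ColumnNames": list(columns)}}


def location_resource(s3_arn):
    # Data location: an S3 bucket or prefix ARN registered with Lake Formation.
    return {"DataLocation": {"CatalogId": ACCOUNT, "ResourceArn": s3_arn}}


# Every principal kind is passed through the same DataLakePrincipalIdentifier field.
PRINCIPALS = {
    "iam_role": f"arn:aws:iam::{ACCOUNT}:role/AnalystRole",
    "iam_user": f"arn:aws:iam::{ACCOUNT}:user/alice",
    "saml_group": f"arn:aws:iam::{ACCOUNT}:saml-provider/CorpIdP:group/analysts",
}

resources = (
    database_resource(DB),
    table_resource(DB, TABLE),
    columns_resource(DB, TABLE, ["order_id", "amount"]),
    location_resource("arn:aws:s3:::sales-lake/orders"),
)
print([next(iter(r)) for r in resources])
# ['Database', 'Table', 'TableWithColumns', 'DataLocation']
```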
4. Permission Types
- Metadata permissions: control which Data Catalog objects (databases, tables) a principal can see and manage.
  - Examples: DESCRIBE, CREATE_TABLE, ALTER, DROP, Super (all permissions; ALL in the API).
- Data permissions: govern access to the actual data in S3.
  - Examples: SELECT, INSERT, DELETE.
- Row-level and column-level filters (fine-grained governance):
  - Column filters → Limit access to specific columns.
  - Row filters → Restrict access based on SQL predicates.
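Row and column filters are defined together as a data cells filter, and principals are then granted SELECT on that filter. The sketch below builds the `TableData` argument for boto3 `create_data_cells_filter` and the matching `grant_permissions` kwargs. The filter name, predicate, column list, and role are hypothetical examples.

```python
# Sketch: one data cells filter = row filter (SQL predicate) + column filter.
# Names, predicate, columns, and role ARN are hypothetical.
# Intended use (not executed here):
#   lf.create_data_cells_filter(TableData=emea)
#   lf.grant_permissions(**grant)

ACCOUNT = "111122223333"


def row_column_filter(db, table, name, predicate, columns):
    """TableData argument for CreateDataCellsFilter."""
    return {
        "TableCatalogId": ACCOUNT,
        "DatabaseName": db,
        "TableName": table,
        "Name": name,
        "RowFilter": {"FilterExpression": predicate},  # row filter: SQL predicate
        "ColumnNames": list(columns),                  # column filter: visible columns
    }


def grant_on_filter(principal_arn, table_data):
    """GrantPermissions kwargs giving SELECT only through the filter."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"DataCellsFilter": {
            "TableCatalogId": table_data["TableCatalogId"],
            "DatabaseName": table_data["DatabaseName"],
            "TableName": table_data["TableName"],
            "Name": table_data["Name"],
        }},
        "Permissions": ["SELECT"],
    }


# EMEA analysts see only EMEA rows and never the customer_email column.
emea = row_column_filter("sales_db", "orders", "emea_no_pii",
                         "region = 'EMEA'", ["order_id", "region", "amount"])
grant = grant_on_filter("arn:aws:iam::111122223333:role/EmeaAnalyst", emea)
```

One filter per audience replaces a per-audience copy of the table. This is the "use data filters instead of duplicating tables" practice in action.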
5. Cross-Account Sharing
- Lake Formation supports resource sharing across AWS accounts via AWS RAM.
- A central team (data producers) controls schema and access; consumer accounts use the data securely without duplicating policies.
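For cross-account sharing, the producer grants to a bare AWS account ID rather than to a role ARN. Lake Formation then manages the AWS RAM share behind the scenes. The consumer account's data lake admin typically accepts the share, re-grants to its own roles, and creates resource links so that local queries can reach the table. This sketch assumes the boto3 `grant_permissions` shape, and both account IDs are hypothetical.

```python
# Sketch: producer account shares a table with a consumer account.
# Both account IDs and the table are hypothetical.
# Intended use: boto3.client("lakeformation").grant_permissions(**g)

PRODUCER_ACCOUNT = "111122223333"  # central governance / producer account
CONSUMER_ACCOUNT = "444455556666"  # consumer account


def cross_account_grant(db, table, consumer_account):
    return {
        "CatalogId": PRODUCER_ACCOUNT,
        # A bare account ID as principal makes this a cross-account grant.
        "Principal": {"DataLakePrincipalIdentifier": consumer_account},
        "Resource": {"Table": {"CatalogId": PRODUCER_ACCOUNT,
                               "DatabaseName": db, "Name": table}},
        "Permissions": ["SELECT", "DESCRIBE"],
        # Grant option lets the consumer's admin re-grant to roles in its own account.
        "PermissionsWithGrantOption": ["SELECT", "DESCRIBE"],
    }


g = cross_account_grant("sales_db", "orders", CONSUMER_ACCOUNT)
```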
How Permissions are Enforced
- A user or role submits a query, e.g., via Athena, Redshift Spectrum, EMR, or a Glue job.
- The service calls Lake Formation to check authorization.
- Lake Formation evaluates:
  - Catalog metadata permissions
  - S3 location permissions
  - Row/column filters
- If allowed → the service retrieves only the permitted data.
- If denied → access is blocked.
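The evaluation steps above can be pictured with a toy model. The sketch is illustrative only and is not how AWS implements enforcement. It checks for a SELECT grant, confirms that the table's data sits under a registered S3 location, and then applies the row and column filters to return only the permitted cells, much like `SELECT *` would. All tables, grants, and paths are hypothetical.

```python
# Toy model of Lake Formation enforcement (illustrative only; not AWS code).
# Tables, grants, principals, and S3 paths below are hypothetical.

TABLES = {
    "sales_db.orders": {
        "location": "s3://sales-lake/orders/",
        "rows": [
            {"order_id": 1, "region": "EMEA", "amount": 120, "email": "a@example.com"},
            {"order_id": 2, "region": "APAC", "amount": 75, "email": "b@example.com"},
        ],
    }
}
REGISTERED_LOCATIONS = {"s3://sales-lake/"}  # prefixes managed by Lake Formation

GRANTS = {
    ("EmeaAnalyst", "sales_db.orders"): {
        "permissions": {"SELECT"},
        "columns": {"order_id", "region", "amount"},       # column filter
        "row_filter": lambda row: row["region"] == "EMEA",  # row filter
    }
}


def select_all(principal, table):
    """Simulate 'SELECT * FROM table' issued by an integrated service."""
    grant = GRANTS.get((principal, table))
    # 1. Catalog permission check.
    if grant is None or "SELECT" not in grant["permissions"]:
        raise PermissionError(f"{principal}: no SELECT on {table}")
    # 2. S3 location check: data must live under a registered location.
    location = TABLES[table]["location"]
    if not any(location.startswith(prefix) for prefix in REGISTERED_LOCATIONS):
        raise PermissionError(f"{location} is not registered with Lake Formation")
    # 3. Row and column filters: keep only permitted rows and columns.
    keep_row = grant.get("row_filter", lambda row: True)
    allowed = grant.get("columns")  # None would mean every column
    return [
        {col: val for col, val in row.items() if allowed is None or col in allowed}
        for row in TABLES[table]["rows"]
        if keep_row(row)
    ]


print(select_all("EmeaAnalyst", "sales_db.orders"))
# [{'order_id': 1, 'region': 'EMEA', 'amount': 120}]
```

A principal with no grant, such as `select_all("Intern", "sales_db.orders")`, gets a `PermissionError`. This matches the "If denied → access is blocked" branch.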
Centralized vs. Legacy IAM/S3 Permissions
- Legacy model: twtech has to manage S3 bucket policies + IAM role policies + Glue permissions independently (messy & hard to audit).
- Lake Formation centralized model:
  - Single permission plane for data + metadata.
  - Automatic enforcement across services.
  - Easier auditing via AWS CloudTrail + AWS Lake Formation audit logs.
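Because every integrated service goes through Lake Formation, one CloudTrail event source can answer "who accessed lake data". The sketch assumes standard CloudTrail record fields (`eventSource`, `eventName`, `eventTime`, `userIdentity.arn`). It also assumes that `GetDataAccess` is the Lake Formation event worth tracking. The sample log is fabricated for illustration and trimmed to a few fields.

```python
# Sketch: pull Lake Formation data-access events out of a CloudTrail log file.
# The sample records are hypothetical and trimmed to the fields used here.
import json


def lake_formation_data_access(records):
    """Yield (eventTime, principal ARN) for Lake Formation GetDataAccess events."""
    for rec in records:
        if (rec.get("eventSource") == "lakeformation.amazonaws.com"
                and rec.get("eventName") == "GetDataAccess"):
            yield rec.get("eventTime"), rec.get("userIdentity", {}).get("arn")


SAMPLE_LOG = json.loads("""
{"Records": [
  {"eventTime": "2024-05-01T10:00:00Z",
   "eventSource": "lakeformation.amazonaws.com",
   "eventName": "GetDataAccess",
   "userIdentity": {"arn": "arn:aws:sts::111122223333:assumed-role/AnalystRole/session"}},
  {"eventTime": "2024-05-01T10:00:05Z",
   "eventSource": "s3.amazonaws.com",
   "eventName": "GetObject",
   "userIdentity": {"arn": "arn:aws:iam::111122223333:user/etl"}}
]}
""")

events = list(lake_formation_data_access(SAMPLE_LOG["Records"]))
print(events)
# [('2024-05-01T10:00:00Z', 'arn:aws:sts::111122223333:assumed-role/AnalystRole/session')]
```

In the legacy model, the same question would need S3 data events, IAM policy analysis, and Glue logs to be correlated by hand.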
Architecture View
Here’s how centralized permissions fit in:
Producers (Data Lake Admins)
⬇️ Define schema + grant Lake Formation permissions
⬇️ Lake Formation (Central Governance Layer)
- Stores metadata in Glue Data Catalog
- Stores policies (table, column, row-level)
- Integrates with IAM
Consumers (Athena, Redshift Spectrum, EMR, Glue)
⬇️ Query data → Permission checks in Lake Formation → Filtered access to S3
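The producer side of this flow can be written as an ordered setup plan. The sketch assumes boto3 Lake Formation operation names (`register_resource`, `grant_permissions`). First the S3 location is put under Lake Formation management. Next the producer's ETL role gets location access and CREATE_TABLE. Finally consumers receive SELECT, and the services check that grant at query time. The bucket, roles, and database are hypothetical, and the `apply` helper is an illustrative function, not part of any AWS SDK.

```python
# Sketch of the producer-side setup behind the architecture view.
# Bucket, roles, and database/table names are hypothetical.
# Intended use: apply(SETUP_PLAN, boto3.client("lakeformation"))

ACCOUNT = "111122223333"
BUCKET_ARN = "arn:aws:s3:::sales-lake"
PRODUCER_ROLE = f"arn:aws:iam::{ACCOUNT}:role/ProducerEtlRole"
CONSUMER_ROLE = f"arn:aws:iam::{ACCOUNT}:role/AnalystRole"


def principal(arn):
    return {"DataLakePrincipalIdentifier": arn}


SETUP_PLAN = [
    # 1. Put the S3 location under Lake Formation management.
    ("register_resource", {"ResourceArn": BUCKET_ARN, "UseServiceLinkedRole": True}),
    # 2. Let the producer's ETL role create tables pointing at that location.
    ("grant_permissions", {
        "Principal": principal(PRODUCER_ROLE),
        "Resource": {"DataLocation": {"ResourceArn": BUCKET_ARN}},
        "Permissions": ["DATA_LOCATION_ACCESS"],
    }),
    # 3. Let the producer define schema in the catalog database.
    ("grant_permissions", {
        "Principal": principal(PRODUCER_ROLE),
        "Resource": {"Database": {"Name": "sales_db"}},
        "Permissions": ["CREATE_TABLE"],
    }),
    # 4. Give consumers read access; services check this grant at query time.
    ("grant_permissions", {
        "Principal": principal(CONSUMER_ROLE),
        "Resource": {"Table": {"DatabaseName": "sales_db", "Name": "orders"}},
        "Permissions": ["SELECT"],
    }),
]


def apply(plan, client):
    """Call each operation, in order, on a client exposing methods of the same names."""
    for operation, kwargs in plan:
        getattr(client, operation)(**kwargs)
```

Keeping the plan as data makes the governance setup easy to review and replay. Any client object with matching method names works, including a test double.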
Best Practices for Centralized Permissions
- Least privilege: Start with SELECT only, expand as needed.
- Use data filters: Apply row/column filters instead of duplicating tables.
- Tag-based access control (LF-TBAC):
  - Assign LF-tags (business domain, sensitivity level) to resources.
  - Grant access based on tags, not individual tables.
- Cross-account sharing: Keep data producers & consumers separated for governance.
- Audit: Enable CloudTrail logs for Lake Formation and monitor access.
- Migrate from IAM-based to Lake Formation (LF)-based permissions progressively using hybrid access mode.
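The tag-based access control practice (LF-TBAC) replaces many per-table grants with a single grant on a tag expression. The sketch assumes the boto3 Lake Formation operations `create_lf_tag`, `add_lf_tags_to_resource`, and `grant_permissions` with an `LFTagPolicy` resource. The tag keys (`domain`, `sensitivity`), values, and role are hypothetical examples.

```python
# Sketch of LF-TBAC request payloads: define tags, tag a table, grant by tag expression.
# Tag keys/values, table, and role ARN are hypothetical.
# Intended use: lf.create_lf_tag(**d); lf.add_lf_tags_to_resource(**t); lf.grant_permissions(**g)

ACCOUNT = "111122223333"


def create_tag(key, values):
    """kwargs for create_lf_tag: a tag key and its allowed values."""
    return {"TagKey": key, "TagValues": list(values)}


def tag_table(db, table, **tags):
    """kwargs for add_lf_tags_to_resource: attach one value per tag key."""
    return {
        "Resource": {"Table": {"DatabaseName": db, "Name": table}},
        "LFTags": [{"TagKey": k, "TagValues": [v]} for k, v in tags.items()],
    }


def grant_by_tags(principal_arn, **tags):
    """GrantPermissions on every TABLE whose tags match all key/value pairs."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"LFTagPolicy": {
            "CatalogId": ACCOUNT,
            "ResourceType": "TABLE",
            "Expression": [{"TagKey": k, "TagValues": [v]} for k, v in tags.items()],
        }},
        "Permissions": ["SELECT"],
    }


tag_defs = [create_tag("domain", ["sales", "finance"]),
            create_tag("sensitivity", ["public", "confidential"])]
t = tag_table("sales_db", "orders", domain="sales", sensitivity="public")
g = grant_by_tags(f"arn:aws:iam::{ACCOUNT}:role/SalesAnalyst",
                  domain="sales", sensitivity="public")
```

New tables tagged `domain=sales, sensitivity=public` become readable by the analyst role without another grant, which is the main operational payoff of LF-TBAC.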