Amazon QuickSight - Deep Dive.
Scope:
- The concept,
- How Amazon QuickSight works,
- Architecture,
- Features,
- Pricing,
- Integrations,
- Security,
- Best practices.
1. The Concept: QuickSight
- Amazon QuickSight is AWS’s serverless, cloud-native Business Intelligence (BI) service, that allows twtech to:
- Connect to multiple data sources (AWS + external).
- Create interactive dashboards, reports, and visualizations.
- Use ML-powered insights like anomaly detection and forecasting.
- Scale to thousands of users without needing traditional BI infrastructure.
NB:
- It’s designed as a SaaS BI solution—elastic, pay-per-use, and embedded-friendly.
2. Architecture Overview
At a high level, QuickSight sits on
top of twtech data sources and provides a visualization/insight
layer.
Core Components:
- Data Sources – S3, RDS, Redshift, Athena, Aurora, DynamoDB, Salesforce, Snowflake, on-premises via JDBC, etc.
- Data Preparation – SPICE (Super-fast, Parallel, In-memory Calculation Engine) or Direct Query.
- Datasets – Modeled data inside QuickSight (joins, filters, calculated fields).
- Analyses – Where you build charts, visuals, and dashboards.
- Dashboards – Read-only, published version of an analysis, sharable across orgs.
- Stories – Narratives of dashboards that update with time (data storytelling).
3. SPICE vs Direct Query
QuickSight offers two
modes for querying:
- SPICE
- In-memory engine optimized for speed.
- Best for repeated queries & scaling users.
- Supports scheduled refresh.
- Each user gets 10 GB free SPICE storage,
expandable.
- Direct Query
- Queries run directly against the source.
- Always up-to-date but may be slower.
- Good for real-time dashboards.
4. Key Features
Core BI Features
- Rich visualization types (bar, line, heatmap, geospatial maps, Sankey, KPIs, pivot tables).
- Parameters and controls for interactivity.
- Row-level security (RLS) for granular access.
- Dashboard embedding (into apps, portals, SaaS).
- Multi-tenant capabilities for SaaS providers.
ML-Powered
Features
- Anomaly detection:
automatic pattern recognition.
- Forecasting: time-series predictions.
- Auto-narratives: machine-generated text insights.
- Natural Language Queries: “Q” lets business users type plain-English questions (“What was revenue last quarter in
APAC?”).
🔒 Security
- AWS IAM integration.
- Row-level security with tags/filters.
- Encryption (in transit & at rest).
- VPC connectivity for private data.
5. Pricing
QuickSight is pay-per-session
for readers:
- Authors:
$24/month (Enterprise edition).
- Readers: $0.30 per session (max $5 per month per reader).
- SPICE capacity: 10 GB free/user, then $0.25 per GB/month.
NB:
- This pricing model makes it attractive for orgs with many occasional BI users.
6. Integrations
- AWS Native: Redshift, Athena, S3, RDS, DynamoDB, Aurora.
- 3rd-party: Salesforce, Snowflake, MySQL, Postgres, Teradata, Presto.
- Embedded Analytics: via APIs + SDKs (JS).
- Federated Identity: SAML, Active Directory, Okta.
7. Enterprise Use Cases
- Executive dashboards –
KPIs, revenue, performance.
- Operational analytics – near real-time metrics from IoT, apps, logs.
- Embedded analytics – SaaS vendors embedding dashboards into their platforms.
- Self-service BI – business users using Q for NL queries.
- Compliance & Security analytics – analyzing logs (CloudTrail, GuardDuty, VPC Flow Logs).
8. Best Practices
- Use SPICE for
high concurrency + fast dashboards.
- Enable Row-Level Security (RLS) for multi-tenant SaaS environments.
- Optimize data modeling in source systems before importing to QuickSight.
- Use parameters and controls to create interactive dashboards.
- Monitor usage with CloudWatch metrics.
- Automate deployments with APIs & CLI.
9. QuickSight vs Other BI Tools
- Power BI / Tableau → stronger in desktop authoring, but heavier infra and licensing costs.
- Looker → stronger in modeling (LookML), but more complex setup.
- QuickSight → excels in scalability, AWS-native integration, pay-per-use model,
ML features.
Final thoughts:
- Amazon QuickSight is best suited for organizations already leveraging AWS, SaaS providers embedding analytics, and enterprises that want elastic BI without managing servers.
- It might be less attractive if twtech wants offline desktop modeling or highly advanced visualization customization compared to Tableau/Power BI.
Insights:
- A practical, step-by-step architecture twtech follows to put Amazon QuickSight on top of S3 + Athena + SPICE with row-level security (RLS).
- It includes the data flow, concrete config snippets (Athena DDL, IAM policy, example RLS CSV).
- It is the best practice and testing checklist for twtech to implement end-to-end.
Quick high-level diagram
S3 (raw → curated) → Glue ETL / Lambda / Kinesis Firehose → Glue Data Catalog (tables) → Athena (queries / views) → QuickSight datasource → Dataset → (Import to) SPICE → Dashboards
(console / embedded)
Users (QuickSight accounts or embedded app users) — RLS applied at
dataset → filtered visuals
Step-by-step: Prerequisite
- AWS account with QuickSight Enterprise (for advanced features like VPC connection & SSO).
- S3 bucket(s) for raw + curated data.
- Athena enabled and a WorkGroup with result location (S3).
- Glue Data Catalog (Glue crawler or managed tables).
- KMS CMK if twtech needs customer-managed encryption.
- QuickSight admin user (to create data source, datasets, RLS rules).
1) Ingest & store data in S3 (layout + format)
- Goal: fast reads from Athena and efficient SPICE ingestion.
Recommended layout:
s3://twtech-org-data/raw/events/year=2025/month=09/day=07/part-000.parquet
s3://twtech-org-data/curated/events/year=2025/month=09/day=07/part-000.parquet
Best practices:
- Store analytics data as columnar Parquet with Snappy compression.
- Partition by date (year/month/day) and a tenant/region column if multi-tenant.
- Keep schema stable; include immutable partitioning key columns.
2) Catalog the data (Glue
/ Athena)
- Either run a Glue crawler or create an Athena external table that points to twtech curated S3 prefix.
- Sample Athena DDL (Parquet, partitioned):
CREATE
EXTERNAL TABLE IF NOT EXISTS curated.events (
event_id string,
user_id string,
region string,
event_time timestamp,
value double
)
PARTITIONED
BY (year int, month int, day int)
STORED
AS PARQUET
LOCATION
's3://twtech-org-data/curated/events/';
After adding partitions:
MSCK
REPAIR TABLE curated.events;
-- or
ALTER TABLE curated.events ADD PARTITION (...) LOCATION '...'
Optimize:
- Partition pruning: query with year/month/day filters.
- Consider Athena materialized views for expensive aggregations.
- Use Athena workgroup with enforced encryption and query limits.
3) ETL / Transformation (Glue
/ Lambda / Spark)
- Use Glue ETL (PySpark) to convert raw CSV/JSON →
Parquet and write to /curated/... with partitions.
- Validate schema and set up job bookmarks to process only new data.
- Keep a table of permissions (for dynamic RLS) in twtech curated area if it plans to use a permissions table.
4) IAM + KMS: give
QuickSight secure access
- QuickSight needs permission to read the data catalog, Athena, and S3 (and KMS if used).
- Create an IAM role or policy for QuickSight.
# Sample minimal policy (replace
ARNs/placeholders):
{
"Version":"2012-10-17",
"Statement":[
{
"Effect":"Allow",
"Action":[
"s3:GetObject",
"s3:ListBucket",
"s3:GetBucketLocation"
],
"Resource":[
"arn:aws:s3:::twtech-org-data",
"arn:aws:s3:::twtech-org-data/*"
]
},
{
"Effect":"Allow",
"Action":[
"athena:StartQueryExecution",
"athena:GetQueryExecution",
"athena:GetQueryResults"
],
"Resource":"*"
},
{
"Effect":"Allow",
"Action":[
"glue:GetTable",
"glue:GetDatabase",
"glue:GetPartitions"
],
"Resource":"*"
},
{
"Effect":"Allow",
"Action":["kms:Decrypt"],
"Resource":["arn:aws:kms:us-east-2:accountID:key/twtech-kms-key-id"]
}
]
}
Notes:
- Prefer scoped resources over "*"
where practical.
- If QuickSight will access data in a VPC, configure a QuickSight VPC connection and security groups.
5) Create QuickSight Data Source → Dataset → SPICE
- In QuickSight Console: create a new Data source
→ choose Athena → connect (select
workgroup).
- Create Dataset: choose table or use Custom SQL to pre-aggregate/filter.
- For performance, choose Import to SPICE (SPICE -> in-memory). Set refresh schedule (hourly/daily as needed).
# Sample custom SQL twtech might use for the dataset:
SELECT
date_format(event_time, '%Y-%m-%d') as day,
region,
user_id,
SUM(value) as total_value
FROM
curated.events
WHERE
year = 2025
GROUP BY
date_format(event_time, '%Y-%m-%d'), region, user_id;
SPICE tips:
- Keep SPICE datasets trimmed to only needed
columns/rows.
- Schedule SPICE refresh to balance freshness and cost.
6) Row-Level Security (RLS)
— two approaches
Option
A — QuickSight dataset RLS (recommended for many setups)
- Create a CSV mapping that maps QuickSight principal (user
email or group) → allowed value(s) (e.g., region, tenant_id).
- Upload that CSV in the dataset settings under Row-level security (Apply RLS rules).
- QuickSight will enforce filtering for each user or group.
# Sample CSV (two columns: principal,
allowed_value):
twtech671@gmail.com,US
patemf2021@yahoo.com,SA
sales-group,US
NB:
- principal
can be the QuickSight user’s email or a QuickSight group name (how twtech tenant is represented in QuickSight).
- Rules can allow multiple values by having multiple rows per principal.
- simple mapping files, small number of principals, or static permissions.
Option B — Dynamic RLS via join in source (for scale / tenant tables)
- Maintain a permissions table in Athena/Glue:
e.g., permissions(twtech_user_email,
tenant_id) or permissions(twtech_group_name, us-east-2).
- In twtech dataset SQL (Athena view) join the main data with the permissions table to filter based on the principal.
- When embedding, pass the QuickSight username (or map SAML attribute to QuickSight user identity) so the correct principal is known.
Sample idea (pseudo):
SELECT
d.*
FROM
curated.events d
JOIN
curated.permissions p
ON d.tenant_id = p.tenant_id
WHERE p.user_email = '<twtech_user_email>';
NB:
- QuickSight cannot substitute a runtime SQL variable into Athena from the UI directly.
- This approach typically works when the dataset contains the permissions table and QuickSight applies a filter using the identity set at session time (embedding + session user),
- or when the dataset is pre-filtered per tenant during SPICE ingestion (e.g., create per-tenant SPICE datasets).
When to use:
- Many users/tenants, dynamic permissions, or when twtech must version control permissions in the data layer.
7) Build analyses & dashboard (authoring best practices)
- Use parameters & controls for date and region
selectors.
- Use calculated fields sparingly (compute heavy logic upstream when possible).
- Add filters and default values to reduce SPICE load.
- Publish dashboard and test with multiple QuickSight users.
8) Embedding dashboards (optional)
- For embedding into twtech-app (SaaS multi-tenant), use
QuickSight embedding APIs to generate an embed URL or use the registered
user embedding model.
- When embedding and enforcing RLS, ensure the QuickSight identity twtech use for the embedded session is tied to the tenant/user so RLS applies correctly.
9) Security, networking & governance
- Encrypt S3 (SSE-S3 or SSE-KMS); enable KMS key policies
that allow QuickSight to decrypt.
- Athena query results: set WorkGroup to write results to a secure S3 location and enable encryption.
- Use VPC endpoints for S3 and VPC connection for QuickSight if you need private access.
- Audit usage with CloudTrail and CloudWatch (QuickSight usage logs + Athena query logs).
10) Cost & performance tips
- Use SPICE for high concurrency and fast dashboards; use
Direct Query for very fresh/rare reads.
- Compress & use Parquet to reduce Athena costs.
- Partition aggressively and ensure queries include partition predicates.
- Monitor SPICE storage and remove stale datasets.
- Use Athena WorkGroup limits and query cost alerts.
Sample checklist to implement
Create S3 buckets and decide raw vs curated structure + partitioning scheme.
Build Glue job to produce Parquet and update partitions.Create Glue crawler or Athena DDL and verify partitions.
Create IAM role/policy and KMS key for QuickSight access.Configure QuickSight data source to Athena, build dataset and import to SPICE.
Decide RLS method: upload CSV mapping or create permissions table & incorporate into dataset.
Publish dashboards, test as multiple users (validate RLS).Set SPICE refresh schedule and enable monitoring/alerts.
QuickSight troubleshooting / testing tips
- Test Athena queries directly first (partition pruning, cost).
- In QuickSight, use a test user with the same email/group to validate RLS.
- If SPICE import fails, check IAM permissions and KMS decrypt rights.
- If dashboards are slow, check SPICE vs Direct Query, and examine Athena queries for full table scans.
No comments:
Post a Comment