Think - with -Tech: Amazon QuickSight

Sunday, September 7, 2025

Amazon QuickSight | Deep Dive.

Amazon QuickSight - Deep Dive.

Scope:

The concept,
How Amazon QuickSight works,
Architecture,
Features,
Pricing,
Integrations,
Security,
Best practices.

1. The Concept: QuickSight

Amazon QuickSight is AWS’s serverless, cloud-native Business Intelligence (BI) service, that allows twtech to:

Connect to multiple data sources (AWS + external).
Create interactive dashboards, reports, and visualizations.
Use ML-powered insights like anomaly detection and forecasting.
Scale to thousands of users without needing traditional BI infrastructure.

NB:

It’s designed as a SaaS BI solution—elastic, pay-per-use, and embedded-friendly.

2. Architecture Overview

At a high level, QuickSight sits on top of twtech data sources and provides a visualization/insight layer.

Core Components:

Data Sources – S3, RDS, Redshift, Athena, Aurora, DynamoDB, Salesforce, Snowflake, on-premises via JDBC, etc.
Data Preparation – SPICE (Super-fast, Parallel, In-memory Calculation Engine) or Direct Query.
Datasets – Modeled data inside QuickSight (joins, filters, calculated fields).
Analyses – Where you build charts, visuals, and dashboards.
Dashboards – Read-only, published version of an analysis, sharable across orgs.
Stories – Narratives of dashboards that update with time (data storytelling).

3. SPICE vs Direct Query

QuickSight offers two modes for querying:

SPICE

In-memory engine optimized for speed.
Best for repeated queries & scaling users.
Supports scheduled refresh.
Each user gets 10 GB free SPICE storage, expandable.

Direct Query

Queries run directly against the source.
Always up-to-date but may be slower.
Good for real-time dashboards.

4. Key Features

Core BI Features

Rich visualization types (bar, line, heatmap, geospatial maps, Sankey, KPIs, pivot tables).
Parameters and controls for interactivity.
Row-level security (RLS) for granular access.
Dashboard embedding (into apps, portals, SaaS).
Multi-tenant capabilities for SaaS providers.

ML-Powered Features

Anomaly detection: automatic pattern recognition.
Forecasting: time-series predictions.
Auto-narratives: machine-generated text insights.
Natural Language Queries: “Q” lets business users type plain-English questions (“What was revenue last quarter in APAC?”).

🔒 Security

AWS IAM integration.
Row-level security with tags/filters.
Encryption (in transit & at rest).
VPC connectivity for private data.

5. Pricing

QuickSight is pay-per-session for readers:

Authors: $24/month (Enterprise edition).
Readers: $0.30 per session (max $5 per month per reader).
SPICE capacity: 10 GB free/user, then $0.25 per GB/month.

NB:

This pricing model makes it attractive for orgs with many occasional BI users.

6. Integrations

AWS Native: Redshift, Athena, S3, RDS, DynamoDB, Aurora.
3rd-party: Salesforce, Snowflake, MySQL, Postgres, Teradata, Presto.
Embedded Analytics: via APIs + SDKs (JS).
Federated Identity: SAML, Active Directory, Okta.

7. Enterprise Use Cases

Executive dashboards – KPIs, revenue, performance.
Operational analytics – near real-time metrics from IoT, apps, logs.
Embedded analytics – SaaS vendors embedding dashboards into their platforms.
Self-service BI – business users using Q for NL queries.
Compliance & Security analytics – analyzing logs (CloudTrail, GuardDuty, VPC Flow Logs).

8. Best Practices

Use SPICE for high concurrency + fast dashboards.
Enable Row-Level Security (RLS) for multi-tenant SaaS environments.
Optimize data modeling in source systems before importing to QuickSight.
Use parameters and controls to create interactive dashboards.
Monitor usage with CloudWatch metrics.
Automate deployments with APIs & CLI.

9. QuickSight vs Other BI Tools

Power BI / Tableau → stronger in desktop authoring, but heavier infra and licensing costs.
Looker → stronger in modeling (LookML), but more complex setup.
QuickSight → excels in scalability, AWS-native integration, pay-per-use model, ML features.

Final thoughts:

Amazon QuickSight is best suited for organizations already leveraging AWS, SaaS providers embedding analytics, and enterprises that want elastic BI without managing servers.
It might be less attractive if twtech wants offline desktop modeling or highly advanced visualization customization compared to Tableau/Power BI.

Insights:

A practical, step-by-step architecture twtech follows to put Amazon QuickSight on top of S3 + Athena + SPICE with row-level security (RLS).
It includes the data flow, concrete config snippets (Athena DDL, IAM policy, example RLS CSV).
It is the best practice and testing checklist for twtech to implement end-to-end.

Quick high-level diagram

S3 (raw → curated) → Glue ETL / Lambda / Kinesis Firehose → Glue Data Catalog (tables) → Athena (queries / views) → QuickSight datasource → Dataset → (Import to) SPICE → Dashboards

(console / embedded)
Users (QuickSight accounts or embedded app users) — RLS applied at dataset → filtered visuals

Step-by-step: Prerequisite

AWS account with QuickSight Enterprise (for advanced features like VPC connection & SSO).
S3 bucket(s) for raw + curated data.
Athena enabled and a WorkGroup with result location (S3).
Glue Data Catalog (Glue crawler or managed tables).
KMS CMK if twtech needs customer-managed encryption.
QuickSight admin user (to create data source, datasets, RLS rules).

1) Ingest & store data in S3 (layout + format)

Goal: fast reads from Athena and efficient SPICE ingestion.

Recommended layout:

s3://twtech-org-data/raw/events/year=2025/month=09/day=07/part-000.parquet

s3://twtech-org-data/curated/events/year=2025/month=09/day=07/part-000.parquet

Best practices:

Store analytics data as columnar Parquet with Snappy compression.
Partition by date (year/month/day) and a tenant/region column if multi-tenant.
Keep schema stable; include immutable partitioning key columns.

2) Catalog the data (Glue / Athena)

Either run a Glue crawler or create an Athena external table that points to twtech curated S3 prefix.
Sample Athena DDL (Parquet, partitioned):

CREATE EXTERNAL TABLE IF NOT EXISTS curated.events (

event_id string,

user_id string,

region string,

event_time timestamp,

value double

)

PARTITIONED BY (year int, month int, day int)

STORED AS PARQUET

LOCATION 's3://twtech-org-data/curated/events/';

After adding partitions:

MSCK REPAIR TABLE curated.events;

-- or ALTER TABLE curated.events ADD PARTITION (...) LOCATION '...'

Optimize:

Partition pruning: query with year/month/day filters.
Consider Athena materialized views for expensive aggregations.
Use Athena workgroup with enforced encryption and query limits.

3) ETL / Transformation (Glue / Lambda / Spark)

Use Glue ETL (PySpark) to convert raw CSV/JSON → Parquet and write to /curated/... with partitions.
Validate schema and set up job bookmarks to process only new data.
Keep a table of permissions (for dynamic RLS) in twtech curated area if it plans to use a permissions table.

4) IAM + KMS: give QuickSight secure access

QuickSight needs permission to read the data catalog, Athena, and S3 (and KMS if used).
Create an IAM role or policy for QuickSight.

# Sample minimal policy (replace ARNs/placeholders):

{

"Version":"2012-10-17",

"Statement":[

{

"Effect":"Allow",

"Action":[

"s3:GetObject",

"s3:ListBucket",

"s3:GetBucketLocation"

"Resource":[

"arn:aws:s3:::twtech-org-data",

"arn:aws:s3:::twtech-org-data/*"

]

{

"Effect":"Allow",

"Action":[

"athena:StartQueryExecution",

"athena:GetQueryExecution",

"athena:GetQueryResults"

"Resource":"*"

{

"Effect":"Allow",

"Action":[

"glue:GetTable",

"glue:GetDatabase",

"glue:GetPartitions"

"Resource":"*"

{

"Effect":"Allow",

"Action":["kms:Decrypt"],

"Resource":["arn:aws:kms:us-east-2:accountID:key/twtech-kms-key-id"]

}

]

}

Notes:

Prefer scoped resources over "*" where practical.
If QuickSight will access data in a VPC, configure a QuickSight VPC connection and security groups.

5) Create QuickSight Data Source → Dataset → SPICE

In QuickSight Console: create a new Data source → choose Athena → connect (select workgroup).
Create Dataset: choose table or use Custom SQL to pre-aggregate/filter.
For performance, choose Import to SPICE (SPICE -> in-memory). Set refresh schedule (hourly/daily as needed).

# Sample custom SQL twtech might use for the dataset:

SELECT

date_format(event_time, '%Y-%m-%d') as day,

region,

user_id,

SUM(value) as total_value

FROM curated.events

WHERE year = 2025

GROUP BY date_format(event_time, '%Y-%m-%d'), region, user_id;

SPICE tips:

Keep SPICE datasets trimmed to only needed columns/rows.
Schedule SPICE refresh to balance freshness and cost.

6) Row-Level Security (RLS) — two approaches

Option A — QuickSight dataset RLS (recommended for many setups)

Create a CSV mapping that maps QuickSight principal (user email or group) → allowed value(s) (e.g., region, tenant_id).
Upload that CSV in the dataset settings under Row-level security (Apply RLS rules).
QuickSight will enforce filtering for each user or group.

# Sample CSV (two columns: principal, allowed_value):

twtech671@gmail.com,US

patemf2021@yahoo.com,SA

sales-group,US

NB:

principal can be the QuickSight user’s email or a QuickSight group name (how twtech tenant is represented in QuickSight).
Rules can allow multiple values by having multiple rows per principal.

When to use:

simple mapping files, small number of principals, or static permissions.

Option B — Dynamic RLS via join in source (for scale / tenant tables)

Maintain a permissions table in Athena/Glue: e.g., permissions(twtech_user_email, tenant_id) or permissions(twtech_group_name, us-east-2).
In twtech dataset SQL (Athena view) join the main data with the permissions table to filter based on the principal.
When embedding, pass the QuickSight username (or map SAML attribute to QuickSight user identity) so the correct principal is known.

Sample idea (pseudo):

SELECT d.*

FROM curated.events d

JOIN curated.permissions p

ON d.tenant_id = p.tenant_id

WHERE p.user_email = '<twtech_user_email>';

NB:

QuickSight cannot substitute a runtime SQL variable into Athena from the UI directly.
This approach typically works when the dataset contains the permissions table and QuickSight applies a filter using the identity set at session time (embedding + session user),
or when the dataset is pre-filtered per tenant during SPICE ingestion (e.g., create per-tenant SPICE datasets).

When to use:

Many users/tenants, dynamic permissions, or when twtech must version control permissions in the data layer.

7) Build analyses & dashboard (authoring best practices)

Use parameters & controls for date and region selectors.
Use calculated fields sparingly (compute heavy logic upstream when possible).
Add filters and default values to reduce SPICE load.
Publish dashboard and test with multiple QuickSight users.

8) Embedding dashboards (optional)

For embedding into twtech-app (SaaS multi-tenant), use QuickSight embedding APIs to generate an embed URL or use the registered user embedding model.
When embedding and enforcing RLS, ensure the QuickSight identity twtech use for the embedded session is tied to the tenant/user so RLS applies correctly.

9) Security, networking & governance

Encrypt S3 (SSE-S3 or SSE-KMS); enable KMS key policies that allow QuickSight to decrypt.
Athena query results: set WorkGroup to write results to a secure S3 location and enable encryption.
Use VPC endpoints for S3 and VPC connection for QuickSight if you need private access.
Audit usage with CloudTrail and CloudWatch (QuickSight usage logs + Athena query logs).

10) Cost & performance tips

Use SPICE for high concurrency and fast dashboards; use Direct Query for very fresh/rare reads.
Compress & use Parquet to reduce Athena costs.
Partition aggressively and ensure queries include partition predicates.
Monitor SPICE storage and remove stale datasets.
Use Athena WorkGroup limits and query cost alerts.

Sample checklist to implement

Create S3 buckets and decide raw vs curated structure + partitioning scheme.
Build Glue job to produce Parquet and update partitions.

Create Glue crawler or Athena DDL and verify partitions.
Create IAM role/policy and KMS key for QuickSight access.

Configure QuickSight data source to Athena, build dataset and import to SPICE.
Decide RLS method: upload CSV mapping or create permissions table & incorporate into dataset.
Publish dashboards, test as multiple users (validate RLS).

Set SPICE refresh schedule and enable monitoring/alerts.

QuickSight troubleshooting / testing tips

Test Athena queries directly first (partition pruning, cost).
In QuickSight, use a test user with the same email/group to validate RLS.
If SPICE import fails, check IAM permissions and KMS decrypt rights.
If dashboards are slow, check SPICE vs Direct Query, and examine Athena queries for full table scans.

Think - with -Tech

Sunday, September 7, 2025

Amazon QuickSight | Deep Dive.

No comments:

Post a Comment

Databases Explained & Use Cases with (Flash Card) | Overview.

Blog Archive