Sunday, September 7, 2025

Amazon QuickSight | Deep Dive.

Amazon QuickSight - Deep Dive.

Scope:

  •      The concept,
  •      How Amazon QuickSight works,
  •      Architecture,
  •      Features,
  •      Pricing,
  •      Integrations,
  •      Security,
  •      Best practices.

1. The Concept: QuickSight

    • Amazon QuickSight is AWS’s serverless, cloud-native Business Intelligence (BI) service, that allows twtech to:
    • Connect to multiple data sources (AWS + external).
    • Create interactive dashboards, reports, and visualizations.
    • Use ML-powered insights like anomaly detection and forecasting.
    • Scale to thousands of users without needing traditional BI infrastructure.

NB:

    • It’s designed as a SaaS BI solution—elastic, pay-per-use, and embedded-friendly.

2. Architecture Overview

At a high level, QuickSight sits on top of twtech data sources and provides a visualization/insight layer.

 Core Components:

    • Data Sources – S3, RDS, Redshift, Athena, Aurora, DynamoDB, Salesforce, Snowflake, on-premises via JDBC, etc.
    • Data Preparation – SPICE (Super-fast, Parallel, In-memory Calculation Engine) or Direct Query.
    • Datasets – Modeled data inside QuickSight (joins, filters, calculated fields).
    • Analyses – Where you build charts, visuals, and dashboards.
    • Dashboards – Read-only, published version of an analysis, sharable across orgs.
    • Stories – Narratives of dashboards that update with time (data storytelling).

3. SPICE vs Direct Query

QuickSight offers two modes for querying:

  • SPICE
    • In-memory engine optimized for speed.
    • Best for repeated queries & scaling users.
    • Supports scheduled refresh.
    • Each user gets 10 GB free SPICE storage, expandable.
  • Direct Query
    • Queries run directly against the source.
    • Always up-to-date but may be slower.
    • Good for real-time dashboards.

4. Key Features

 Core BI Features

    • Rich visualization types (bar, line, heatmap, geospatial maps, Sankey, KPIs, pivot tables).
    • Parameters and controls for interactivity.
    • Row-level security (RLS) for granular access.
    • Dashboard embedding (into apps, portals, SaaS).
    • Multi-tenant capabilities for SaaS providers.

 ML-Powered Features

    • Anomaly detection: automatic pattern recognition.
    • Forecasting: time-series predictions.
    • Auto-narratives: machine-generated text insights.
    • Natural Language Queries: “Q” lets business users type plain-English questions (“What was revenue last quarter in APAC?”).

🔒 Security

    • AWS IAM integration.
    • Row-level security with tags/filters.
    • Encryption (in transit & at rest).
    • VPC connectivity for private data.

5. Pricing

QuickSight is pay-per-session for readers:

    • Authors: $24/month (Enterprise edition).
    • Readers: $0.30 per session (max $5 per month per reader).
    • SPICE capacity: 10 GB free/user, then $0.25 per GB/month.

NB:

  • This pricing model makes it attractive for orgs with many occasional BI users.

6. Integrations

    • AWS Native: Redshift, Athena, S3, RDS, DynamoDB, Aurora.
    • 3rd-party: Salesforce, Snowflake, MySQL, Postgres, Teradata, Presto.
    • Embedded Analytics: via APIs + SDKs (JS).
    • Federated Identity: SAML, Active Directory, Okta.

7. Enterprise Use Cases

    • Executive dashboards – KPIs, revenue, performance.
    • Operational analytics – near real-time metrics from IoT, apps, logs.
    • Embedded analytics – SaaS vendors embedding dashboards into their platforms.
    • Self-service BI – business users using Q for NL queries.
    • Compliance & Security analytics – analyzing logs (CloudTrail, GuardDuty, VPC Flow Logs).

8. Best Practices

    • Use SPICE for high concurrency + fast dashboards.
    • Enable Row-Level Security (RLS) for multi-tenant SaaS environments.
    • Optimize data modeling in source systems before importing to QuickSight.
    • Use parameters and controls to create interactive dashboards.
    • Monitor usage with CloudWatch metrics.
    • Automate deployments with APIs & CLI.

9. QuickSight vs Other BI Tools

    • Power BI / Tableau stronger in desktop authoring, but heavier infra and licensing costs.
    • Looker stronger in modeling (LookML), but more complex setup.
    • QuickSight excels in scalability, AWS-native integration, pay-per-use model, ML features.

Final thoughts:

    •        Amazon QuickSight is best suited for organizations already leveraging AWS, SaaS providers embedding analytics, and enterprises that want elastic BI without managing servers.
    •        It might be less attractive if twtech wants offline desktop modeling or highly advanced visualization customization compared to Tableau/Power BI.

Insights:

    • A practical, step-by-step architecture twtech follows to put Amazon QuickSight on top of S3 + Athena + SPICE with row-level security (RLS).
    • It includes the data flow, concrete config snippets (Athena DDL, IAM policy, example RLS CSV).
    • It is the best practice and testing checklist for twtech to implement end-to-end.

Quick high-level diagram

S3 (raw curated)  Glue ETL / Lambda / Kinesis Firehose  Glue Data Catalog (tables)  Athena (queries / views)  QuickSight datasource Dataset (Import to) SPICE Dashboards 

(console / embedded)
Users (QuickSight accounts or embedded app users) — RLS applied at dataset filtered visuals

Step-by-step: Prerequisite

    • AWS account with QuickSight Enterprise (for advanced features like VPC connection & SSO).
    • S3 bucket(s) for raw + curated data.
    • Athena enabled and a WorkGroup with result location (S3).
    • Glue Data Catalog (Glue crawler or managed tables).
    • KMS CMK if twtech needs customer-managed encryption.
    • QuickSight admin user (to create data source, datasets, RLS rules).

1) Ingest & store data in S3 (layout + format)

  • Goal: fast reads from Athena and efficient SPICE ingestion.

Recommended layout:

s3://twtech-org-data/raw/events/year=2025/month=09/day=07/part-000.parquet

s3://twtech-org-data/curated/events/year=2025/month=09/day=07/part-000.parquet

Best practices:

    • Store analytics data as columnar Parquet with Snappy compression.
    • Partition by date (year/month/day) and a tenant/region column if multi-tenant.
    • Keep schema stable; include immutable partitioning key columns.

2) Catalog the data (Glue / Athena)

    • Either run a Glue crawler or create an Athena external table that points to twtech curated S3 prefix.
    • Sample Athena DDL (Parquet, partitioned):

CREATE EXTERNAL TABLE IF NOT EXISTS curated.events (

  event_id string,

  user_id string,

  region string,

  event_time timestamp,

  value double

)

PARTITIONED BY (year int, month int, day int)

STORED AS PARQUET

LOCATION 's3://twtech-org-data/curated/events/';

After adding partitions:

MSCK REPAIR TABLE curated.events;

-- or ALTER TABLE curated.events ADD PARTITION (...) LOCATION '...'

Optimize:

    • Partition pruning: query with year/month/day filters.
    • Consider Athena materialized views for expensive aggregations.
    • Use Athena workgroup with enforced encryption and query limits.

3) ETL / Transformation (Glue / Lambda / Spark)

    • Use Glue ETL (PySpark) to convert raw CSV/JSON Parquet and write to /curated/... with partitions.
    • Validate schema and set up job bookmarks to process only new data.
    • Keep a table of permissions (for dynamic RLS) in twtech curated area if it plans to use a permissions table.

4) IAM + KMS: give QuickSight secure access

    • QuickSight needs permission to read the data catalog, Athena, and S3 (and KMS if used). 
    • Create an IAM role or policy for QuickSight.

# Sample minimal policy (replace ARNs/placeholders):

{

  "Version":"2012-10-17",

  "Statement":[

    {

      "Effect":"Allow",

      "Action":[

        "s3:GetObject",

        "s3:ListBucket",

        "s3:GetBucketLocation"

      ],

      "Resource":[

        "arn:aws:s3:::twtech-org-data",

        "arn:aws:s3:::twtech-org-data/*"

      ]

    },

    {

      "Effect":"Allow",

      "Action":[

        "athena:StartQueryExecution",

        "athena:GetQueryExecution",

        "athena:GetQueryResults"

      ],

      "Resource":"*"

    },

    {

      "Effect":"Allow",

      "Action":[

        "glue:GetTable",

        "glue:GetDatabase",

        "glue:GetPartitions"

      ],

      "Resource":"*"

    },

    {

      "Effect":"Allow",

      "Action":["kms:Decrypt"],

      "Resource":["arn:aws:kms:us-east-2:accountID:key/twtech-kms-key-id"]

    }

  ]

}

Notes:

    • Prefer scoped resources over "*" where practical.
    • If QuickSight will access data in a VPC, configure a QuickSight VPC connection and security groups.

5) Create QuickSight Data Source → Dataset → SPICE

    1. In QuickSight Console: create a new Data source → choose Athena → connect (select workgroup).
    2. Create Dataset: choose table or use Custom SQL to pre-aggregate/filter.
    3. For performance, choose Import to SPICE (SPICE -> in-memory). Set refresh schedule (hourly/daily as needed).

# Sample custom SQL twtech might use for the dataset:

SELECT

  date_format(event_time, '%Y-%m-%d') as day,

  region,

  user_id,

  SUM(value) as total_value

FROM curated.events

WHERE year = 2025

GROUP BY date_format(event_time, '%Y-%m-%d'), region, user_id;

SPICE tips:

    • Keep SPICE datasets trimmed to only needed columns/rows.
    • Schedule SPICE refresh to balance freshness and cost.

6) Row-Level Security (RLS) — two approaches

Option A — QuickSight dataset RLS (recommended for many setups)

    • Create a CSV mapping that maps QuickSight principal (user email or group) allowed value(s) (e.g., region, tenant_id).
    • Upload that CSV in the dataset settings under Row-level security (Apply RLS rules).
    • QuickSight will enforce filtering for each user or group.

# Sample CSV (two columns: principal, allowed_value):

twtech671@gmail.com,US

patemf2021@yahoo.com,SA

sales-group,US

NB:

    • principal can be the QuickSight user’s email or a QuickSight group name (how twtech tenant is represented in QuickSight).
    • Rules can allow multiple values by having multiple rows per principal.
When to use:
  • simple mapping files, small number of principals, or static permissions.

Option B — Dynamic RLS via join in source (for scale / tenant tables)

    • Maintain a permissions table in Athena/Glue: e.g., permissions(twtech_user_email, tenant_id) or permissions(twtech_group_name, us-east-2).
    • In twtech dataset SQL (Athena view) join the main data with the permissions table to filter based on the principal.
    • When embedding, pass the QuickSight username (or map SAML attribute to QuickSight user identity) so the correct principal is known.

Sample idea (pseudo):

SELECT d.*

FROM curated.events d

JOIN curated.permissions p

  ON d.tenant_id = p.tenant_id

WHERE p.user_email = '<twtech_user_email>';

NB:

    • QuickSight cannot substitute a runtime SQL variable into Athena from the UI directly.
    •  This approach typically works when the dataset contains the permissions table and QuickSight applies a filter using the identity set at session time (embedding + session user),
    •  or when the dataset is pre-filtered per tenant during SPICE ingestion (e.g., create per-tenant SPICE datasets).

When to use

  • Many users/tenants, dynamic permissions, or when twtech must version control permissions in the data layer.

7) Build analyses & dashboard (authoring best practices)

    • Use parameters & controls for date and region selectors.
    • Use calculated fields sparingly (compute heavy logic upstream when possible).
    • Add filters and default values to reduce SPICE load.
    • Publish dashboard and test with multiple QuickSight users.

8) Embedding dashboards (optional)

    • For embedding into twtech-app (SaaS multi-tenant), use QuickSight embedding APIs to generate an embed URL or use the registered user embedding model.
    • When embedding and enforcing RLS, ensure the QuickSight identity twtech use for the embedded session is tied to the tenant/user so RLS applies correctly.

9) Security, networking & governance

    • Encrypt S3 (SSE-S3 or SSE-KMS); enable KMS key policies that allow QuickSight to decrypt.
    • Athena query results: set WorkGroup to write results to a secure S3 location and enable encryption.
    • Use VPC endpoints for S3 and VPC connection for QuickSight if you need private access.
    • Audit usage with CloudTrail and CloudWatch (QuickSight usage logs + Athena query logs).

10) Cost & performance tips

    • Use SPICE for high concurrency and fast dashboards; use Direct Query for very fresh/rare reads.
    • Compress & use Parquet to reduce Athena costs.
    • Partition aggressively and ensure queries include partition predicates.
    • Monitor SPICE storage and remove stale datasets.
    • Use Athena WorkGroup limits and query cost alerts.

Sample checklist to implement

Create S3 buckets and decide raw vs curated structure + partitioning scheme.
Build Glue job to produce Parquet and update partitions.

Create Glue crawler or Athena DDL and verify partitions.
Create IAM role/policy and KMS key for QuickSight access.

Configure QuickSight data source to Athena, build dataset and import to SPICE.
Decide RLS method: upload CSV mapping or create permissions table & incorporate into dataset.
Publish dashboards, test as multiple users (validate RLS).

Set SPICE refresh schedule and enable monitoring/alerts.

QuickSight troubleshooting / testing tips

    • Test Athena queries directly first (partition pruning, cost).
    • In QuickSight, use a test user with the same email/group to validate RLS.
    • If SPICE import fails, check IAM permissions and KMS decrypt rights.
    • If dashboards are slow, check SPICE vs Direct Query, and examine Athena queries for full table scans.


No comments:

Post a Comment

Amazon EventBridge | Overview.

Amazon EventBridge - Overview. Scope: Intro, Core Concepts, Key Benefits, Link to official documentation, What EventBridge  Really  Is (Deep...