Think-with-Tech: Amazon Redshift

Tuesday, September 2, 2025

Amazon Redshift - Deep Dive.

Scope:

Intro:

Amazon Redshift is a fully managed, petabyte-scale cloud data warehouse service from Amazon Web Services (AWS).
Amazon Redshift is designed for analyzing large datasets using standard SQL and existing business intelligence (BI) tools.
Amazon Redshift is optimized for online analytical processing (OLAP).
Amazon Redshift handles complex queries against large amounts of data efficiently.
https://aws.amazon.com/redshift/

1. The Concept: Redshift

Amazon Redshift is AWS’s fully managed, petabyte-scale cloud data warehouse designed for OLAP (Online Analytical Processing) workloads.
Amazon Redshift allows twtech to run complex SQL queries across structured/semi-structured data with very high performance and integrates tightly with the AWS ecosystem.

2. Core Architecture

Redshift is built on MPP (Massively Parallel Processing) principles.
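
The MPP idea can be illustrated with a small toy model: rows are scattered across independent workers (Redshift calls them slices, living on compute nodes), each worker aggregates only its own rows, and a final step merges the partial results (the leader node's role). This is only a sketch under stated assumptions, not how Redshift is implemented internally: the slice count, the CRC32 hash, and every function and column name below are hypothetical.

```python
from collections import defaultdict
from zlib import crc32

NUM_SLICES = 4  # hypothetical slice count, for illustration only


def slice_for(key, num_slices=NUM_SLICES):
    """Map a distribution-key value to a slice using a stable hash (assumed: CRC32)."""
    return crc32(str(key).encode("utf-8")) % num_slices


def distribute(rows, key_field, num_slices=NUM_SLICES):
    """Scatter rows across slices by hashing the chosen key column."""
    slices = [[] for _ in range(num_slices)]
    for row in rows:
        slices[slice_for(row[key_field], num_slices)].append(row)
    return slices


def local_sum(slice_rows, group_field, value_field):
    """Work done independently on one slice: a partial GROUP BY ... SUM."""
    partial = defaultdict(int)
    for row in slice_rows:
        partial[row[group_field]] += row[value_field]
    return dict(partial)


def merge_partials(partials):
    """Final step: combine the per-slice partial results into one answer."""
    total = defaultdict(int)
    for partial in partials:
        for group, value in partial.items():
            total[group] += value
    return dict(total)


def mpp_sum(rows, key_field, group_field, value_field, num_slices=NUM_SLICES):
    """Scatter, aggregate per slice, then gather."""
    slices = distribute(rows, key_field, num_slices)
    partials = [local_sum(s, group_field, value_field) for s in slices]
    return merge_partials(partials)


# Made-up sample data for the sketch.
SALES = [
    {"customer_id": 1, "region": "EU", "amount": 10},
    {"customer_id": 2, "region": "US", "amount": 20},
    {"customer_id": 3, "region": "EU", "amount": 30},
    {"customer_id": 4, "region": "US", "amount": 40},
    {"customer_id": 1, "region": "EU", "amount": 5},
]

if __name__ == "__main__":
    print(mpp_sum(SALES, "customer_id", "region", "amount"))
```

Because each slice only touches its own rows, adding slices spreads the work; the merge step stays small since it only sees one partial result per slice.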

3. Data Storage & Distribution

Redshift uses columnar storage for efficiency.
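
Why columnar storage helps analytical queries can be shown with a rough sketch: an aggregate over one column only needs that column's values, while a row layout drags every field of every row along. The table, column names, and the "values touched" counter below are made up for illustration and stand in for disk I/O; this is not Redshift's actual storage format.

```python
# Made-up sample table: three columns, three rows.
COLUMNS = ("sale_id", "region", "amount")
ROWS = [
    (1, "EU", 10),
    (2, "US", 20),
    (3, "EU", 30),
]


def to_columnar(rows, columns):
    """Pivot a list of row tuples into one list of values per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(columns)}


def sum_row_store(rows, columns, field):
    """Sum one field in a row layout; every field of every row is touched."""
    idx = columns.index(field)
    total = 0
    values_touched = 0
    for row in rows:
        values_touched += len(row)  # the whole row is read to reach one field
        total += row[idx]
    return total, values_touched


def sum_column_store(table, field):
    """Sum one field in a columnar layout; only that column is touched."""
    column = table[field]
    return sum(column), len(column)


if __name__ == "__main__":
    table = to_columnar(ROWS, COLUMNS)
    print(sum_row_store(ROWS, COLUMNS, "amount"))  # (60, 9)
    print(sum_column_store(table, "amount"))       # (60, 3)
```

Both layouts give the same sum, but the columnar version touches a third of the values here; the gap grows with the number of columns in the table.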

4. Query Processing

5. Performance Features

6. Security

7. Modern Enhancements

RA3 Nodes (separate compute & storage, managed storage on S3).
AQUA (Advanced Query Accelerator) → Hardware-accelerated cache layer.
Data Sharing → Share live data across Redshift clusters without copies.
Semi-structured Data support → SUPER data type + PartiQL queries over JSON-like data.
ML Integration → Train/deploy ML models inside Redshift using SageMaker.
Federated Query → Query data in RDS, Aurora, or other sources via Redshift.

8. Integrations

9. Best Practices

Choose RA3 nodes for separation of compute & storage.
Use DISTKEY + SORTKEY wisely for joins and filters.
Leverage Spectrum for infrequent/large historical data.
Keep tables ANALYZED & VACUUMED for query optimizer efficiency.
Organize large datasets by time: sort local tables on DATE columns, and partition Spectrum external tables by date.
Monitor performance with Redshift Console + CloudWatch + system tables (STL, SVL, STV).
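
The SORTKEY and VACUUM advice above comes down to block skipping: Redshift keeps min/max values per storage block (zone maps), so a range filter can skip blocks whose range cannot match. The toy model below shows the effect under stated assumptions; the block size, the day-number data, and all function names are hypothetical.

```python
BLOCK_SIZE = 3  # hypothetical rows per block, for illustration only


def build_zone_map(values, block_size=BLOCK_SIZE):
    """Split values into blocks and record (min, max, block) for each one."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    return [(min(block), max(block), block) for block in blocks]


def scan_range(zone_map, low, high):
    """Return matching values and how many blocks had to be read."""
    matches = []
    blocks_read = 0
    for block_min, block_max, block in zone_map:
        if block_max < low or block_min > high:
            continue  # the min/max range rules this block out, so skip it
        blocks_read += 1
        matches.extend(v for v in block if low <= v <= high)
    return matches, blocks_read


# Made-up "sale day" values, in arrival order and in sort-key order.
UNSORTED_DAYS = [5, 1, 9, 3, 12, 7, 2, 11, 4, 10, 6, 8]
SORTED_DAYS = sorted(UNSORTED_DAYS)

if __name__ == "__main__":
    # Filter: day BETWEEN 10 AND 12
    print(scan_range(build_zone_map(UNSORTED_DAYS), 10, 12))  # ([12, 11, 10], 3)
    print(scan_range(build_zone_map(SORTED_DAYS), 10, 12))    # ([10, 11, 12], 1)
```

With unsorted data nearly every block overlaps the filter range and must be read; once the data is sorted on the filter column, a single block covers it. This is why a table that drifts out of sort order benefits from VACUUM.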

10. When to Use & When to Avoid Redshift

✅ When to use Redshift:

❌ When to avoid Redshift:

twtech needs high-speed OLTP (Online Transaction Processing); use RDS/Aurora instead.
twtech workloads are small-scale; Athena or Aurora are more cost-effective.