Amazon Keyspaces (for Apache Cassandra) - Deep Dive.
Scope:
- Intro
- The Concept of Amazon Keyspaces
- Key Features
- Architecture
- Data Model (with a sample table definition)
- Write and Read Operations
- Accessing Amazon Keyspaces (cqlsh / Java)
- Security & IAM
- Monitoring & Observability
- Best Practices
- Common Use Cases
- Keyspaces vs. Self-managed Cassandra vs. DynamoDB
Intro:
- Amazon Keyspaces is a serverless, fully managed Cassandra-compatible database service on AWS.
- Apache Cassandra is an open-source, distributed NoSQL database.
- Amazon Keyspaces lets twtech run Cassandra workloads without managing clusters, scaling, or infrastructure.
1. The Concept of Amazon Keyspaces
- Managed Cassandra: Offers a CQL (Cassandra Query Language) API compatible with open-source Cassandra.
- Serverless & Scalable: Automatically scales tables based on request volume, handling thousands of requests per second with millisecond latency.
- High Availability: Replicated across multiple AZs within a region.
- Billing Model: Pay only for what twtech uses (reads, writes, storage).
2. Key Features
a) Cassandra Compatibility
- Compatible with the Apache Cassandra 3.11 CQL API.
- Use existing Cassandra drivers (Java, Python, Go, Node.js, etc.).
- Works with Apache Cassandra tooling (cqlsh, drivers, ORMs).
b) Serverless Scaling
- Scales up or down automatically based on throughput demand.
- No cluster provisioning or manual scaling required.
c) Two Capacity Modes
- On-demand Mode
  - Auto-scales based on traffic.
  - Pay-per-request billing.
  - Ideal for spiky workloads.
- Provisioned Mode
  - Manually define read/write capacity units (RCUs, WCUs).
  - Can enable auto-scaling.
  - Lower cost for predictable workloads.
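To make provisioned-mode sizing concrete, here is a minimal sketch of the arithmetic. It assumes the unit definitions given in the Keyspaces documentation: one RCU covers one LOCAL_QUORUM read per second of a row up to 4 KB, and one WCU covers one write per second of a row up to 1 KB. The class name, method names, and traffic figures are hypothetical.

```java
/** Back-of-the-envelope capacity sizing for a provisioned-mode table (illustrative only). */
final class CapacityEstimate {
    // Assumed unit sizes: 4 KB per read unit (LOCAL_QUORUM), 1 KB per write unit.
    static final int READ_UNIT_BYTES = 4 * 1024;
    static final int WRITE_UNIT_BYTES = 1024;

    /** Units consumed by one operation on a row of the given size, rounded up, minimum 1. */
    static long unitsPerOp(long rowBytes, int unitBytes) {
        return Math.max(1, (rowBytes + unitBytes - 1) / unitBytes);
    }

    /** RCUs needed to sustain the given read rate for rows of the given size. */
    static long readUnits(long readsPerSecond, long rowBytes) {
        return readsPerSecond * unitsPerOp(rowBytes, READ_UNIT_BYTES);
    }

    /** WCUs needed to sustain the given write rate for rows of the given size. */
    static long writeUnits(long writesPerSecond, long rowBytes) {
        return writesPerSecond * unitsPerOp(rowBytes, WRITE_UNIT_BYTES);
    }

    public static void main(String[] args) {
        long rowBytes = 2_500; // hypothetical average row size
        System.out.println("RCUs: " + readUnits(500, rowBytes));
        System.out.println("WCUs: " + writeUnits(200, rowBytes));
    }
}
```

Because write units are sized at 1 KB, the same row costs more units to write than to read, so row size matters when estimating cost for either capacity mode.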
d) Replication & Durability
- Data replicated 3x across multiple AZs.
- Built-in backup and PITR (Point-In-Time Recovery).
e) Security
- Encryption at rest with AWS KMS.
- TLS in transit.
- IAM-based authentication (SigV4 or IAM service-specific credentials) instead of Cassandra-managed usernames and passwords.
- VPC endpoints for private access.
3. Architecture
- No cluster nodes exposed to customers (vs. self-managed Cassandra).
- AWS manages:
  - Cluster provisioning
  - Node replacement
  - Repair/compaction
  - Scaling
- Applications connect directly to Keyspaces via a CQL-compatible endpoint.
4. Data Model
Keyspaces follows Cassandra’s partitioned row store model:
- Keyspace → top-level namespace (like a database).
- Table → collection of rows.
- Row → identified by a primary key (partition key + clustering columns).
- Columns → schema-defined, flexible structure.
# Sample table definition:
-- Unquoted CQL identifiers cannot contain hyphens, so the table is named twtech_users.
CREATE TABLE twtech_users (
    user_id UUID,
    first_name text,
    last_name text,
    email text,
    PRIMARY KEY (user_id)
);
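5. Write and Read Operations
Writes
- Append-only semantics (like Cassandra).
- In open-source Cassandra this means memtables, immutable SSTables, and background compaction; in Keyspaces that storage layer is managed by AWS and not exposed.
- Highly durable due to multi-AZ replication.
Reads
- Partition-aware: efficient queries supply the partition key.
- Efficient range queries using clustering columns within a partition.
- Secondary indexes and materialized views are more limited than in open-source Cassandra and may be unsupported; check the Keyspaces list of supported Cassandra APIs before depending on them.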
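To illustrate why reads want a partition key and why clustering-key ranges are cheap, here is a toy in-memory model of a partitioned row store. It is a sketch of the data model only, not of how Keyspaces stores data. The class, its methods, and the sensor keys are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Toy model: partition key -> rows kept sorted by clustering key. */
final class PartitionedStore {
    private final Map<String, TreeMap<Long, String>> partitions = new HashMap<>();

    /** A write is an upsert addressed by (partition key, clustering key). */
    void write(String partitionKey, long clusteringKey, String value) {
        partitions.computeIfAbsent(partitionKey, k -> new TreeMap<Long, String>())
                  .put(clusteringKey, value);
    }

    /** Range read inside one partition: from <= clustering key < to. */
    SortedMap<Long, String> readRange(String partitionKey, long from, long to) {
        TreeMap<Long, String> rows = partitions.get(partitionKey);
        if (rows == null) {
            return new TreeMap<Long, String>();
        }
        // The partition key locates the partition; the sorted map yields the slice in order.
        return rows.subMap(from, to);
    }

    public static void main(String[] args) {
        PartitionedStore store = new PartitionedStore();
        store.write("sensor-1", 1000L, "21.5");
        store.write("sensor-1", 2000L, "21.7");
        store.write("sensor-1", 3000L, "22.0");
        store.write("sensor-2", 1500L, "19.2");
        System.out.println(store.readRange("sensor-1", 1000L, 3000L));
    }
}
```

A read without a partition key has no single map entry to go to and would have to visit every partition, which is why queries are normally designed around the partition key.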
6. Accessing Amazon Keyspaces (Using cqlsh / Java)
# Using cqlsh (the username and password are service-specific credentials generated for an IAM user)
cqlsh <keyspaces-endpoint> 9142 \
  --ssl \
  --username "twtech-iam-username" \
  --password "twtech-iam-token"
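# Using Java (DataStax Java driver 4.x with the AWS SigV4 authentication plugin)
import com.datastax.oss.driver.api.core.CqlSession;
import java.net.InetSocketAddress;
import javax.net.ssl.SSLContext;
import software.aws.mcs.auth.SigV4AuthProvider;

CqlSession session = CqlSession.builder()
    .addContactPoint(new InetSocketAddress("cassandra.us-east-2.amazonaws.com", 9142))
    .withLocalDatacenter("us-east-2")
    .withAuthProvider(new SigV4AuthProvider("us-east-2"))
    .withSslContext(SSLContext.getDefault()) // Keyspaces requires TLS; throws NoSuchAlgorithmException
    .build();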
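7. Security & IAM
- IAM authentication replaces Cassandra user roles.
- IAM policies control which actions (for example cassandra:Select) a principal may perform on which keyspaces and tables.
- Example IAM policy for read-only access to one table (drivers also need to read the system keyspaces):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["cassandra:Select"],
      "Resource": [
        "arn:aws:cassandra:us-east-2:accountID:/keyspace/twtechkeyspace/table/twtech_table",
        "arn:aws:cassandra:us-east-2:accountID:/keyspace/system*"
      ]
    }
  ]
}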
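Table resources in Keyspaces policies are identified by ARNs of the form `arn:aws:cassandra:<region>:<account>:/keyspace/<keyspace>/table/<table>`. The sketch below assembles such an ARN and a read-only policy document as plain strings. The helper class and its method names are hypothetical, the account ID is a placeholder, and the `system*` resource is included on the assumption that the driver reads system tables when it connects.

```java
/** Builds a Keyspaces table ARN and a read-only policy document (illustrative helper). */
final class KeyspacesPolicy {
    static String tableArn(String region, String accountId, String keyspace, String table) {
        return "arn:aws:cassandra:" + region + ":" + accountId
                + ":/keyspace/" + keyspace + "/table/" + table;
    }

    /** Read-only policy for one table, plus the system keyspaces (assumed to be needed by drivers). */
    static String readOnlyPolicy(String region, String accountId, String keyspace, String table) {
        String systemArn = "arn:aws:cassandra:" + region + ":" + accountId + ":/keyspace/system*";
        return "{\n"
                + "  \"Version\": \"2012-10-17\",\n"
                + "  \"Statement\": [{\n"
                + "    \"Effect\": \"Allow\",\n"
                + "    \"Action\": [\"cassandra:Select\"],\n"
                + "    \"Resource\": [\"" + tableArn(region, accountId, keyspace, table)
                + "\", \"" + systemArn + "\"]\n"
                + "  }]\n"
                + "}";
    }

    public static void main(String[] args) {
        // 111122223333 is a placeholder account ID.
        System.out.println(readOnlyPolicy("us-east-2", "111122223333", "twtechkeyspace", "twtech_table"));
    }
}
```

Generating the ARN from its parts keeps the `/table/` path segment from being dropped or misspelled when policies are written for many tables.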
8. Monitoring & Observability
- CloudWatch metrics for RCU/WCU usage, latency, and throttling.
- CloudTrail logs for API activity.
- Performance tuning via capacity mode & partition design.
9. Best Practices
- Partition key design matters
  - Avoid hot partitions (a single partition receiving all writes).
  - Distribute keys evenly.
- Model queries, not data
  - Like Cassandra, denormalize for query patterns.
- Use TTLs for expiring data
  INSERT INTO sessions (session_id, user_id, data)
  VALUES (uuid(), '123', 'abc') USING TTL 3600;
- Choose the right capacity mode
  - On-demand: spiky traffic.
  - Provisioned: steady workloads.
- Leverage IAM instead of application-managed credentials.
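The hot-partition advice above can be sketched as write sharding: a stable bucket suffix is appended to a busy logical key so that its writes land on several partitions instead of one. This is one common technique, not a Keyspaces feature. The class name, the `#` separator, and the bucket count are assumptions made for illustration.

```java
/** Spreads writes for one logical key across a fixed number of partitions (illustrative). */
final class PartitionBuckets {
    /**
     * Derives a partition key such as "device-42#3". The bucket is computed from the
     * row's own id, so the same row always maps to the same partition.
     */
    static String bucketedKey(String logicalKey, String rowId, int buckets) {
        int bucket = Math.floorMod(rowId.hashCode(), buckets);
        return logicalKey + "#" + bucket;
    }

    public static void main(String[] args) {
        // Ten events for one device, spread over 8 buckets.
        for (int i = 0; i < 10; i++) {
            System.out.println(bucketedKey("device-42", "evt-" + i, 8));
        }
    }
}
```

The trade-off is on the read side: fetching everything for `device-42` now means querying each bucket and merging the results, so the bucket count should stay small and fixed.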
10. Common Use Cases
- IoT time-series data (high write throughput, TTL expiry).
- User profiles & personalization (low latency).
- Session stores (ephemeral storage with TTL).
- Message/event logging.
- Gaming leaderboards.
11. Keyspaces vs. Self-managed Cassandra vs. DynamoDB
| Feature | Amazon Keyspaces | Self-managed Cassandra | DynamoDB |
| Management | Fully managed | twtech manages | Fully managed |
| Scaling | Auto / provisioned | Manual | Auto / provisioned |
| API | CQL (Cassandra) | CQL | Proprietary |
| Ecosystem | Cassandra-compatible | Native Cassandra | AWS only |
| Multi-AZ | Built-in | Complex setup | Built-in |
Final thoughts:
- Amazon Keyspaces gives twtech Cassandra power without Cassandra pain.
- Amazon Keyspaces is serverless, scalable, secure, and Cassandra-compatible.
- Amazon Keyspaces is a good fit if twtech wants low-latency, high-throughput workloads but does not want to manage Cassandra clusters.