Cassandra

Apache Cassandra is a highly scalable, high-performance, distributed NoSQL database designed to handle massive amounts of data across many commodity servers, providing high availability with no single point of failure.

Cheat Sheet

Prime Use Case

Use Cassandra when you have massive write-heavy workloads, require linear scalability, and need multi-region availability where 'always-on' write capability is more critical than immediate global consistency.

Critical Tradeoffs

  • AP over CP (Availability/Partition Tolerance over Consistency)
  • Write-optimized storage (LSM-trees) vs read latency (compaction overhead)
  • Query-first modeling vs Flexible relational modeling
  • Eventual consistency vs Strong consistency (Tunable)

Killer Senior Insight

Cassandra isn't just a database; it's a distributed storage engine that forces you to design your data schema around your specific query patterns (Query-Driven Modeling) rather than your data relationships.
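To make query-driven modeling concrete, here is a minimal sketch (a hypothetical user-posts example, with plain dicts standing in for Cassandra tables): each query gets its own denormalized table, and every write fans out to all of them.

```python
# Query-first modeling sketch: one "table" per query, same data denormalized
# into each. Plain dicts stand in for Cassandra tables; the dict key plays
# the role of the partition key. All names here are hypothetical.
from collections import defaultdict

# Query 1: "fetch a user's posts" -> partition by user_id
posts_by_user = defaultdict(list)
# Query 2: "fetch posts in a topic" -> partition by topic
posts_by_topic = defaultdict(list)

def insert_post(user_id: str, topic: str, text: str) -> None:
    """Write the same post into both tables (deliberate denormalization)."""
    row = {"user_id": user_id, "topic": topic, "text": text}
    posts_by_user[user_id].append(row)    # serves query 1
    posts_by_topic[topic].append(row)     # serves query 2

insert_post("alice", "databases", "LSM-trees are neat")
insert_post("bob", "databases", "Quorum reads explained")

# Each query is now a single-partition lookup -- no joins required.
print(len(posts_by_user["alice"]))       # 1
print(len(posts_by_topic["databases"]))  # 2
```

The design choice to trade write amplification (two inserts per post) for single-partition reads is exactly the relational instinct Cassandra asks you to invert.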

Recognition

Common Interview Phrases

  • Requirements for 'infinite' horizontal scalability
  • High-velocity time-series data (IoT, metrics, logs)
  • Global multi-datacenter replication with low-latency local writes
  • Need for high availability where even a few seconds of downtime is unacceptable

Common Scenarios

  • Activity feeds and social media timelines
  • IoT sensor data ingestion and monitoring
  • E-commerce product catalogs and shopping carts
  • Messaging systems and chat history storage

Anti-patterns to Avoid

  • Applications requiring ACID transactions across multiple tables
  • Small datasets that fit on a single large relational instance
  • Use cases requiring frequent ad-hoc queries or complex JOINs
  • Workloads with high update/delete frequency (leading to tombstone issues)

The Problem

The Fundamental Issue

The 'Write Wall' and Single Point of Failure inherent in traditional RDBMS master-slave architectures when scaling to petabytes of data.

What breaks without it

  • Master nodes become a bottleneck for write throughput
  • Manual sharding of SQL databases becomes an operational nightmare
  • Failover mechanisms in RDBMS often lead to minutes of downtime
  • Cross-region replication latency kills write performance in CP systems

Why alternatives fail

  • Relational databases (PostgreSQL/MySQL) struggle with horizontal write scaling without complex middleware
  • MongoDB (in default configurations) prioritizes consistency, which can lead to write unavailability during leader elections
  • Standard key-value stores lack the structured wide-column querying capabilities needed for complex time-series data

Mental Model

The Intuition

Imagine a circular ring of lockers. Instead of one manager holding all the keys, every locker owner knows the 'Gossip' about who owns which locker. When you want to store something, you can hand it to any owner, and they'll make sure it gets to the right locker and its neighbors for safekeeping.

Key Mechanics

1. Consistent Hashing: Determines data placement across the cluster ring
2. Gossip Protocol: Peer-to-peer communication for cluster state and health
3. LSM-Trees (Log-Structured Merge-Trees): Convert random writes into sequential I/O via Memtables and SSTables
4. Bloom Filters: Probabilistic data structures used to skip SSTables that don't contain a specific key
5. Hinted Handoff: Temporarily stores writes for a downed node to ensure eventual consistency
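The ring intuition behind consistent hashing can be sketched in a few lines. This is a simplified model (real Cassandra uses Murmur3 tokens and virtual nodes, neither of which is shown here); the node names and key are made up for illustration.

```python
# Simplified consistent-hashing sketch: hash nodes and keys onto one ring,
# then walk clockwise to find the replicas for a key. Assumptions: MD5 as
# the hash, no virtual nodes, hypothetical node names.
import hashlib
from bisect import bisect_right

NODES = ["node-a", "node-b", "node-c"]

def token(value: str) -> int:
    """Hash a key or node name onto a 2^32-position ring."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % (2**32)

# Each node owns the arc of the ring that ends at its token.
ring = sorted((token(n), n) for n in NODES)

def replicas(key: str, rf: int = 2) -> list:
    """Walk clockwise from the key's token; the first `rf` nodes hold copies."""
    tokens = [t for t, _ in ring]
    start = bisect_right(tokens, token(key)) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(rf)]

print(replicas("sensor-42", rf=2))  # two distinct nodes, stable for this key
```

Note the operational payoff: adding a node only reassigns the arcs adjacent to its token, rather than reshuffling every key.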

Framework

When it's the best choice

  • When sustained write volume exceeds roughly 100,000 operations per second
  • When the system must survive the loss of an entire data center
  • When data access patterns are well-defined and static

When to avoid

  • When you need to perform 'GROUP BY' or 'JOIN' operations at the database level
  • When you have a 'heavy update' workload that modifies the same records repeatedly
  • When you lack the engineering resources to manage compaction and JVM tuning

Fast Heuristics

  • If you need joins → PostgreSQL/Spanner
  • If you need sub-millisecond key-value lookups → Redis
  • If you need massive writes + multi-region → Cassandra
  • If you need flexible document schemas + secondary indexes → MongoDB

Tradeoffs

Strengths

  • Linear Scalability: Adding nodes increases capacity linearly
  • No Single Point of Failure: Peer-to-peer architecture
  • Tunable Consistency: Choose between ONE, QUORUM, or ALL for each query
  • High Write Throughput: LSM-tree architecture is optimized for sequential disk writes

Weaknesses

  • Tombstones: Deletes don't remove data immediately, causing read performance degradation
  • Compaction Debt: Background merging of SSTables can consume significant CPU/IO
  • No Joins/Aggregations: Must be handled at the application layer or via denormalization
  • JVM Garbage Collection: Can cause 'stop-the-world' pauses affecting latency

Alternatives

ScyllaDB

When it wins

When you need Cassandra compatibility but want higher performance and lower latency without JVM overhead.

Key Difference

Written in C++ with a shared-nothing architecture, eliminating GC pauses and improving resource utilization.

DynamoDB

When it wins

When you want a serverless, fully managed experience and don't want to manage infrastructure.

Key Difference

Proprietary AWS service with a different pricing model (RCU/WCU) and stricter item size limits.

Google Cloud Spanner

When it wins

When you need global scale but also require strict ACID transactions and relational semantics.

Key Difference

Uses TrueTime (atomic clocks) to provide external consistency across regions, which is much more expensive.

Execution

Must-hit talking points

  • Explain the difference between Partition Key (data distribution) and Clustering Key (on-disk sorting)
  • Discuss the 'Read Repair' and 'Anti-Entropy (Manual Repair)' mechanisms
  • Mention the 'Write Path': CommitLog -> Memtable -> SSTable
  • Explain how Quorum (R + W > N) ensures strong consistency
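The quorum arithmetic from the last point is worth being able to do on the spot. A minimal sketch of the R + W > N overlap argument:

```python
# Quorum arithmetic sketch: with replication factor N, a read touching R
# replicas and a write touching W replicas must overlap on at least one
# replica whenever R + W > N -- that shared replica has the latest write.
def quorum(n: int) -> int:
    """Smallest majority of n replicas (Cassandra's QUORUM level)."""
    return n // 2 + 1

N = 3                    # replication factor
R = W = quorum(N)        # QUORUM reads and writes on RF=3 -> 2 each
overlap = R + W - N      # replicas guaranteed to be in both sets
print(R, W, overlap)     # 2 2 1
```

With weaker levels such as ONE/ONE (1 + 1 = 2, not > 3), no overlap is guaranteed, which is exactly where the "tunable" in tunable consistency bites.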

Anticipate follow-ups

  • Q: How do you handle hot partitions in Cassandra?
  • Q: What is the impact of a high 'Replication Factor' on write latency?
  • Q: How would you implement a secondary index, and why is it often discouraged?
  • Q: Explain 'LWT' (Lightweight Transactions) and the Paxos protocol in Cassandra.
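For the hot-partition follow-up, one standard answer is bucketing: fold a time bucket into the partition key so one logical stream spreads across many partitions. A hedged sketch (the sensor naming and day-granularity bucket are illustrative choices, not a universal rule):

```python
# Hot-partition mitigation sketch: compose the partition key from the
# entity ID plus a time bucket, so a single busy sensor's data is split
# across one partition per day instead of one unbounded partition.
from datetime import datetime, timezone

def partition_key(sensor_id: str, ts: datetime) -> tuple:
    """Bucket by day: (sensor_id, 'YYYY-MM-DD') instead of sensor_id alone."""
    return (sensor_id, ts.strftime("%Y-%m-%d"))

ts = datetime(2024, 5, 1, 12, 30, tzinfo=timezone.utc)
print(partition_key("sensor-42", ts))  # ('sensor-42', '2024-05-01')
```

The bucket size is a tradeoff: smaller buckets spread load better but force readers to query more partitions per time range.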

Red Flags

Using Cassandra like a relational database (Normalizing data).

Why it fails: Leads to application-side joins and massive performance bottlenecks because Cassandra is designed for denormalized, single-table queries.

Creating 'Unbounded' partitions (e.g., partitioning by a single 'status' column).

Why it fails: Causes 'Hot Partitions' where a single node holds too much data, leading to skewed load and eventual node failure.

Frequent deletes or updates to the same row.

Why it fails: Creates 'Tombstones' which the read scanner must skip, significantly increasing read latency and disk pressure.
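The tombstone failure mode can be sketched with a toy LSM read path. This is a deliberately simplified model (real SSTable merging, timestamps, and gc_grace_seconds are omitted): deletes append a marker, and reads must merge stale versions until compaction purges them.

```python
# Toy LSM read-path sketch showing tombstones. A delete is just another
# write (a marker); until compaction, reads merge SSTables newest-first
# and must skip tombstoned versions. All data here is illustrative.
TOMBSTONE = "<deleted>"

# Newest SSTable first; each is an immutable key -> value snapshot.
sstables = [
    {"row1": TOMBSTONE},            # recent delete of row1
    {"row1": "v2", "row2": "b"},    # earlier update
    {"row1": "v1"},                 # original write
]

def read(key):
    """Return (value, sstables_scanned); newest version wins."""
    scanned = 0
    for table in sstables:
        scanned += 1
        if key in table:
            value = table[key]
            return (None if value == TOMBSTONE else value), scanned
    return None, scanned

print(read("row1"))  # (None, 1) -- deleted, yet three versions still sit on disk
```

The point of the sketch: the delete made the row invisible, but the disk still holds three versions of it, and any scan that touches those SSTables pays for them until compaction runs.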