Cassandra
Cheat Sheet
Prime Use Case
Use Cassandra when you have massive write-heavy workloads, require linear scalability, and need multi-region availability where 'always-on' write capability is more critical than immediate global consistency.
Critical Tradeoffs
- AP over CP (Availability/Partition Tolerance over Consistency)
- Write-optimized (LSM-trees) at the cost of read latency (compaction overhead and read amplification)
- Query-first modeling vs Flexible relational modeling
- Eventual consistency vs Strong consistency (Tunable)
Killer Senior Insight
Cassandra isn't just a database; it's a distributed storage engine that forces you to design your data schema around your specific query patterns (Query-Driven Modeling) rather than your data relationships.
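In CQL terms, that insight might look like the sketch below: the table is shaped by exactly one query ("latest N messages in a conversation"), not by entity relationships. Table and column names are illustrative, not from any real schema.

```python
# Query-first modeling sketch: the table exists to serve one query.
# Schema names here are hypothetical examples, not a prescribed design.
CREATE_MESSAGES = """
CREATE TABLE IF NOT EXISTS messages_by_conversation (
    conversation_id uuid,      -- partition key: distributes data across nodes
    sent_at         timeuuid,  -- clustering key: sorts rows on disk, newest first
    sender_id       uuid,
    body            text,
    PRIMARY KEY ((conversation_id), sent_at)
) WITH CLUSTERING ORDER BY (sent_at DESC);
"""

# The read this table was designed for -- one partition, no JOINs:
SELECT_LATEST = """
SELECT sender_id, body FROM messages_by_conversation
WHERE conversation_id = ? LIMIT 20;
"""
```

A relational design would normalize users, conversations, and messages into separate tables; here the one query dictates one denormalized table.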
Recognition
Common Scenarios
- Activity feeds and social media timelines
- IoT sensor data ingestion and monitoring
- E-commerce product catalogs and shopping carts
- Messaging systems and chat history storage
Anti-patterns to Avoid
- Applications requiring ACID transactions across multiple tables
- Small datasets that fit on a single large relational instance
- Use cases requiring frequent ad-hoc queries or complex JOINs
- Workloads with high update/delete frequency (leading to tombstone issues)
The Problem
The Fundamental Issue
The 'Write Wall' and Single Point of Failure inherent in traditional RDBMS master-slave architectures when scaling to petabytes of data.
What breaks without it
Master nodes become a bottleneck for write throughput
Manual sharding of SQL databases becomes an operational nightmare
Failover mechanisms in RDBMS often lead to minutes of downtime
Cross-region replication latency kills write performance in CP systems
Why alternatives fail
Relational databases (PostgreSQL/MySQL) struggle with horizontal write scaling without complex middleware
MongoDB (in default configurations) prioritizes consistency, which can lead to write unavailability during leader elections
Standard Key-Value stores lack the structured 'Wide-Column' querying capabilities needed for complex time-series data
Mental Model
The Intuition
Imagine a circular ring of lockers. Instead of one manager holding all the keys, every locker owner knows the 'Gossip' about who owns which locker. When you want to store something, you can hand it to any owner, and they'll make sure it gets to the right locker and its neighbors for safekeeping.
Key Mechanics
Consistent Hashing: Determines data placement across the cluster ring
Gossip Protocol: Peer-to-peer communication for cluster state and health
LSM-Trees (Log-Structured Merge-Trees): Converts random writes into sequential I/O via Memtables and SSTables
Bloom Filters: Probabilistic data structures used to skip SSTables that don't contain a specific key
Hinted Handoff: Temporarily stores writes for a downed node to ensure eventual consistency
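The first of these mechanics can be sketched in a few lines. This toy ring (MD5 standing in for Cassandra's Murmur3 partitioner, node names invented) shows how a partition key deterministically maps to RF replica nodes:

```python
import bisect
import hashlib

def token(key: str) -> int:
    """Hash a key onto the ring (Cassandra uses Murmur3; MD5 stands in here)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    """Toy consistent-hash ring: each node owns the range up to its token."""
    def __init__(self, nodes, replication_factor=3):
        self.rf = replication_factor
        # Sort nodes by token so we can binary-search for the first owner.
        self.ring = sorted((token(n), n) for n in nodes)

    def replicas(self, partition_key: str):
        """Walk clockwise from the key's token, collecting RF distinct nodes."""
        tokens = [t for t, _ in self.ring]
        start = bisect.bisect(tokens, token(partition_key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]

ring = Ring(["node-a", "node-b", "node-c", "node-d"])
owners = ring.replicas("sensor-42")  # same 3 distinct nodes every time
```

Because placement is a pure function of the key, any coordinator node can route a request without consulting a central manager, which is what the locker analogy above is describing.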
Framework
When it's the best choice
- When write volume exceeds roughly 100k operations per second
- When the system must survive the loss of an entire data center
- When data access patterns are well-defined and static
When to avoid
- When you need to perform 'GROUP BY' or 'JOIN' operations at the database level
- When you have a 'heavy update' workload that modifies the same records repeatedly
- When you lack the engineering resources to manage compaction and JVM tuning
Tradeoffs
Strengths
- Linear Scalability: Adding nodes increases capacity linearly
- No Single Point of Failure: Peer-to-peer architecture
- Tunable Consistency: Choose ONE, QUORUM, or ALL independently for each read or write
- High Write Throughput: LSM-tree architecture is optimized for sequential disk writes
Weaknesses
- Tombstones: Deletes don't remove data immediately, causing read performance degradation
- Compaction Debt: Background merging of SSTables can consume significant CPU/IO
- No Joins/Aggregations: Must be handled at the application layer or via denormalization
- JVM Garbage Collection: Can cause 'stop-the-world' pauses affecting latency
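The tombstone weakness can be illustrated with a toy last-write-wins merge. Dict-backed "SSTables" and an integer clock are stand-ins for illustration, not Cassandra internals:

```python
from itertools import count

_clock = count()  # monotonically increasing write timestamp (toy stand-in)

TOMBSTONE = "<<tombstone>>"  # marker standing in for a deletion record

def write(sstable, key, value):
    sstable[key] = (next(_clock), value)

def delete(sstable, key):
    # A delete reclaims nothing: it writes a tombstone cell that every
    # later read must merge and skip until compaction finally purges it.
    sstable[key] = (next(_clock), TOMBSTONE)

def read(sstables, key):
    """Merge the key's cells across SSTables; highest timestamp wins."""
    cells = [t[key] for t in sstables if key in t]
    if not cells:
        return None
    _, value = max(cells)  # last-write-wins
    return None if value == TOMBSTONE else value

old, new = {}, {}
write(old, "cart:7", "3 items")
delete(new, "cart:7")  # the tombstone shadows the older cell
assert read([old, new], "cart:7") is None
```

Note that the deleted row still occupies space in both SSTables, and every read of that key now touches both: this is why delete-heavy workloads degrade read latency.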
Alternatives
ScyllaDB
When it wins
When you need Cassandra compatibility but want higher performance and lower latency without JVM overhead.
Key Difference
Written in C++ with a shared-nothing architecture, eliminating GC pauses and improving resource utilization.
Amazon DynamoDB
When it wins
When you want a serverless, fully managed experience and don't want to manage infrastructure.
Key Difference
Proprietary AWS service with a different pricing model (RCU/WCU) and stricter item size limits.
Google Spanner
When it wins
When you need global scale but also require strict ACID transactions and relational semantics.
Key Difference
Uses TrueTime (atomic clocks) to provide external consistency across regions, which is much more expensive.
Execution
Must-hit talking points
- Explain the difference between Partition Key (data distribution) and Clustering Key (on-disk sorting)
- Discuss the 'Read Repair' and 'Anti-Entropy (Manual Repair)' mechanisms
- Mention the 'Write Path': CommitLog -> Memtable -> SSTable
- Explain how Quorum (R + W > N) ensures strong consistency
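The quorum talking point reduces to a single inequality: if the read set and write set together exceed the replica count, they must overlap in at least one replica. A minimal sketch:

```python
def is_strongly_consistent(n: int, r: int, w: int) -> bool:
    """R + W > N guarantees the read and write quorums overlap in at least
    one replica, so every read sees the latest acknowledged write."""
    return r + w > n

def quorum(n: int) -> int:
    """Cassandra's QUORUM level: a strict majority of the N replicas."""
    return n // 2 + 1

# Classic setting: RF=3 with QUORUM reads and writes (2 + 2 > 3):
assert is_strongly_consistent(n=3, r=quorum(3), w=quorum(3))
# ONE/ONE is fast but a read may miss the newest write (1 + 1 <= 3):
assert not is_strongly_consistent(n=3, r=1, w=1)
```

This is also why consistency is "tunable": you pick R and W per query, trading latency against the overlap guarantee.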
Anticipate follow-ups
- Q: How do you handle hot partitions in Cassandra?
- Q: What is the impact of a high 'Replication Factor' on write latency?
- Q: How would you implement a secondary index, and why is it often discouraged?
- Q: Explain 'LWT' (Lightweight Transactions) and the Paxos protocol in Cassandra.
Red Flags
Using Cassandra like a relational database (Normalizing data).
Why it fails: Leads to application-side joins and massive performance bottlenecks because Cassandra is designed for denormalized, single-table queries.
Creating 'Unbounded' partitions (e.g., partitioning by a single 'status' column).
Why it fails: Causes 'Hot Partitions' where a single node holds too much data, leading to skewed load and eventual node failure.
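A common fix for unbounded partitions is to add a time bucket to the partition key so no single partition grows forever. A sketch with invented names:

```python
from datetime import datetime, timezone

def partition_key(sensor_id: str, ts: datetime) -> tuple:
    """Compound partition key (sensor_id, day_bucket): each sensor's data is
    split into one partition per day instead of one ever-growing partition.
    The day granularity is an assumption; size the bucket to your write rate."""
    return (sensor_id, ts.strftime("%Y-%m-%d"))

# Readings from different days land in different partitions,
# while all of one day's readings for a sensor stay together:
a = partition_key("sensor-42", datetime(2024, 3, 1, 9, 0, tzinfo=timezone.utc))
b = partition_key("sensor-42", datetime(2024, 3, 2, 9, 0, tzinfo=timezone.utc))
assert a != b and a[0] == b[0]
```

The tradeoff: reads spanning many days must query multiple partitions, so the bucket size should match the dominant query window.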
Frequent deletes or updates to the same row.
Why it fails: Creates 'Tombstones' which the read scanner must skip, significantly increasing read latency and disk pressure.