Apache Cassandra
Cheat Sheet
Prime Use Case
When you need linear horizontal scalability for write-heavy workloads across multiple data centers and can tolerate eventual consistency.
Critical Tradeoffs
- Optimized for writes at the expense of complex read patterns
- Eventual consistency vs. Strong consistency (Tunable)
- No support for joins or multi-row transactions
- High operational overhead regarding compaction and repair
Killer Senior Insight
Cassandra is essentially a distributed hash map where every node is equal; its 'Query-First' data modeling requirement means you must design your tables specifically to satisfy your UI/API queries, not to normalize your data.
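A minimal Python sketch of the query-first idea, not Cassandra's actual storage code: a table is modeled as a hash map from partition key to rows kept sorted by clustering key. The table name `messages_by_chat`, its columns, and both helper functions are hypothetical, chosen only for illustration.

```python
import bisect
from collections import defaultdict

# Hypothetical table "messages_by_chat": partition key = chat_id,
# clustering key = sent_at. Each partition keeps its rows sorted by sent_at.
messages_by_chat = defaultdict(list)

def insert_message(chat_id, sent_at, body):
    """Place the row inside its partition, keeping clustering order."""
    bisect.insort(messages_by_chat[chat_id], (sent_at, body))

def latest_messages(chat_id, limit):
    """The one query this table is designed for: newest messages in a chat."""
    rows = messages_by_chat.get(chat_id, [])
    newest = rows[max(len(rows) - limit, 0):]
    return [body for _, body in reversed(newest)]

insert_message("chat-1", 3, "see you then")
insert_message("chat-1", 1, "hello")
insert_message("chat-1", 2, "lunch at noon?")
insert_message("chat-2", 1, "unrelated chat")

print(latest_messages("chat-1", 2))  # ['see you then', 'lunch at noon?']
```

A different query, such as "all messages sent by one user across chats", would need a second table keyed by user. That duplication is the denormalization the insight describes.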
Recognition
Common Interview Phrases
Common Scenarios
- IoT sensor data ingestion
- User activity tracking and analytics
- Messaging and chat history
- Recommendation engine feature stores
- E-commerce shopping carts and session management
Anti-patterns to Avoid
- Applications requiring ACID transactions across multiple tables
- Systems needing ad-hoc reporting or complex SQL joins
- Small datasets that fit comfortably on a single relational instance
- Workloads with frequent deletes or overwrites of the same records (tombstones and stale versions pile up until compaction)
The Problem
The Fundamental Issue
The 'Write Wall' and Single Point of Failure (SPOF) inherent in traditional master-slave relational databases.
What breaks without it
Master nodes become a bottleneck for write operations
Failover mechanisms introduce downtime during leader election
Vertical scaling hits a hard physical limit and becomes disproportionately expensive well before reaching it
Why alternatives fail
Relational DBs (Postgres/MySQL) struggle with multi-master write synchronization
MongoDB's single-master architecture (per shard) can lead to write unavailability during elections
Standard caches (Redis) don't provide the same durability or disk-based storage capacity
Mental Model
The Intuition
Imagine a circular table where every guest is equally responsible for holding a piece of a giant encyclopedia. If one guest leaves, their neighbors already have a copy of their pages. To find a fact, you just need to know which guest's name starts with the right letter.
Key Mechanics
Consistent Hashing: Determines data placement across the ring using partition keys
Gossip Protocol: Peer-to-peer communication for node state and health discovery
LSM-Trees (Log-Structured Merge-Trees): Converts random writes into sequential disk I/O via Memtables and SSTables
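Tunable Consistency: Lets developers set read (R) and write (W) consistency levels per query; with N replicas, R + W > N gives strong consistency, while lower levels trade consistency for latency and availability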
Hinted Handoff: The coordinator temporarily stores writes destined for a downed node and replays them when it recovers, helping replicas converge (repair is still needed after long outages)
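The consistent hashing mechanic can be sketched roughly as follows. This is a simplified model under stated assumptions: one token per node (no virtual nodes), MD5 instead of Cassandra's default Murmur3 partitioner, and replicas chosen by walking the ring clockwise with no rack or data center awareness. The `Ring` class and the node names are hypothetical.

```python
import bisect
import hashlib

def token(key):
    """Map a string to a position on the ring (MD5 here, for simplicity)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, replication_factor):
        self.rf = replication_factor
        # One token per node; sorted so we can walk the ring clockwise.
        self.entries = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, partition_key):
        """First node at or after the key's token, plus the next rf-1 nodes."""
        start = bisect.bisect_left(self.tokens, token(partition_key))
        count = min(self.rf, len(self.entries))
        return [self.entries[(start + i) % len(self.entries)][1]
                for i in range(count)]

nodes = ["node-a", "node-b", "node-c", "node-d"]
ring = Ring(nodes, replication_factor=3)
owners = ring.replicas("sensor-42")
print(owners)  # three distinct nodes; which ones depends on the hash
```

Because placement depends only on the key's token and the sorted node tokens, any node can compute the owners of a key without asking a central coordinator.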
Framework
When it's the best choice
- When the write-to-read ratio is high
- When zero-downtime is a hard requirement
- When data volume is expected to grow into the Petabyte range
When to avoid
- When you need to perform 'GROUP BY' or 'JOIN' operations on the fly
- When your data access patterns are unpredictable
- When you have a low-volume, high-complexity relational schema
Fast Heuristics
Tradeoffs
Strengths
- Near-linear horizontal scalability (doubling nodes roughly doubles throughput)
- No single point of failure (Peer-to-peer architecture)
- High write performance due to append-only storage engine
- Flexible schema for wide-column attributes
Weaknesses
- Read latency can be high due to checking multiple SSTables
- Tombstones (deletion markers) can degrade read performance if not managed
- Secondary indexes exist but perform poorly at scale
- Requires deep knowledge of data modeling to avoid 'hot partitions'
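The write-speed strength and the read-latency and tombstone weaknesses come from the same storage design. A toy in-memory sketch, assuming no commit log, no bloom filters, and a single replica (the `ToyLSM` class is hypothetical and far simpler than Cassandra's engine):

```python
TOMBSTONE = object()  # sentinel marking a deleted key

class ToyLSM:
    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.sstables = []          # immutable dicts, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key):
        # A delete is itself a write: it records a tombstone.
        self.put(key, TOMBSTONE)

    def flush(self):
        if self.memtable:
            self.sstables.append(dict(self.memtable))
            self.memtable = {}

    def get(self, key):
        """Return (value, tables_checked), searching newest data first."""
        checked = 0
        for table in [self.memtable] + self.sstables[::-1]:
            checked += 1
            if key in table:
                value = table[key]
                return (None if value is TOMBSTONE else value), checked
        return None, checked

    def compact(self):
        """Merge all SSTables into one, keeping the newest version per key."""
        merged = {}
        for table in self.sstables:      # oldest first, so newer overwrite
            merged.update(table)
        # Dropping tombstones immediately is only safe in this single-replica
        # toy; Cassandra keeps them for a grace period first.
        self.sstables = [{k: v for k, v in merged.items()
                          if v is not TOMBSTONE}]

db = ToyLSM()
db.put("a", 1)
db.put("b", 2)      # memtable full: flushed as SSTable 1
db.put("c", 3)
db.delete("b")      # flushed as SSTable 2, containing b's tombstone
db.put("d", 4)
db.put("e", 5)      # flushed as SSTable 3
print(db.get("a"))  # (1, 4): memtable plus three SSTables were checked
db.compact()
print(db.get("a"))  # (1, 2): memtable plus one merged SSTable
```

Writes only ever touch the memtable or append a new table, while a read for old data may have to look through every table until compaction merges them.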
Alternatives
When it wins
When you need maximum performance with lower hardware footprint (C++ rewrite of Cassandra)
Key Difference
Shared-nothing architecture that avoids JVM garbage collection pauses
When it wins
When you want a fully managed serverless experience on AWS
Key Difference
Proprietary AWS service with auto-scaling but less control over internal configuration
When it wins
When you need horizontal scale but require full SQL and ACID compliance
Key Difference
Uses Raft consensus for strong consistency, which adds latency to writes compared to Cassandra's AP focus
Execution
Must-hit talking points
- Explain the Partition Key vs. Clustering Key distinction clearly
- Mention 'LSM-Trees' and why they make writes fast (sequential vs random I/O)
- Discuss 'Quorum' (RF=3, R=2, W=2) to demonstrate understanding of consistency trade-offs
- Highlight the 'Compaction' process and its impact on disk space and I/O
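The quorum talking point rests on simple arithmetic: if R + W > RF, every read set must share at least one replica with every write set. A small brute-force check of that claim, with hypothetical helper names:

```python
from itertools import combinations

def quorum(rf):
    """Majority of replicas: floor(rf / 2) + 1."""
    return rf // 2 + 1

def always_overlaps(rf, r, w):
    """Does every possible read set share a replica with every write set?"""
    replicas = range(rf)
    return all(set(reads) & set(writes)
               for reads in combinations(replicas, r)
               for writes in combinations(replicas, w))

for r, w in [(2, 2), (1, 3), (1, 2), (1, 1)]:
    print(f"RF=3 R={r} W={w}: R+W>RF is {r + w > 3}, "
          f"overlap guaranteed: {always_overlaps(3, r, w)}")
```

With RF=3, only the combinations where R + W is at least 4 guarantee that a read contacts a replica holding the latest acknowledged write.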
Anticipate follow-ups
- Q: How do you handle hot partitions? (Salting or better key selection)
- Q: What happens during a network partition? (CAP theorem: it favors AP by default, with consistency tunable per query)
- Q: How do you handle deletes in a distributed system? (Tombstones and Grace Period)
- Q: How does Cassandra handle multi-DC replication?
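For the deletes follow-up, it helps to show why tombstones must outlive the grace period (`gc_grace_seconds` in Cassandra). The sketch below is a simplified two-replica model with last-write-wins timestamps; the `Cell` type and the `repair`, `read`, and `scenario` helpers are hypothetical and do not mirror Cassandra's real repair process.

```python
from collections import namedtuple

# One stored version of a row: payload, write timestamp, and a deletion flag.
Cell = namedtuple("Cell", ["value", "timestamp", "is_tombstone"])

def newest(a, b):
    """Last-write-wins: keep whichever version has the higher timestamp."""
    if a is None:
        return b
    if b is None:
        return a
    return a if a.timestamp >= b.timestamp else b

def repair(replica_x, replica_y, key):
    """Simplified repair: both replicas adopt the newest version of the key."""
    winner = newest(replica_x.get(key), replica_y.get(key))
    if winner is not None:
        replica_x[key] = winner
        replica_y[key] = winner

def read(replica, key):
    cell = replica.get(key)
    return None if cell is None or cell.is_tombstone else cell.value

def scenario(purge_before_repair):
    live = Cell("alice", 100, False)
    replica_a = {"user:1": live}
    replica_b = {"user:1": live}   # replica_b is down and misses the delete
    replica_a["user:1"] = Cell(None, 200, True)   # tombstone written at t=200
    if purge_before_repair:
        del replica_a["user:1"]    # tombstone compacted away too early
    repair(replica_a, replica_b, "user:1")
    return read(replica_a, "user:1")

print(scenario(purge_before_repair=False))  # None: the delete wins
print(scenario(purge_before_repair=True))   # alice: the deleted row is back
```

If the tombstone is purged before the lagging replica has been repaired, nothing remains to outvote the old value, so the deleted row is copied back.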
Red Flags
Using Cassandra like a relational database (Normalizing data)
Why it fails: Leads to client-side joins, which are slow and negate the benefits of the distributed system.
Selecting a low-cardinality partition key
Why it fails: Creates 'Hot Partitions' where a few nodes handle most of the traffic while others sit idle, leading to system-wide bottlenecks.
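A rough sketch of the skew and of salting as one mitigation. Assumptions: a six-node cluster, a modulo hash standing in for the real partitioner, and a hypothetical bucket count of 16; the key `status=active` and the helper names are made up for illustration.

```python
import hashlib
from collections import Counter

NODES = 6
SALT_BUCKETS = 16  # hypothetical bucket count; tune to the workload

def node_for(partition_key):
    """Stand-in for the partitioner: hash the key onto one of NODES nodes."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NODES

def salted_key(base_key, event_id):
    """Append a deterministic bucket so one logical key spans many partitions."""
    return f"{base_key}#{event_id % SALT_BUCKETS}"

events = range(10_000)  # every event shares the logical key "status=active"

unsalted = Counter(node_for("status=active") for _ in events)
salted = Counter(node_for(salted_key("status=active", e)) for e in events)

print("unsalted:", dict(unsalted))  # all 10,000 writes land on one node
print("salted:  ", dict(salted))    # writes spread over several nodes
```

The tradeoff is on the read side: fetching everything for the logical key now means querying all 16 buckets and merging the results in the client.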
Relying heavily on Secondary Indexes
Why it fails: Secondary indexes in Cassandra are local to each node; a query that does not also restrict the partition key fans out across the cluster (scatter-gather), which kills performance.
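The fan-out can be illustrated with a small simulation. Assumptions: an eight-node cluster with a single replica per row, a modulo hash standing in for the partitioner, rows written once, and a hypothetical indexed column `city`; the `Node` and `Cluster` classes are illustrative only.

```python
import hashlib

class Node:
    def __init__(self):
        self.rows = {}         # partition key -> row
        self.local_index = {}  # city -> partition keys stored on THIS node

    def write(self, key, row):
        self.rows[key] = row
        self.local_index.setdefault(row["city"], set()).add(key)

class Cluster:
    def __init__(self, size):
        self.nodes = [Node() for _ in range(size)]

    def _owner(self, key):
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def write(self, key, row):
        self._owner(key).write(key, row)

    def get_by_key(self, key):
        """Partition-key lookup: the coordinator contacts one owner node."""
        return self._owner(key).rows.get(key), 1

    def get_by_city(self, city):
        """Index lookup: no node knows where matches live, so ask them all."""
        matches, contacted = [], 0
        for node in self.nodes:
            contacted += 1
            matches.extend(node.rows[k]
                           for k in node.local_index.get(city, ()))
        return matches, contacted

cities = ["Oslo", "Lima", "Pune"]
cluster = Cluster(8)
for i in range(100):
    cluster.write(f"user-{i}", {"user_id": i, "city": cities[i % 3]})

row, contacted = cluster.get_by_key("user-7")
print(row["city"], contacted)        # Lima 1
matches, contacted = cluster.get_by_city("Oslo")
print(len(matches), contacted)       # 34 8
```

The key lookup costs one node regardless of cluster size, while the index lookup cost grows with the number of nodes.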