ScyllaDB

A high-performance, C++ rewrite of Apache Cassandra utilizing a shared-nothing, shard-per-core architecture to maximize modern hardware utilization.

Cheat Sheet

Prime Use Case

When you need single-digit-millisecond P99 latencies at very large (up to petabyte) scale and want to avoid the non-deterministic 'Stop-the-World' GC pauses that JVM-based NoSQL systems are prone to.

Critical Tradeoffs

  • Extreme performance vs. high operational complexity
  • Predictable P99s vs. strict data modeling requirements
  • High hardware efficiency vs. being less widely available as a managed service than DynamoDB

Killer Senior Insight

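ScyllaDB isn't just 'fast Cassandra'. It behaves almost like a specialized operating system for data: built on the Seastar framework, it pins one thread to each CPU core, runs its own userspace task scheduler, and uses direct I/O with its own cache instead of relying on the Linux kernel's scheduler and page cache, treating each CPU core as an independent execution unit.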

Recognition

Common Interview Phrases

We need to handle 1M+ writes per second with low TCO
P99 latency must stay under 10ms regardless of background tasks
We are hitting JVM GC bottlenecks in our current Cassandra or HBase cluster

Common Scenarios

  • Real-time bidding (RTB) in AdTech
  • High-velocity IoT sensor data ingestion
  • User profile stores for massive-scale social networks
  • Fraud detection systems requiring millisecond lookups

Anti-patterns to Avoid

  • Small datasets (< 1TB) where the overhead of sharding outweighs the benefits
  • Applications requiring complex JOINs or ACID transactions across multiple tables
  • Prototyping phases where developer velocity is more important than performance

The Problem

The Fundamental Issue

The 'In-Memory Bottleneck' and CPU inefficiency caused by thread contention, context switching, and Garbage Collection in traditional NoSQL systems.

What breaks without it

Unpredictable latency spikes (P99/P999) due to Java GC cycles

Underutilized multi-core CPUs due to global lock contention

Higher TCO because more nodes are needed to reach performance targets due to software overhead
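To make the latency-spike point concrete, here is a minimal sketch of why a small fraction of paused requests dominates P99 while barely moving the median. The numbers (2 ms baseline, a 200 ms pause affecting 2% of requests) are illustrative assumptions, not measurements of any real cluster.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

# Hypothetical workload: 10,000 requests at 2 ms, of which 2% land
# inside an assumed 200 ms stop-the-world pause.
latencies_ms = [2.0] * 9_800 + [200.0] * 200

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
mean = sum(latencies_ms) / len(latencies_ms)

print(f"p50={p50} ms  mean={mean:.2f} ms  p99={p99} ms")
```

The median stays at the baseline, yet the 99th percentile equals the full pause length, which is why tail latency rather than average latency is the metric to watch.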

Why alternatives fail

Cassandra's JVM architecture introduces non-deterministic GC pauses that are very difficult to tune away completely at extreme scale

DynamoDB can become prohibitively expensive at extreme sustained throughput due to its pricing model

Relational DBs cannot scale horizontally for writes without complex, manual application-level sharding

Mental Model

The Intuition

Imagine a massive warehouse where instead of one giant door where every worker crowds and waits for a manager to clear the path, every single worker has their own private door, their own forklift, and their own section of the shelf. They never have to talk to or wait for each other.

Key Mechanics

1. Shard-per-core architecture: Each CPU core manages its own memory and NIC queues
2. Seastar engine: An asynchronous, non-blocking programming framework
3. Direct I/O: Bypassing the Linux Page Cache to avoid kernel-level locking
4. Autonomous self-optimizing background tasks: Dynamic controllers for compaction and repair
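The shard-per-core idea can be sketched as a routing function: hash the partition key to a token, then map the token to exactly one shard that owns its own private state. This is a simplified illustration, not ScyllaDB's actual algorithm; the BLAKE2 hash is a stand-in for the real partitioner, and `Node`, `token_for`, and `shard_for` are hypothetical names.

```python
import hashlib

def token_for(partition_key: str) -> int:
    """Hash a partition key to an unsigned 64-bit token (stand-in hash, an assumption)."""
    digest = hashlib.blake2b(partition_key.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def shard_for(token: int, shard_count: int) -> int:
    """Map a 64-bit token onto one of shard_count equal slices of the token range."""
    return (token * shard_count) >> 64

class Node:
    """Toy node: one private dict per shard, so no state is shared between shards."""

    def __init__(self, shard_count: int):
        self.shards = [dict() for _ in range(shard_count)]

    def write(self, key: str, value: str) -> int:
        shard = shard_for(token_for(key), len(self.shards))
        self.shards[shard][key] = value  # only the owning shard is touched
        return shard

    def read(self, key: str):
        shard = shard_for(token_for(key), len(self.shards))
        return self.shards[shard].get(key)

node = Node(shard_count=8)
owner = node.write("user:42", "profile-data")
print(f"user:42 is owned by shard {owner}")
```

Because a key always maps to the same shard, reads and writes for that key never need a lock shared with other shards, which is the property the warehouse analogy describes.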

Framework

When it's the best choice

  • High-throughput write-heavy workloads
  • Predictable low-latency requirements at scale
  • Large-scale Cassandra migrations where API compatibility is required

When to avoid

  • Strong consistency requirements (ACID) across multiple rows
  • Rapidly changing schemas that require frequent full-table scans
  • Low-budget projects where managed serverless services suffice

Fast Heuristics

  • If the bottleneck is CPU/GC: ScyllaDB
  • If the bottleneck is Developer Velocity: DynamoDB
  • If the bottleneck is Complex Queries: PostgreSQL

Tradeoffs

Strengths

  • Up to roughly 10x the throughput of Apache Cassandra on comparable hardware, per vendor benchmarks
  • Consistent low latency (no GC pauses)
  • Lower TCO due to higher data density per node
  • API compatibility with Cassandra (CQL) and with DynamoDB (via the Alternator API), which eases migration

Weaknesses

  • Steep learning curve for partition-key-centric data modeling
  • Compaction strategy tuning is critical and can be complex
  • C++ memory management makes debugging 'weird' crashes harder than in Java-based systems

Alternatives

Apache Cassandra
Alternative

When it wins

When the team has deep existing Java expertise and hasn't hit performance ceilings.

Key Difference

JVM-based, with shared thread pools that contend across cores, vs. Scylla's shard-per-core C++ model.

Amazon DynamoDB
Alternative

When it wins

When operational overhead must be close to zero and the workload is highly bursty.

Key Difference

Fully managed serverless vs. Scylla's cluster-based architecture.

Aerospike
Alternative

When it wins

When the primary use case is a pure Key-Value store with extreme focus on Flash/NVMe optimization.

Key Difference

Optimized for KV lookups rather than the wide-column/CQL flexibility of Scylla.

Execution

Must-hit talking points

  • Mention 'Shard-per-core' architecture and how it eliminates lock contention
  • Discuss the 'Seastar' framework and its shared-nothing approach
  • Explain why bypassing the Linux Page Cache (Direct I/O) is critical for NVMe performance
  • Differentiate between Compaction Strategies (STCS vs LCS) and their impact on write amplification
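For the STCS vs LCS talking point, a back-of-envelope model can help frame the discussion. The sketch below uses common textbook approximations (roughly one rewrite per size tier for STCS, roughly `fanout` rewrites per level for LCS in the worst case); these are assumptions for illustration, not ScyllaDB measurements, and real write amplification depends heavily on the workload.

```python
def levels_needed(total_gb: int, base_gb: int, growth: int) -> int:
    """How many times base_gb must grow by `growth` to reach total_gb."""
    levels, size = 0, base_gb
    while size < total_gb:
        size *= growth
        levels += 1
    return levels

def stcs_write_amp(total_gb: int, flush_gb: int, min_threshold: int = 4) -> int:
    """Rough size-tiered estimate: 1 flush write plus about one rewrite per tier."""
    return 1 + levels_needed(total_gb, flush_gb, min_threshold)

def lcs_write_amp(total_gb: int, level0_gb: int, fanout: int = 10) -> int:
    """Rough leveled worst case: 1 flush write plus about `fanout` rewrites per level."""
    return 1 + fanout * levels_needed(total_gb, level0_gb, fanout)

# Hypothetical node: 1,000 GB of data, 1 GB flushed per memtable.
stcs = stcs_write_amp(1_000, 1)
lcs = lcs_write_amp(1_000, 1)
print(f"STCS ~{stcs}x, LCS ~{lcs}x write amplification")
```

Under these assumptions LCS writes each byte several times more often than STCS. What LCS buys in return is lower read and space amplification, which is the trade-off worth naming in the interview.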

Anticipate follow-ups

  • Q: How does Scylla handle 'Hot Partitions' through its per-core scheduler?
  • Q: Explain the trade-offs of Lightweight Transactions (LWT) using Paxos or Raft
  • Q: How do you handle cross-datacenter replication and the CAP theorem implications?
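For the cross-datacenter follow-up, the usual starting point is quorum arithmetic: a read is guaranteed to see the latest acknowledged write only if the read and write replica sets must overlap. The sketch below assumes a hypothetical setup of two datacenters with a replication factor of 3 each; the function names are illustrative.

```python
def quorum(replication_factor: int) -> int:
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1

def reads_see_latest_write(replicas: int, write_acks: int, read_acks: int) -> bool:
    """True when any read set must intersect any write set (W + R > N)."""
    return write_acks + read_acks > replicas

RF_PER_DC = 3               # assumed replication factor in each datacenter
TOTAL_RF = 2 * RF_PER_DC    # two datacenters

local_quorum = quorum(RF_PER_DC)   # replicas needed within one datacenter
global_quorum = quorum(TOTAL_RF)   # replicas needed across both datacenters

# Same datacenter, quorum writes and quorum reads: sets must overlap.
same_dc = reads_see_latest_write(RF_PER_DC, local_quorum, local_quorum)

# Write acknowledged by a local quorum in one datacenter, read from a local
# quorum in the other: counted against all 6 replicas, no overlap is guaranteed.
cross_dc = reads_see_latest_write(TOTAL_RF, local_quorum, local_quorum)

print(local_quorum, global_quorum, same_dc, cross_dc)
```

This is the CAP trade-off in miniature: local quorums keep latency low and tolerate a datacenter partition, at the price of only eventual consistency between datacenters.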

Red Flags

Using 'ALLOW FILTERING' in production queries.

Why it fails: Without a partition-key restriction, it can force a scan of every partition in the cluster, defeating the purpose of partition-key-based lookups and causing massive latency spikes.
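The cost difference can be illustrated with a toy in-memory model that counts how many rows each access path examines. This is a sketch of the access pattern only, not of ScyllaDB internals; `ToyTable` and the event data are hypothetical.

```python
from collections import defaultdict

class ToyTable:
    """Rows grouped by partition key, loosely mimicking a partitioned table."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def insert(self, partition_key, row):
        self.partitions[partition_key].append(row)

    def by_partition(self, partition_key):
        """Key lookup: only the rows of one partition are examined."""
        rows = self.partitions.get(partition_key, [])
        return rows, len(rows)

    def by_filter(self, predicate):
        """Filtering without a key: every row of every partition is examined."""
        matches, examined = [], 0
        for rows in self.partitions.values():
            for row in rows:
                examined += 1
                if predicate(row):
                    matches.append(row)
        return matches, examined

table = ToyTable()
for user in range(1_000):
    for event in range(10):
        failed = user == 7 and event < 3
        table.insert(f"user-{user}", {"event": event, "status": "failed" if failed else "ok"})

rows, examined_by_key = table.by_partition("user-7")
failed_rows, examined_by_filter = table.by_filter(lambda r: r["status"] == "failed")
print(examined_by_key, examined_by_filter, len(failed_rows))
```

The filtered query returns only a handful of rows but still pays for reading the entire table, and that cost grows with total data size rather than with result size.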

Ignoring the Partition Key design and creating 'fat' partitions.

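Why it fails: Poorly chosen keys concentrate data and traffic in a few partitions. Because each partition is owned by a single shard, this leads to hot shards, where one CPU core is pegged at 100% while others are idle, creating a system-wide bottleneck.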
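A small simulation shows how key choice translates into per-shard load. It assumes a hypothetical workload where 90% of requests come from one country, and it uses a stand-in hash rather than ScyllaDB's real partitioner; all names are illustrative.

```python
import hashlib
from collections import Counter

def shard_for(partition_key: str, shard_count: int) -> int:
    """Map a key to a shard via a 64-bit hash (stand-in for the real partitioner)."""
    digest = hashlib.blake2b(partition_key.encode("utf-8"), digest_size=8).digest()
    return (int.from_bytes(digest, "big") * shard_count) >> 64

def busiest_shard_share(keys, shard_count: int) -> float:
    """Fraction of all requests that land on the most loaded shard."""
    load = Counter(shard_for(key, shard_count) for key in keys)
    return max(load.values()) / len(keys)

SHARDS = 8

# Low-cardinality key: every request from a country hits the same partition.
by_country = ["US"] * 9_000 + ["DE"] * 500 + ["JP"] * 500

# High-cardinality key: one partition per user.
by_user = [f"user-{i}" for i in range(10_000)]

skewed = busiest_shard_share(by_country, SHARDS)
spread = busiest_shard_share(by_user, SHARDS)
print(f"country key: {skewed:.0%} on busiest shard, user key: {spread:.0%}")
```

With the country key, at least 90% of requests hit a single shard regardless of how many cores the node has, while the per-user key lands close to the ideal even split of one eighth per shard.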