The Question
Design: Collaborative Real-time Document Editor
Design a collaborative document editing system similar to Google Docs. The system must support real-time concurrent editing for multiple users, preserve document history, and handle conflict resolution seamlessly. Ensure the architecture can scale to millions of users while maintaining low latency for a smooth user experience. Discuss the trade-offs between different consistency models and how you would handle high-concurrency hotspots on a single document.
WebSockets
Operational Transformation
Redis
Kafka
Cassandra
PostgreSQL
S3
WebAssembly
Questions & Insights
Clarifying Questions
What is the scale of the system? (Assumption: 100M Daily Active Users (DAU), with up to 100 concurrent editors on a single high-traffic document).
Is offline editing required for the MVP? (Assumption: No, the MVP focuses on real-time collaborative editing while online).
What types of content need to be supported? (Assumption: Plain text and rich text formatting for the MVP; images and embedded objects are out of scope).
How long should the document history be preserved? (Assumption: Full revision history is required for auditing and "undo" to any point in time).
Thinking Process
The Core Conflict Resolution Problem: How do we handle two users typing in the same spot at the same time? We use Operational Transformation (OT) as the central synchronization logic to ensure all clients converge to the same state.
Real-Time Communication: How do we push updates instantly? We use persistent WebSockets to minimize overhead and latency compared to long-polling.
State Management: How do we handle the "Single Source of Truth"? The server acts as the sequencer, assigning a version number to every operation to maintain a linear history.
Architecture Flow: Client -> WebSocket Gateway -> Session Service (OT Logic) -> Document Store.
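As a minimal sketch of the first and third points, the following shows a server-side sequencer transforming two concurrent inserts at the same index. It is an illustration under stated assumptions, not a production OT engine: only inserts are modeled (deletes and formatting are omitted), and the names Insert, transform, DocSession, and the site tie-breaker are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Insert:
    pos: int    # index in the document where the text goes
    text: str
    site: int   # hypothetical client id, used only to break ties


def transform(op: Insert, against: Insert) -> Insert:
    """Rewrite `op` so it can be applied after `against` has been applied."""
    goes_first = against.pos < op.pos or (
        against.pos == op.pos and against.site < op.site
    )
    if goes_first:
        return Insert(op.pos + len(against.text), op.text, op.site)
    return op


@dataclass
class DocSession:
    """Sequencer for one document: log[i] moves the doc from version i to i + 1."""
    text: str = ""
    log: list = field(default_factory=list)

    @property
    def version(self) -> int:
        return len(self.log)

    def submit(self, op: Insert, base_version: int):
        if not 0 <= base_version <= self.version:
            raise ValueError("unknown base version")
        # Transform against every operation the sender had not seen yet.
        for applied in self.log[base_version:]:
            op = transform(op, applied)
        self.text = self.text[:op.pos] + op.text + self.text[op.pos:]
        self.log.append(op)
        return self.version, op


session = DocSession(text="helloworld")
# Both users saw version 0 and typed at index 5 at the same time.
v1, op1 = session.submit(Insert(5, ", ", site=1), base_version=0)
v2, op2 = session.submit(Insert(5, "!", site=2), base_version=0)
print(v2, session.text)  # 2 hello, !world
```

The second insert is shifted from index 5 to index 7 because the first one was sequenced ahead of it, which is what keeps the history linear.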
Bonus Points
WASM for Editor Logic: Running the OT engine in WebAssembly on the client for high-performance text manipulation. If the engine is written in C++/Rust, the same code can run in the browser and on the backend, keeping the logic consistent.
Differential Synchronization: Implementing a backup sync mechanism (e.g., Neil Fraser's algorithm) to recover state if a client's WebSocket connection drops and misses a burst of operations.
Read-Optimized Snapshots: Storing periodic snapshots (every 100 operations) in an Object Store to avoid replaying millions of operations from "Version 0" when a new user joins a large doc.
Conflict-Free Replicated Data Types (CRDTs): Discussion on why OT was chosen over CRDT (OT is more memory-efficient for text; CRDT is better for decentralized/P2P systems but has higher metadata overhead).
Design Breakdown
Functional Requirements
Core Use Cases:
Create, view, and edit documents.
Real-time collaborative editing (multiple users see each other's changes instantly).
Presence tracking (see who else is currently viewing/editing).
Version history and "Undo" functionality.
Scope Control:
In-Scope: Text editing, concurrency control, basic permissions.
Out-of-Scope: Comments, suggestions mode, image/video embedding, export to PDF/Word.
Non-Functional Requirements
Scale: Support 100M DAU and 1B total documents.
Latency: Sub-100ms local echo (instant UI) and <500ms propagation to other collaborators.
Availability & Reliability: 99.99% uptime; document data must never be lost (Durability).
Consistency: Eventual consistency across all clients; strong consistency for the server-side sequence of operations.
Security: Granular access control (ACLs) for document sharing.
Estimation
Traffic:
100M DAU. If 10% are active at any given second, that is 10M concurrent users.
Average user generates 1 operation (keystroke/format) every 2 seconds: 10M * 0.5 ops/sec = 5M Write QPS.
Storage:
1B docs * 100KB avg size = 100TB for current doc state.
Ops log: 1B docs * 5000 edits/doc * 100 bytes/op = 500TB.
Bandwidth:
Inbound: 5M ops/sec * 100 bytes = 500MB/s.
Outbound: (Fan-out effect) If avg doc has 2 editors and each op goes out to both (ack to the sender, broadcast to the peer), outbound = 2 * 500MB/s = 1GB/s.
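The estimates above can be reproduced with a few lines of arithmetic. This sketch uses decimal units (1KB = 1000 bytes) and treats the 2-editors-per-doc figure as a fan-out factor of 2; the variable names are illustrative only.

```python
DAU = 100_000_000
DOCS = 1_000_000_000
OP_BYTES = 100
TB = 10**12
MB = 10**6

concurrent_users = DAU // 10           # 10% active at any second
write_qps = concurrent_users // 2      # 1 op every 2 seconds per user

doc_state_bytes = DOCS * 100_000       # 100KB average document
ops_log_bytes = DOCS * 5_000 * OP_BYTES

inbound_bytes_per_sec = write_qps * OP_BYTES
outbound_bytes_per_sec = inbound_bytes_per_sec * 2   # fan-out factor of 2

print(concurrent_users, write_qps)                   # 10000000 5000000
print(doc_state_bytes // TB, ops_log_bytes // TB)    # 100 500
print(inbound_bytes_per_sec // MB, outbound_bytes_per_sec // MB)  # 500 1000
```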
Blueprint
Concise Summary: A WebSocket-based architecture using a central Session Service to perform Operational Transformation (OT) and sequence document edits into a persistent log.
Major Components:
Load Balancer: Distributes WebSocket connections to the Gateway.
WebSocket Gateway: Maintains persistent stateful connections with clients.
Session Service: The "Brain" that resolves conflicts using OT and manages doc-specific locks/queues.
Document Service: Handles CRUD operations, metadata, and permissions via REST.
Relational DB: Stores document metadata and permissions.
NoSQL Store: Stores the linear log of operations (edits) and snapshots.
Redis: Stores transient presence data (who is in which doc).
Simplicity Audit: This design avoids the complexity of decentralized CRDTs by using a central sequencer (Session Service), which is easier to debug and more storage-efficient.
Architecture Decision Rationale:
Why this architecture?: A centralized OT approach is the industry standard for rich-text collaboration because it ensures a single, authoritative timeline of changes.
Functional Satisfaction: WebSockets provide the "real-time" feel, while the Ops Log provides history.
Non-functional Satisfaction: Scalability is achieved by sharding the Session Service by document_id.
High Level Architecture
Sub-system Deep Dive
Edge (Optional)
Content Delivery & Traffic Routing: CDN used for the static JS/CSS editor assets. Global DNS routes users to the nearest regional data center to minimize WebSocket RTT.
Security & Perimeter: TLS termination at the Load Balancer. API Gateway handles JWT validation for REST requests (creating/deleting docs).
Service
Topology & Scaling:
Session Service: Stateful. Users editing the same doc_id must be routed to the same service instance (sticky sessions or consistent hashing on doc_id).
Document Service: Stateless. Scales horizontally based on QPS.
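One way to route every editor of a doc_id to the same Session Service instance is a consistent-hash ring. The sketch below is an assumption-laden illustration: the HashRing class, the instance names, and the choice of 64 virtual nodes per instance are all hypothetical.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # First 8 bytes of SHA-256 as an integer position on the ring.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes: int = 64):
        # Each instance gets `vnodes` positions so load spreads evenly.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    def node_for(self, doc_id: str) -> str:
        # First ring position at or after the key's hash, wrapping around.
        idx = bisect.bisect(self._keys, _hash(doc_id)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["session-a", "session-b", "session-c"])
after_failure = HashRing(["session-a", "session-b"])  # session-c removed

docs = [f"doc-{i}" for i in range(200)]
moved = [d for d in docs if ring.node_for(d) != after_failure.node_for(d)]
```

Only documents that lived on the removed instance are reassigned; every other doc_id keeps its instance, which limits reconnect storms when the fleet changes size.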
API Schema Design:
WebSocket Message: {"type": "OP", "doc_id": "123", "v": 45, "op": {"retain": 5, "insert": "hello"}}
GET /doc/{id}: Returns current doc snapshot and version.
POST /doc: Create new document.
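To illustrate the WebSocket message shape, here is a small sketch that parses the example payload and applies it to a string. It assumes "retain" means "skip this many characters from the start" and "insert" means "add this text at that point"; the apply_op helper is hypothetical, and a real rich-text op format would carry more fields.

```python
import json


def apply_op(doc: str, op: dict) -> str:
    """Apply a single retain-then-insert op to a plain-text document."""
    retain = op.get("retain", 0)
    if retain < 0 or retain > len(doc):
        raise ValueError("retain is outside the document")
    return doc[:retain] + op.get("insert", "") + doc[retain:]


raw = '{"type": "OP", "doc_id": "123", "v": 45, "op": {"retain": 5, "insert": "hello"}}'
msg = json.loads(raw)

if msg["type"] == "OP":
    result = apply_op("12345 world", msg["op"])
    print(msg["doc_id"], msg["v"], result)  # 123 45 12345hello world
```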
Resilience & Reliability:
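Sequence Numbering: Every edit carries the version it was based on. If a client sends an op based on version 45 but the server is at 47, the server transforms the op against the two operations the client has not seen before applying it, and the client performs a "rebase" of its pending edits once it receives those operations.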
Observability: Track "Conflict Rate" (how often OT transformations are needed) and "Sync Latency".
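The client-side rebase mentioned under Sequence Numbering can be sketched as follows. This is a simplified illustration that assumes insert-only operations and a single unacknowledged local edit; the names Insert, transform, and catch_up are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    site: int   # hypothetical client id, used only to break ties


def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, against: Insert) -> Insert:
    """Rewrite `op` so it can be applied after `against` has been applied."""
    if against.pos < op.pos or (against.pos == op.pos and against.site < op.site):
        return Insert(op.pos + len(against.text), op.text, op.site)
    return op


def catch_up(local_doc: str, pending: Insert, missed: list):
    """Fold server ops the client missed into its local state.

    Returns the updated local text and the pending op rebased onto the
    server's latest version, ready to be re-sent.
    """
    for remote in missed:
        local_doc = apply(local_doc, transform(remote, pending))
        pending = transform(pending, remote)
    return local_doc, pending


# Client typed "X" at index 1 of "abc" but the op was never acknowledged.
pending = Insert(1, "X", site=1)
local_doc = apply("abc", pending)                         # "aXbc"
# Meanwhile the server sequenced two ops from another user.
missed = [Insert(0, "Y", site=2), Insert(4, "Z", site=2)]
server_doc = apply(apply("abc", missed[0]), missed[1])    # "YabcZ"

local_doc, rebased = catch_up(local_doc, pending, missed)
server_doc = apply(server_doc, rebased)
print(local_doc, server_doc)  # YaXbcZ YaXbcZ
```

Both sides end on the same text even though they applied the edits in different orders, which is the convergence property the version check is protecting.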
Storage
Access Pattern:
Ops Log: High write volume (append-only).
Snapshot Store: Read when a user first opens a doc.
Database Table Design:
Metadata DB (PostgreSQL): doc_id (PK), owner_id, title, created_at, permissions (JSONB).
Ops Log (Cassandra/DynamoDB): Partition Key: doc_id, Sort Key (clustering column in Cassandra): version_number. Stores the transformation data.
Technical Selection:
Cassandra: Ideal for the Ops Log because of its high write throughput and efficient sequential reads for a specific doc_id.
Cache
Purpose & Justification: Presence tracking (mouse cursors and active users). Low latency is more important than durability for cursor positions.
Key-Value Schema: presence:{doc_id} -> Set of {user_id, last_active, cursor_pos}.
Technical Selection: Redis using Pub/Sub or Sorted Sets.
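The presence schema can be sketched with an in-memory stand-in for the Redis structure. This is not Redis client code: the PresenceStore class and the 30-second timeout are assumptions, and the dictionary mimics a per-document set scored by last_active so that stale users can be trimmed.

```python
import time


class PresenceStore:
    """In-memory stand-in for presence:{doc_id} entries."""

    def __init__(self, ttl_seconds: float = 30.0):
        self._ttl = ttl_seconds
        self._docs = {}   # doc_id -> {user_id: (last_active, cursor_pos)}

    def heartbeat(self, doc_id, user_id, cursor_pos, now=None):
        now = time.time() if now is None else now
        self._docs.setdefault(doc_id, {})[user_id] = (now, cursor_pos)

    def active_users(self, doc_id, now=None):
        """Drop users whose last heartbeat is older than the TTL, return the rest."""
        now = time.time() if now is None else now
        members = self._docs.get(doc_id, {})
        cutoff = now - self._ttl
        for user_id in [u for u, (seen, _) in members.items() if seen < cutoff]:
            del members[user_id]
        return {u: cursor for u, (_, cursor) in members.items()}


store = PresenceStore(ttl_seconds=30.0)
store.heartbeat("123", "alice", cursor_pos=3, now=0.0)
store.heartbeat("123", "bob", cursor_pos=7, now=20.0)

both = store.active_users("123", now=25.0)      # alice and bob
only_bob = store.active_users("123", now=40.0)  # alice timed out
```

Losing this data on a cache restart is acceptable because clients repopulate it with their next heartbeat.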
Messaging
Purpose & Decoupling: Decouples the real-time edit path from the heavy persistence path.
Event / Topic Schema: doc-edits topic. Partitioned by doc_id to ensure edits for the same document are processed in order by the Persistence Worker.
Technical Selection: Kafka. High throughput and provides a buffer if the database slows down.
Data Processing
Processing Model: A Persistence Worker consumes from Kafka.
Logic: It aggregates small edits. Every X edits, it computes a new full-text snapshot and writes it to the Snapshot Store (S3 or Blob Storage) to speed up future document loads.
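The snapshot logic can be sketched like this. It is a single-document, in-memory illustration: the PersistenceWorker class here stands in for the Kafka consumer, plain dictionaries and lists stand in for the Snapshot Store and Ops Log, and ops are assumed to be (position, text) inserts.

```python
def apply_insert(text: str, op: tuple) -> str:
    pos, inserted = op
    return text[:pos] + inserted + text[pos:]


class PersistenceWorker:
    def __init__(self, snapshot_every: int = 100):
        self.snapshot_every = snapshot_every
        self.ops_log = []            # ops_log[i] moves version i to i + 1
        self.snapshots = {0: ""}     # version -> full text at that version
        self._text = ""

    def consume(self, op: tuple) -> None:
        """Append one op to the log and snapshot every `snapshot_every` ops."""
        self.ops_log.append(op)
        self._text = apply_insert(self._text, op)
        if len(self.ops_log) % self.snapshot_every == 0:
            self.snapshots[len(self.ops_log)] = self._text

    def load(self):
        """Open the doc: start from the newest snapshot, replay only the tail."""
        base_version = max(self.snapshots)
        text = self.snapshots[base_version]
        tail = self.ops_log[base_version:]
        for op in tail:
            text = apply_insert(text, op)
        return text, len(tail)


worker = PersistenceWorker(snapshot_every=3)
for i, ch in enumerate("abcdefg"):
    worker.consume((i, ch))

text, replayed = worker.load()
print(text, replayed)  # abcdefg 1
```

With seven ops and a snapshot every three, a new reader replays one op instead of seven; the same ratio is what keeps load time bounded on documents with millions of edits.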
Wrap Up
Advanced Topics
Trade-offs: We chose OT over CRDT. OT is simpler for a central-server model but requires the server to be the "Sequencer." If the Session Service node for a specific doc fails, all users on that doc must reconnect, and a new Session Service instance must take over as the sequencer for that doc.
Reliability: If the WebSocket connection dies, the client caches its local changes. Upon reconnect, it sends its last known version. The server then sends all missed ops, and the client performs a local "catch-up" OT.
Bottleneck Analysis: The main hotspot is a single document being edited by thousands of people simultaneously (e.g., a viral live-streamed doc). Optimization: The system can switch to "Read-only" or "Throttled" mode for viewers, allowing only a subset of "VIP" editors to send operations.
Security: Every operation sent via WebSocket must be validated against the Metadata DB or a cached version of the ACL to ensure the user_id has write access.
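The per-operation write check with a cached ACL might look like the sketch below. Everything here is an assumption for illustration: the AclCache class, the "owner"/"editor"/"viewer" role names, the 60-second TTL, and the load_acl callable that stands in for a Metadata DB lookup.

```python
import time


class AclCache:
    WRITE_ROLES = {"owner", "editor"}   # hypothetical role names

    def __init__(self, load_acl, ttl_seconds: float = 60.0):
        self._load_acl = load_acl       # callable: doc_id -> {user_id: role}
        self._ttl = ttl_seconds
        self._entries = {}              # doc_id -> (loaded_at, acl)

    def can_write(self, doc_id, user_id, now=None) -> bool:
        now = time.monotonic() if now is None else now
        entry = self._entries.get(doc_id)
        if entry is None or now - entry[0] > self._ttl:
            # Cache miss or expired entry: go back to the metadata store.
            entry = (now, self._load_acl(doc_id))
            self._entries[doc_id] = entry
        return entry[1].get(user_id) in self.WRITE_ROLES


db_calls = []


def load_acl(doc_id):
    # Stand-in for a query against the Metadata DB permissions column.
    db_calls.append(doc_id)
    return {"u1": "editor", "u2": "viewer"}


acl = AclCache(load_acl, ttl_seconds=60.0)
first = acl.can_write("123", "u1", now=0.0)     # loads from the "DB"
second = acl.can_write("123", "u2", now=10.0)   # served from cache
third = acl.can_write("123", "u1", now=100.0)   # entry expired, reloads
```

The TTL bounds how long a revoked permission can keep working, so it is a trade-off between database load and how quickly access changes take effect.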