The Question
Design

Collaborative Document Editing Platform

Design a real-time collaborative document editing platform similar to Google Docs. The system should support concurrent multi-user editing with conflict resolution via operational transformation or CRDTs, maintain a persistent version history, and ensure low-latency synchronization across all collaborators.
Technologies: PostgreSQL, S3, Redis, WebSocket, Consistent Hashing
Questions & Insights

Thinking Process

The core challenge of Google Docs is concurrency control—ensuring multiple users can edit the same document simultaneously without losing data or seeing inconsistent states.
How do we handle conflicting edits? We use Operational Transformation (OT) for the MVP. It allows the server to act as a "Single Source of Truth" that re-sequences incoming operations and broadcasts transformed edits to all clients.
How do we achieve sub-100ms latency? We utilize WebSockets for full-duplex communication and Optimistic UI updates on the client side to provide an instant feedback loop.
How do we store document state efficiently? We store the current snapshot in a relational database, together with a log of operations (ops), to support version history and conflict resolution.
How do we scale the real-time layer? We use a Redis Pub/Sub backplane to allow multiple WebSocket server instances to communicate when users are connected to different physical nodes but editing the same document.
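The server-side sequencing described above can be sketched as follows. This is a minimal illustration, not a production OT engine: it assumes single-character insert and delete operations shaped like the Op message in the API spec, and it assumes `version` on an incoming op means the server version the client had seen when it made the edit. The names `Op`, `apply_op`, `transform`, and `DocumentServer` are hypothetical, and resolving same-position ties in favor of the already-sequenced op is one possible policy among several.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Op:
    kind: str         # "insert" or "delete"
    pos: int          # zero-based character index
    char: str = ""    # inserted character; unused for deletes
    version: int = 0  # incoming: version the client last saw; logged: version produced


def apply_op(text: str, op: Op) -> str:
    if op.kind == "insert":
        return text[:op.pos] + op.char + text[op.pos:]
    return text[:op.pos] + text[op.pos + 1:]


def transform(op: Op, applied: Op) -> Optional[Op]:
    """Rewrite `op` so it keeps its intent after `applied` has already run.

    Returns None when both ops deleted the same character.
    """
    if applied.kind == "insert":
        # Assumed tie policy: the already-sequenced insert stays on the left.
        if applied.pos <= op.pos:
            return replace(op, pos=op.pos + 1)
        return op
    if applied.pos < op.pos:
        return replace(op, pos=op.pos - 1)
    if applied.pos == op.pos and op.kind == "delete":
        return None
    return op


class DocumentServer:
    """Single sequencer for one document: one linear history of ops."""

    def __init__(self, text: str = "") -> None:
        self.text = text
        self.log: List[Op] = []  # log[i] takes the document from version i to i + 1

    def receive(self, op: Op) -> Optional[Op]:
        current: Optional[Op] = op
        # Rebase the op over everything the client had not seen yet.
        for earlier in self.log[op.version:]:
            current = transform(current, earlier)
            if current is None:
                return None  # nothing left to apply or broadcast
        sequenced = replace(current, version=len(self.log) + 1)
        self.text = apply_op(self.text, sequenced)
        self.log.append(sequenced)
        return sequenced  # this is what would be broadcast to collaborators


server = DocumentServer("abc")
server.receive(Op("insert", 1, "X", version=0))  # client A, based on version 0
server.receive(Op("delete", 2, version=0))       # client B meant to delete "c"
print(server.text)  # aXb
```

Client B's delete was written against "abc", so the server shifts it one position to the right before applying it, and "c" is removed rather than "b".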

Bonus Points

OT vs. CRDT Trade-off: While OT requires a central server to sequence operations, Conflict-free Replicated Data Types (CRDTs) allow for decentralized, peer-to-peer merging but often come with higher memory overhead and implementation complexity. For an MVP, OT is more predictable.
Differential Synchronization: Implementing a "three-way merge" or "shadowing" technique in which the client keeps a shadow copy of the last server-acknowledged version alongside its local buffer, so that unsent edits can be diffed against the shadow and retransmitted rather than lost during high-latency periods.
Block-based Storage: Instead of storing the entire document as one blob, splitting documents into "blocks" (e.g., paragraphs) to allow for more granular locking and faster partial loading.
Vector Clocks/Lamport Timestamps: Using logical clocks to maintain partial ordering of events in a distributed environment to assist in identifying causality during conflicts.
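To make the logical-clock idea concrete, here is a small sketch of both clock types. The class and function names are illustrative only. A Lamport clock gives every event a scalar timestamp that respects causality, while a vector clock (modeled here as a plain dict from replica id to counter, an assumption of this sketch) can additionally tell whether two edits were concurrent.

```python
from typing import Dict


class LamportClock:
    """Scalar logical clock: if event a happened before b, then time(a) < time(b)."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Advance for a local event or just before sending a message."""
        self.time += 1
        return self.time

    def receive(self, remote_time: int) -> int:
        """Merge the timestamp carried by an incoming message."""
        self.time = max(self.time, remote_time) + 1
        return self.time


VectorClock = Dict[str, int]  # replica id -> number of events seen from that replica


def happened_before(a: VectorClock, b: VectorClock) -> bool:
    """True when a is causally earlier than b."""
    keys = set(a) | set(b)
    no_greater = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    some_less = any(a.get(k, 0) < b.get(k, 0) for k in keys)
    return no_greater and some_less


def concurrent(a: VectorClock, b: VectorClock) -> bool:
    """True when neither clock precedes the other and they are not identical."""
    keys = set(a) | set(b)
    same = all(a.get(k, 0) == b.get(k, 0) for k in keys)
    return not same and not happened_before(a, b) and not happened_before(b, a)
```

Two edits whose vector clocks are concurrent are exactly the ones that need conflict resolution; edits ordered by `happened_before` can simply be applied in that order.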
Design Breakdown

Functional Requirements

Create, view, and edit documents.
Real-time collaborative editing (multiple users see each other's changes instantly).
Presence indication (who else is currently viewing the document).
Persistence (autosave and document history).

Non-Functional Requirements

Low Latency: A collaborator should see a remote edit in under 100 ms.
High Availability: The system must be available 99.9% of the time.
Consistency: Eventual consistency is acceptable for the view, but the server must ensure a single linear history (Strong Consistency for the sequence of operations).
Scalability: Support up to 10k concurrent users per document (extreme case) and millions of documents.

Estimation

DAU: 1 million active users.
Concurrency: 10% of DAU are active at peak = 100k concurrent connections.
Write Volume: 1 edit every 2 seconds per active user = 50k writes/sec.
Storage: 100 million docs * 100KB average = 10 TB.
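Bandwidth: 50k edits/sec * 100 bytes/edit = 5 MB/s inbound (easily handled by modern networks). Outbound broadcast traffic is a multiple of this, since each edit is relayed to every other collaborator on the document.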
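The arithmetic above can be checked with a few lines of Python. The constants are the figures from the estimate; the `egress_mb_per_sec` helper and its collaborators-per-document parameter are assumptions added only to show how broadcast fan-out scales the inbound number.

```python
DAU = 1_000_000
PEAK_CONCURRENCY_PERCENT = 10
SECONDS_BETWEEN_EDITS = 2
DOCUMENTS = 100_000_000
AVG_DOC_BYTES = 100_000  # 100 KB, decimal units
BYTES_PER_EDIT = 100

concurrent_connections = DAU * PEAK_CONCURRENCY_PERCENT // 100
writes_per_sec = concurrent_connections // SECONDS_BETWEEN_EDITS
storage_tb = DOCUMENTS * AVG_DOC_BYTES / 10**12
ingress_mb_per_sec = writes_per_sec * BYTES_PER_EDIT / 10**6


def egress_mb_per_sec(collaborators_per_doc: int) -> float:
    """Broadcast traffic if every edit is relayed to the other collaborators."""
    return ingress_mb_per_sec * (collaborators_per_doc - 1)


print(concurrent_connections, writes_per_sec, storage_tb, ingress_mb_per_sec)
# 100000 50000 10.0 5.0
```

With a hypothetical average of 5 collaborators per document, `egress_mb_per_sec(5)` gives 20 MB/s, still small, but the 10k-user extreme case changes the picture for that one document.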

Blueprint

Concise Summary: A WebSocket-based architecture using Operational Transformation (OT) where a centralized server sequences all operations and persists them to a relational database.
Major Components:
WebSocket Gateway: Maintains persistent connections for real-time bi-directional communication.
Document Service (OT Engine): The core logic that transforms incoming operations against the current document version.
Redis Pub/Sub: Routes operations between different server nodes to ensure all users on the same doc receive updates.
RDBMS: Stores the canonical document snapshots and the sequential log of operations.
Simplicity Audit: This design avoids complex distributed consensus (like Paxos/Raft) by delegating sequencing to a single "Document Service" instance per document, using Redis to bridge the gap between instances.
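One way to pin each document to a single Document Service instance is a consistent hash ring keyed by DocID. The sketch below is illustrative: the `HashRing` class, the use of SHA-256, and the choice of 100 virtual nodes per server are assumptions, not details taken from the design.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(key: str) -> int:
    # First 8 bytes of SHA-256 as an integer position on the ring.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes: List[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._points: List[int] = []      # sorted ring positions
        self._owner: Dict[int, str] = {}  # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, doc_id: str) -> str:
        """Return the node owning the first ring position clockwise of the key."""
        if not self._points:
            raise LookupError("ring is empty")
        i = bisect.bisect_right(self._points, _hash(doc_id)) % len(self._points)
        return self._owner[self._points[i]]


ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("doc-123")  # every edit for doc-123 is routed here
```

When a node is removed, only the documents it owned move to a neighbor on the ring; every other document keeps its sequencer, which is the property that makes failover cheap.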

High Level Architecture

Sub-system Deep Dive

Service

Topology & Scaling: The WebSocket servers hold no document logic; they only map UserIDs to SocketIDs. The Document Service uses consistent hashing on DocID so that all edits for a given document are processed by the same node, which minimizes lock contention. Alternatively, if the nodes are heterogeneous, the service coordinates through Redis.
API Spec:
POST /v1/doc: Create a new document.
WS /v1/doc/{id}/join: Establish a real-time session.
Op Message: { type: "insert" | "delete", pos: 12, char: "a", version: 104 }.
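As a sketch of how a gateway might decode and sanity-check the Op message before handing it to the OT engine, consider the following. The `OpMessage` dataclass, the `parse_op` function, and the specific validation rules (single-character inserts, a version that cannot exceed the server's current version) are assumptions made for illustration.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class OpMessage:
    type: str
    pos: int
    version: int
    char: str = ""


def parse_op(raw: str, server_version: int) -> OpMessage:
    """Decode one JSON op frame; raise ValueError for anything malformed."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("op must be a JSON object")
    msg = OpMessage(
        type=data.get("type"),
        pos=data.get("pos"),
        version=data.get("version"),
        char=data.get("char", ""),
    )
    if msg.type not in ("insert", "delete"):
        raise ValueError(f"unknown op type: {msg.type!r}")
    for name in ("pos", "version"):
        value = getattr(msg, name)
        # bool is a subclass of int in Python, so it is excluded explicitly.
        if not isinstance(value, int) or isinstance(value, bool) or value < 0:
            raise ValueError(f"{name} must be a non-negative integer")
    if msg.version > server_version:
        raise ValueError("client claims a version the server has not produced")
    if msg.type == "insert" and (not isinstance(msg.char, str) or len(msg.char) != 1):
        raise ValueError("insert needs exactly one character")
    return msg


msg = parse_op('{"type": "insert", "pos": 12, "char": "a", "version": 104}', server_version=104)
```

The position is deliberately not checked against the current document length here, because a stale op refers to an older version of the text and is only meaningful after transformation.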

Storage

Data Model:
Documents table: id, title, owner_id, last_snapshot_id.
Operations table: doc_id, version_number, op_data (JSON), timestamp.
Snapshots table: doc_id, version_number, content (Text/Blob).
Database Logic: We use PostgreSQL. Writes are appended to the Operations table. Every ~100 operations, a background job creates a new Snapshot to prevent replaying thousands of operations from the beginning of time.
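The snapshot-plus-log scheme can be illustrated with an in-memory stand-in for the Operations and Snapshots tables. This is a sketch under stated assumptions: `DocumentStore` and its methods are hypothetical names, ops are single-character inserts and deletes, and the snapshot is taken inline rather than by a background job.

```python
from typing import Dict, List, Optional, Tuple

SNAPSHOT_INTERVAL = 100  # "every ~100 operations" from the text

OpRow = Tuple[str, int, str]  # (kind, pos, char)


def apply_op(text: str, op: OpRow) -> str:
    kind, pos, char = op
    if kind == "insert":
        return text[:pos] + char + text[pos:]
    return text[:pos] + text[pos + 1:]


class DocumentStore:
    def __init__(self) -> None:
        self.ops: List[OpRow] = []                # ops[i] produces version i + 1
        self.snapshots: Dict[int, str] = {0: ""}  # version_number -> content

    def append(self, op: OpRow) -> int:
        self.ops.append(op)
        version = len(self.ops)
        if version % SNAPSHOT_INTERVAL == 0:
            # Taken inline here; the design defers this to a background job.
            self.snapshots[version] = self.load(version)
        return version

    def replay_cost(self, version: int) -> int:
        """Number of ops that must be replayed to rebuild `version`."""
        return version - max(v for v in self.snapshots if v <= version)

    def load(self, version: Optional[int] = None) -> str:
        """Rebuild a version from the nearest earlier snapshot plus the ops after it."""
        if version is None:
            version = len(self.ops)
        base = max(v for v in self.snapshots if v <= version)
        text = self.snapshots[base]
        for op in self.ops[base:version]:
            text = apply_op(text, op)
        return text
```

Because `load` accepts any version number, the same mechanism serves both fast document opening (latest version) and version history (any earlier version), with at most `SNAPSHOT_INTERVAL - 1` ops replayed in either case.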

Cache

Redis:
Pub/Sub: Channels are named by DocID. When an operation is validated, it is published to doc_{id}. All WebSocket nodes subscribed to this channel broadcast the message to their connected clients.
Ephemeral State: Stores "Presence" data (e.g., who is online, cursor positions) with a short TTL (30 seconds).
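The two Redis roles above can be modeled in memory to show the intended behavior. Nothing below talks to Redis: `Backplane` and `PresenceStore` are hypothetical stand-ins, and the injectable `clock` parameter exists only so that expiry can be demonstrated without waiting. In Redis itself, these roles would roughly correspond to PUBLISH/SUBSCRIBE on a per-document channel and to keys written with an expiry (for example SET with the EX option).

```python
import time
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Tuple


class Backplane:
    """In-memory stand-in for Redis Pub/Sub: one channel per document."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, doc_id: str, on_message: Callable[[str], None]) -> None:
        self._subscribers[f"doc_{doc_id}"].append(on_message)

    def publish(self, doc_id: str, message: str) -> int:
        """Deliver to every subscribed gateway node; return how many received it."""
        handlers = self._subscribers.get(f"doc_{doc_id}", [])
        for handler in handlers:
            handler(message)
        return len(handlers)


class PresenceStore:
    """In-memory stand-in for presence keys that expire after a short TTL."""

    def __init__(self, ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # (doc_id, user_id) -> (expiry time, cursor position)
        self._entries: Dict[Tuple[str, str], Tuple[float, int]] = {}

    def heartbeat(self, doc_id: str, user_id: str, cursor: int) -> None:
        """Refresh a user's presence; clients are expected to call this periodically."""
        self._entries[(doc_id, user_id)] = (self._clock() + self._ttl, cursor)

    def online(self, doc_id: str) -> Dict[str, int]:
        """Return user -> cursor for users whose entry has not expired."""
        now = self._clock()
        self._entries = {k: v for k, v in self._entries.items() if v[0] > now}
        return {user: cursor
                for (doc, user), (_, cursor) in self._entries.items()
                if doc == doc_id}
```

A user who closes the tab without a clean disconnect simply stops sending heartbeats and drops out of `online` once the TTL elapses, so no explicit cleanup path is needed.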
Wrap Up

Advanced Topics

Trade-offs:
OT vs CRDT: OT requires a central server, creating a potential bottleneck, but typically keeps the payload on the wire much smaller than the per-element metadata that CRDTs carry.
Strong Consistency for Ops: During a total network partition, we sacrifice some availability so that the document versioning never forks (linearizable history).
Bottlenecks:
Single Document Hotspot: If 1,000 users edit the same document, the OT engine must process their operations sequentially.
WebSocket Connections: Holding the estimated 100k concurrent connections (and 1M+ as usage grows), most of them idle at any given moment, requires significant memory on the Gateway layer.
Failure Handling:
Server Crash: If a Document Service node fails, the consistent hash ring re-assigns the DocID to a new node, which reloads the last snapshot and operations log from PostgreSQL.
Client Disconnect: Clients buffer local changes and attempt to "re-sync" by sending their last acknowledged version number upon reconnection.
Alternatives & Optimization:
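Compression: Use a compact binary encoding such as Protocol Buffers instead of JSON for the WebSocket messages to reduce bandwidth.
Edge Side Sync: Deploying WebSocket servers at the edge (for example, Cloudflare Workers, which can terminate WebSocket connections) to reduce the RTT (Round Trip Time) for global users. AWS Lambda@Edge only runs on CloudFront request/response events and cannot hold long-lived WebSocket connections, so an AWS deployment would need a different edge or regional component.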
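The client re-sync step described under Failure Handling can be sketched as below. To stay short, this illustration handles inserts only; `Insert`, `xform`, `Server`, and `Client` are hypothetical names, and the tie rule (the server's op wins when two inserts target the same position) is an assumption. On reconnect the client asks for every op after its last acknowledged version, rebases its buffered edits over them, and then submits the buffer.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Insert:
    pos: int
    char: str


def apply_op(text: str, op: Insert) -> str:
    return text[:op.pos] + op.char + text[op.pos:]


def xform(op: Insert, other: Insert, other_wins_ties: bool) -> Insert:
    """Shift `op` right if `other` was inserted before it (or at it, on a lost tie)."""
    if other.pos < op.pos or (other.pos == op.pos and other_wins_ties):
        return replace(op, pos=op.pos + 1)
    return op


class Server:
    def __init__(self, text: str = "") -> None:
        self.text = text
        self.log: List[Insert] = []  # log[i] produces version i + 1

    def ops_since(self, version: int) -> List[Insert]:
        return self.log[version:]

    def submit(self, op: Insert, base_version: int) -> int:
        for earlier in self.log[base_version:]:
            op = xform(op, earlier, other_wins_ties=True)
        self.text = apply_op(self.text, op)
        self.log.append(op)
        return len(self.log)


class Client:
    def __init__(self, text: str = "", version: int = 0) -> None:
        self.synced_text = text           # last state acknowledged by the server
        self.version = version            # last acknowledged version number
        self.pending: List[Insert] = []   # local edits buffered while offline

    def local_insert(self, pos: int, char: str) -> None:
        self.pending.append(Insert(pos, char))

    @property
    def view(self) -> str:
        text = self.synced_text
        for op in self.pending:
            text = apply_op(text, op)
        return text

    def resync(self, server: Server) -> None:
        # 1. Catch up: apply missed ops and rebase the buffer over each one.
        for missed in server.ops_since(self.version):
            self.synced_text = apply_op(self.synced_text, missed)
            self.version += 1
            rebased = []
            for op in self.pending:
                rebased.append(xform(op, missed, other_wins_ties=True))
                missed = xform(missed, op, other_wins_ties=False)
            self.pending = rebased
        # 2. Flush: submit the rebased buffer against the now-current version.
        for op in self.pending:
            self.version = server.submit(op, self.version)
            self.synced_text = apply_op(self.synced_text, op)
        self.pending = []


server = Server("ab")
client = Client("ab", version=0)
client.local_insert(1, "X")             # typed while disconnected
client.local_insert(2, "Y")
server.submit(Insert(0, "Z"), 0)        # another user's edits arrive meanwhile
server.submit(Insert(3, "W"), 1)
client.resync(server)
print(server.text, client.view)  # ZaXYbW ZaXYbW
```

This single-threaded sketch assumes no other op reaches the server between the catch-up and the flush; a real client would keep rebasing until each buffered op is acknowledged.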