The Question
FE Design
Collaborative Rich-Text Editor Design
Design a real-time collaborative document editor similar to Google Docs or Notion. Your solution should handle multi-user concurrency, conflict resolution, and presence indicators (cursors/typing status). Focus on the frontend architecture: how do you manage the editor state, ensure low-latency local feedback, and synchronize changes efficiently over the network? Address performance considerations for large documents and provide a strategy for offline support and data consistency.
React
CRDT
Yjs
ProseMirror
WebSockets
IndexedDB
Web Workers
Tiptap
Questions & Insights
Clarifying Questions
Q1: What is the primary content type (Rich Text, Code, or Graphic Canvas)?
Assumption: Rich text with basic formatting (Bold, Italic, Lists) similar to a simplified Google Docs.
Q2: What is the expected scale for concurrent collaborators per document?
Assumption: Up to 20 concurrent users per document for the MVP.
Q3: Is offline support a requirement for the MVP?
Assumption: Yes, basic offline editing with "re-sync on reconnect" is essential for a modern collaborative experience.
Q4: What conflict resolution strategy is preferred: Operational Transformation (OT) or Conflict-free Replicated Data Types (CRDT)?
Assumption: CRDT (using a library like Yjs) for easier decentralized scaling and better offline support without complex server-side state.
Crash Strategy
Core Bottleneck: Keeping the UI responsive while reconciling concurrent, potentially conflicting edits from multiple users.
Progressive Architecture Flow:
Local Loop: How do we ensure the editor feels instantaneous for the local user? (Optimistic updates via local CRDT).
Sync Loop: How do we propagate changes to others without freezing the UI? (Binary diffs over WebSockets).
Awareness Loop: How do we show where others are? (Presence/Cursor tracking).
Persistence Loop: How do we ensure work isn't lost on refresh or disconnect? (IndexedDB caching).
Elite Bonus Points
Selective Sync/Virtualization: For massive documents, only syncing and rendering the "viewport" and surrounding paragraphs.
Snapshotting: Implementing a "History" view using CRDT state snapshots to revert to previous versions without bloating the document size.
Web Workers for CRDT: Moving the heavy computation of merging remote document updates off the main UI thread to prevent frame drops during high-activity bursts.
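The Selective Sync/Virtualization idea above comes down to working out which paragraphs intersect the viewport. A minimal sketch, assuming paragraph heights are already measured and that a small "overscan" buffer is rendered on each side; `visibleRange` and its parameters are hypothetical names, not part of ProseMirror or Tiptap:

```typescript
// Index range of paragraphs to render; `end` is exclusive.
interface VisibleRange {
  start: number;
  end: number;
}

// heights: measured pixel height of each paragraph, in document order.
// overscan: extra paragraphs rendered above and below the viewport (assumed default).
function visibleRange(
  heights: number[],
  scrollTop: number,
  viewportHeight: number,
  overscan: number = 2
): VisibleRange {
  let y = 0;
  let first = -1;
  let last = -1;
  for (let i = 0; i < heights.length; i++) {
    const top = y;
    const bottom = y + heights[i];
    // A paragraph is visible when it overlaps [scrollTop, scrollTop + viewportHeight).
    if (bottom > scrollTop && top < scrollTop + viewportHeight) {
      if (first === -1) first = i;
      last = i;
    }
    y = bottom;
  }
  if (first === -1) return { start: 0, end: 0 }; // nothing overlaps the viewport
  return {
    start: Math.max(0, first - overscan),
    end: Math.min(heights.length, last + 1 + overscan),
  };
}
```

A production version would likely keep prefix sums of the heights and binary-search them rather than scanning every paragraph on each scroll event.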
Design Breakdown
Requirements
Functional Requirements:
Real-time text editing with multiple users.
Presence indicators (Remote cursors with names/colors).
Basic rich text formatting (Bold, Italic, Lists).
Document auto-saving.
Non-Functional Requirements:
Latency: Local characters must render within one frame (< 16 ms at 60 fps). Remote updates should be visible in < 500 ms.
Consistency: Eventual consistency (everyone sees the same doc eventually).
Scalability: Handle documents up to 100k characters without UI lag.
Availability: Offline editing capability.
Design Summary
Concise Summary: A CRDT-powered rich text editor using a shared-state architecture where the local client acts as a peer in a distributed system, syncing binary updates over WebSockets.
Major Components:
Editor Core: A controlled rich-text surface (ProseMirror/Tiptap) that maps DOM events to document operations.
CRDT Engine: A headless document model (Yjs) that manages the shared data structure and conflict resolution.
Collaboration Provider: A network abstraction layer (Hocuspocus/WebSockets) that broadcasts local changes and receives remote ones.
Presence Manager: A transient state store for tracking non-persistent data like cursor positions and "User is typing" statuses.
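The Presence Manager from the list above can be sketched as a small in-memory store that is never persisted and that drops collaborators who stop sending heartbeats. This is an illustrative stand-in, not the Yjs awareness API; `PresenceStore`, its method names, and the timeout value are assumptions:

```typescript
interface PresenceState {
  name: string;
  color: string;
  cursor: number | null; // character offset, or null when the editor is not focused
  typing: boolean;
}

interface PresenceEntry {
  state: PresenceState;
  lastSeen: number; // timestamp in ms of the last update from this client
}

class PresenceStore {
  private entries = new Map<number, PresenceEntry>();
  private timeoutMs: number;

  constructor(timeoutMs: number = 30000) {
    this.timeoutMs = timeoutMs; // assumed timeout; tune to the heartbeat interval
  }

  // Called for both local changes and presence messages from the network.
  update(clientId: number, state: PresenceState, now: number): void {
    this.entries.set(clientId, { state, lastSeen: now });
  }

  // Removes clients that have been silent for longer than the timeout
  // and returns their ids so the UI can hide their cursors.
  prune(now: number): number[] {
    const expired: number[] = [];
    this.entries.forEach((entry, clientId) => {
      if (now - entry.lastSeen > this.timeoutMs) expired.push(clientId);
    });
    expired.forEach((id) => this.entries.delete(id));
    return expired;
  }

  active(): PresenceState[] {
    const out: PresenceState[] = [];
    this.entries.forEach((entry) => out.push(entry.state));
    return out;
  }
}
```

Keeping this state outside the CRDT document means cursor movement never adds to the document's history or its stored size.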
CUJ Walkthrough: A user types a character; the Editor Core captures the input and notifies the CRDT Engine. The engine updates the local model (instant UI update) and produces a binary diff. The Collaboration Provider sends this diff via WebSockets. Other clients receive the diff, the CRDT Engine merges it, and the Editor Core re-renders the changed fragment.
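To make the walkthrough concrete, here is a toy sequence CRDT in the style of RGA (Replicated Growable Array). It is not Yjs and not its algorithm; it only illustrates how a local insert is applied instantly, shipped as an operation, and merged on other replicas so that everyone converges. It assumes operations arrive in causal order and omits deletes; `ToyReplica` and all field names are hypothetical:

```typescript
interface OpId {
  clock: number; // Lamport clock value
  client: number; // unique replica id, used as a tie-breaker
}

interface InsertOp {
  id: OpId;
  origin: OpId | null; // id of the character this one was typed after (null = start)
  char: string;
}

const sameId = (a: OpId, b: OpId): boolean =>
  a.clock === b.clock && a.client === b.client;

const newer = (a: OpId, b: OpId): boolean =>
  a.clock > b.clock || (a.clock === b.clock && a.client > b.client);

class ToyReplica {
  readonly client: number;
  private items: InsertOp[] = [];
  private clock = 0;

  constructor(client: number) {
    this.client = client;
  }

  // Local edit: applied immediately (optimistic), then returned for broadcast.
  insertAt(index: number, char: string): InsertOp {
    const origin = index === 0 ? null : this.items[index - 1].id;
    const op: InsertOp = {
      id: { clock: this.clock + 1, client: this.client },
      origin,
      char,
    };
    this.apply(op);
    return op;
  }

  // Used for both local and remote operations; applying an op twice is a no-op.
  apply(op: InsertOp): void {
    if (this.items.some((it) => sameId(it.id, op.id))) return;
    this.clock = Math.max(this.clock, op.id.clock);
    let i = 0;
    if (op.origin !== null) {
      const origin = op.origin;
      i = this.items.findIndex((it) => sameId(it.id, origin)) + 1;
    }
    // Skip concurrent inserts with a newer id so every replica picks the same order.
    while (i < this.items.length && newer(this.items[i].id, op.id)) i++;
    this.items.splice(i, 0, op);
  }

  text(): string {
    return this.items.map((it) => it.char).join("");
  }
}
```

Two replicas that type at the same position and then exchange operations end up with identical text regardless of delivery order between the two concurrent operations, which is the property the CRDT Engine provides in this design.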
Simplicity Audit: By choosing CRDT (Yjs) over OT, we eliminate the need for a complex, stateful "Central Authority" server that must sequence every single operation, significantly reducing backend complexity for the MVP.
Architecture Decision Rationale:
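Why CRDT?: It allows for decentralized conflict resolution, and because updates can be merged in any order, the document state can be persisted locally (for example in IndexedDB via the y-indexeddb provider) and reconciled later for offline-first capabilities.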
Requirement Satisfaction: The decoupled "Editor Core" ensures high performance, while the "CRDT Engine" ensures eventual consistency across all clients.
System Diagram
Architecture Deep Dive
Presentation Layer
Component Hierarchy: The App Shell handles top-level routing and theme. The Editor Layout manages the sidebars. The Editor Page coordinates the Toolbar, Collaborator List, and the Editor Canvas. The Editor Canvas is the most critical leaf, rendering the text and overlaying Remote Cursors.
Interaction Layer: We use a "controlled component" pattern: user keystrokes are intercepted with beforeinput events, which prevent the browser's default contenteditable behavior and keep the CRDT Document Model as the single source of truth.
Rendering Layer: We use ProseMirror as the rendering engine. It takes a virtual DOM-like approach to the document, re-rendering only the changed nodes (paragraphs/marks) to keep performance high even in large documents.
UI Frameworks / Tools: React for the UI wrapper, Tiptap/ProseMirror for the editor surface, and Tailwind CSS for styling.
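The beforeinput interception described in the Interaction Layer can be sketched as a pure function that translates an input event into document operations, leaving the DOM untouched. The `inputType` strings are the ones defined by the Input Events specification; `InputLike`, `SelectionSpan`, `EditOp`, and `toEditOps` are hypothetical names, and only a small subset of input types is handled:

```typescript
// Structural subset of the browser's InputEvent, so the function is testable without a DOM.
interface InputLike {
  inputType: string;
  data: string | null;
}

interface SelectionSpan {
  from: number;
  to: number; // from === to means a collapsed caret
}

type EditOp =
  | { kind: "insert"; at: number; text: string }
  | { kind: "delete"; from: number; to: number }
  | { kind: "splitBlock"; at: number }
  | { kind: "toggleMark"; mark: "bold" | "italic"; from: number; to: number };

function toEditOps(event: InputLike, sel: SelectionSpan): EditOp[] {
  const ops: EditOp[] = [];
  const collapsed = sel.from === sel.to;
  switch (event.inputType) {
    case "insertText":
      // Typing over a selection replaces it: delete first, then insert.
      if (!collapsed) ops.push({ kind: "delete", from: sel.from, to: sel.to });
      ops.push({ kind: "insert", at: sel.from, text: event.data ?? "" });
      break;
    case "deleteContentBackward":
      if (!collapsed) {
        ops.push({ kind: "delete", from: sel.from, to: sel.to });
      } else if (sel.from > 0) {
        // Simplification: removes one code unit, ignoring grapheme clusters.
        ops.push({ kind: "delete", from: sel.from - 1, to: sel.from });
      }
      break;
    case "insertParagraph":
      if (!collapsed) ops.push({ kind: "delete", from: sel.from, to: sel.to });
      ops.push({ kind: "splitBlock", at: sel.from });
      break;
    case "formatBold":
      ops.push({ kind: "toggleMark", mark: "bold", from: sel.from, to: sel.to });
      break;
    case "formatItalic":
      ops.push({ kind: "toggleMark", mark: "italic", from: sel.from, to: sel.to });
      break;
    default:
      // Unhandled input types produce no operations; the caller decides what to do.
      break;
  }
  return ops;
}
```

In the browser, a `beforeinput` listener would call `event.preventDefault()`, pass the event and the current selection to a function like this, and hand the resulting operations to the document model.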
Application Layer
Data Fetching Layer: Initial document load fetches the binary CRDT state from the server. Subsequent updates are pushed/pulled via WebSockets.
State Management Layer: We distinguish between Persistent State (the document content) and Transient State (cursor positions). Document state is managed by the CRDT Engine, while Presence is managed in a lightweight store.
Routing & Navigation: URL-based document IDs (/edit/:docId). Navigation guards ensure the user is authenticated before opening the WebSocket connection.
Domain Layer
Business Rules: Validation rules (e.g., max document size, allowed HTML tags) are encapsulated here.
Entities / Models: The core entity is the Y.Doc (CRDT object). We map this complex tree structure into the Editor's internal schema via a binding layer.
Inter-layer Contracts: The Collaboration Engine provides an interface (e.g., updateContent(), onRemoteUpdate()) that hides the underlying CRDT implementation from the UI components.
Infrastructure Layer
API / Network: WebSockets provide the low-latency full-duplex communication needed for real-time feel. We use binary encoding (Protocol Buffers or Yjs native encoding) to minimize payload size.
Storage: IndexedDB is used to store the binary doc state locally. When the user reopens a document, it loads instantly from IndexedDB while the WebSocket fetches the "delta" from the server.
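The "load from IndexedDB, then fetch the delta" step relies on each side summarizing what it already has. A minimal sketch of that idea using a state vector (the highest clock seen per client), with plain objects standing in for the binary encoding; `LoggedOp`, `stateVectorOf`, and `diffSince` are hypothetical names, and clocks are assumed to start at 1 and be contiguous per client:

```typescript
interface LoggedOp {
  client: number;
  clock: number; // per-client sequence number, starting at 1
  payload: string;
}

// Maps client id -> highest clock already applied.
type StateVector = Record<number, number>;

function stateVectorOf(ops: LoggedOp[]): StateVector {
  const sv: StateVector = {};
  for (const op of ops) {
    sv[op.client] = Math.max(sv[op.client] ?? 0, op.clock);
  }
  return sv;
}

// Returns only the operations the remote side has not seen yet.
function diffSince(log: LoggedOp[], remote: StateVector): LoggedOp[] {
  return log.filter((op) => op.clock > (remote[op.client] ?? 0));
}

// Example: a cached copy reconnects after both sides have moved on.
const cached: LoggedOp[] = [
  { client: 1, clock: 1, payload: "H" },
  { client: 1, clock: 2, payload: "i" },
  { client: 2, clock: 1, payload: "!" },
  { client: 1, clock: 3, payload: "?" }, // typed while offline
];
const server: LoggedOp[] = [
  { client: 1, clock: 1, payload: "H" },
  { client: 1, clock: 2, payload: "i" },
  { client: 2, clock: 1, payload: "!" },
  { client: 2, clock: 2, payload: "." },
  { client: 3, clock: 1, payload: "x" },
];

const toDownload = diffSince(server, stateVectorOf(cached)); // server -> client
const toUpload = diffSince(cached, stateVectorOf(server)); // client -> server
```

Yjs exposes the same idea through `Y.encodeStateVector` and `Y.encodeStateAsUpdate`, which exchange the vector and the resulting delta in binary form.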
Wrap Up
Evaluation: This design prioritizes responsiveness and offline-first capabilities using CRDTs.
Trade-offs: CRDTs can consume more memory over time because they keep metadata for deleted items (tombstones). For an MVP this is acceptable, but long-term a garbage collection (GC) or snapshotting strategy would be required.
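The tombstone trade-off is easy to see in a toy model: deleting text shrinks what the user sees but not what is stored. The sketch below is a single-replica illustration, not how Yjs stores or collects items; `TombstoneSequence` and its methods are hypothetical, and `compact` is only safe under the stated assumption that every replica has already applied every delete:

```typescript
interface SeqItem {
  id: string;
  char: string;
  deleted: boolean;
}

class TombstoneSequence {
  private items: SeqItem[] = [];
  private nextId = 0;

  append(char: string): string {
    const id = "i" + this.nextId++;
    this.items.push({ id, char, deleted: false });
    return id;
  }

  // Deleting keeps the item as a tombstone so concurrent edits can still refer to it.
  remove(id: string): void {
    const item = this.items.find((it) => it.id === id);
    if (item) {
      item.deleted = true;
      item.char = ""; // the content can go, the metadata stays
    }
  }

  text(): string {
    return this.items.filter((it) => !it.deleted).map((it) => it.char).join("");
  }

  storedItems(): number {
    return this.items.length;
  }

  // Assumption: the caller has confirmed all replicas are caught up.
  // Returns the number of tombstones dropped.
  compact(): number {
    const before = this.items.length;
    this.items = this.items.filter((it) => !it.deleted);
    return before - this.items.length;
  }
}
```

After typing five characters and deleting three, `text()` has two characters while `storedItems()` still reports five until `compact()` runs.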
Future Optimization: Implementing "Awareness" via WebRTC for peer-to-peer cursor syncing to reduce server load for transient data.