The Question

Scalable Cloud File Synchronization System

Design a globally distributed cloud storage and file synchronization service similar to Dropbox. The system must support millions of users, handle file versioning, and efficiently synchronize changes across multiple devices (Desktop, Mobile, Web). Key challenges include optimizing bandwidth for large file updates, ensuring strong consistency for metadata, and handling high-concurrency synchronization notifications. Discuss your strategy for block-level deduplication, data durability, and conflict resolution in a distributed environment.

PostgreSQL

Redis

Kafka

CDN

JWT

gRPC

SHA-256

HTTP Long Polling

Questions & Insights

Clarifying Questions

Scale and Usage: What is the target user base (e.g., 500M total, 10M DAU) and the average file size/count per user?

File Constraints: Is there a maximum file size limit (e.g., 2GB or 50GB) and do we need to support block-level "delta" synchronization?

Consistency Requirements: Is strong consistency required across all devices for the same account, or is eventual consistency acceptable during concurrent edits?

Versioning: How many historical versions of a file should we retain, and for how long?

Assumptions for MVP:

DAU: 10 Million.

Storage: Total 10 PB+, targeting block-level transfers to optimize bandwidth.

Consistency: Strong consistency for file metadata to prevent sync conflicts.

Features: Upload/download, file synchronization across devices, and basic file sharing.

Thinking Process

The core challenge of a file-sync service is efficiently moving data while maintaining metadata integrity across distributed clients.

How do we optimize bandwidth? Use Block-level chunking. Instead of uploading the whole file, split it into 4MB chunks and only upload modified blocks (Delta Sync).

How do we handle high-frequency metadata updates? Separate the Metadata path (file names, permissions, block maps) from the Data path (raw bytes). Use a dedicated Metadata Service.

How do we ensure all devices stay in sync? Implement a Notification Service using long-polling or WebSockets to alert clients that "something changed," triggering them to pull the latest metadata.

How do we handle massive storage? Use Object Storage (S3-compatible) for blocks and a partitioned Relational DB for metadata to support ACID transactions on file structures.
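
The chunking and Delta Sync idea above can be sketched as follows. This is a minimal illustration, not a definitive client implementation: the function names block_hashes and changed_blocks are hypothetical, and the demo uses a 100-byte block size so it runs instantly (the design itself assumes 4MB blocks).

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB, the block size assumed in this design


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Split data into fixed-size blocks and return each block's SHA-256 hex digest."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: list[str], new: list[str]) -> list[int]:
    """Return the indexes of blocks in `new` that are modified or appended."""
    return [i for i, h in enumerate(new) if i >= len(old) or old[i] != h]


if __name__ == "__main__":
    original = bytes(1000)       # 1000 zero bytes -> ten 100-byte blocks
    edited = bytearray(original)
    edited[450] = 1              # overwrite one byte in place (block index 4)
    old = block_hashes(original, block_size=100)
    new = block_hashes(bytes(edited), block_size=100)
    print(changed_blocks(old, new))  # [4]
```

Only the blocks whose indexes are returned need to be uploaded; the Metadata Service then records the new ordered block list for the file version.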

Bonus Points

Content-Addressable Storage (CAS): Store blocks based on their SHA-256 hash. This enables global deduplication across all users, significantly reducing storage costs if multiple users upload the same file (e.g., a popular OS ISO).

Optimistic Concurrency Control: Use version numbers/ETags in the Metadata DB to handle race conditions when two devices update the same file simultaneously.

Cold Storage Tiering: Automatically move blocks not accessed for 90+ days to a colder tier such as S3 Glacier to keep storage cost under control as the MVP grows.

Offline Sync & Vector Clocks: Discuss how to handle complex merge conflicts when a device has been offline for weeks and returns with conflicting changes.
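
To make the vector clock point concrete, here is a minimal sketch of how two file versions could be compared. It assumes each version carries a per-device counter map; the compare function and the device names are illustrative, not part of the design above.

```python
def compare(a: dict[str, int], b: dict[str, int]) -> str:
    """Compare two vector clocks (device id -> edit counter).

    Returns "equal", "before" (a happened before b), "after", or
    "concurrent" (neither version has seen the other's edits).
    """
    devices = set(a) | set(b)
    a_le_b = all(a.get(d, 0) <= b.get(d, 0) for d in devices)
    b_le_a = all(b.get(d, 0) <= a.get(d, 0) for d in devices)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


if __name__ == "__main__":
    server = {"laptop": 2, "phone": 2}
    offline_laptop = {"laptop": 3, "phone": 1}  # edited while offline
    print(compare(offline_laptop, server))      # concurrent
```

A "concurrent" result is the case that needs a conflict policy. One common choice is to keep both versions and surface the second as a conflicted copy rather than silently overwriting either.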

Design Breakdown

Functional Requirements

Core Use Cases:

Users can upload and download files via a desktop/web client.

Automatic synchronization of files across multiple devices owned by the same user.

File versioning (history of changes).

Share files/folders with other users via links or permissions.

Scope Control:

In-Scope: Block-level sync, metadata management, notification system.

Out-of-Scope: Real-time collaborative editing (like Google Docs), advanced search inside file content, and third-party app ecosystem.

Non-Functional Requirements

Scale: Support 10M DAU and 100M+ total users.

Latency: Low-latency metadata updates; high-throughput for block transfers.

Availability & Reliability: 99.99% availability. Data durability is paramount (no data loss).

Consistency: Strong consistency for the file system view (Metadata).

Security: Data encryption at rest (AES-256) and in transit (TLS).

Estimation

Traffic:

10M DAU * 2 devices/user * 5 sync events/day = 100M sync requests/day.

100M requests / 86,400 seconds = ~1,200 Avg QPS for metadata.

Peak QPS: ~2,500-3,000.

Storage:

100M users * 10GB avg = 1 Exabyte (long term).

Metadata: 100B files * 500 bytes/metadata = 50TB.

Bandwidth:

If 10% of DAU upload a 1MB delta daily: 10M * 0.1 * 1MB = 1TB/day = ~11.6MB/s, or ~93Mbps average.
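
The arithmetic behind these estimates can be re-derived in a few lines. This sketch assumes decimal units (1MB = 10^6 bytes, 1GB = 10^9 bytes) and uses only the inputs stated above; the variable names are illustrative.

```python
SECONDS_PER_DAY = 86_400

# Traffic: 10M DAU * 2 devices/user * 5 sync events/day
dau = 10_000_000
sync_requests_per_day = dau * 2 * 5
avg_qps = sync_requests_per_day / SECONDS_PER_DAY

# Storage: 100M users * 10GB each; metadata: 100B files * 500 bytes each
storage_bytes = 100_000_000 * 10 * 10**9
metadata_bytes = 100 * 10**9 * 500

# Bandwidth: 10% of DAU upload a 1MB delta per day
upload_bytes_per_day = (dau // 10) * 10**6
avg_mbps = upload_bytes_per_day * 8 / SECONDS_PER_DAY / 10**6

print(f"avg metadata QPS: {avg_qps:,.0f}")                  # 1,157
print(f"block storage:    {storage_bytes / 10**18:.1f} EB")  # 1.0 EB
print(f"metadata:         {metadata_bytes / 10**12:.0f} TB")  # 50 TB
print(f"upload volume:    {upload_bytes_per_day / 10**12:.0f} TB/day")  # 1 TB/day
print(f"avg upload rate:  {avg_mbps:.1f} Mbps")              # 92.6 Mbps
```

The average upload rate is small; the peak rate and the download side (every upload fans out to the user's other devices) are what size the Block Service in practice.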

Blueprint

The design follows a "Split-Plane" architecture. The Data Plane handles raw block storage and chunking, while the Control Plane (Metadata) handles the hierarchy, permissions, and synchronization logic.

Major Components:

Block Service: Manages the upload/download of file chunks to Object Storage.

Metadata Service: Manages file versions, block lists, and user folder structures.

Notification Service: Pushes update signals to clients to trigger sync.

Object Storage: Highly durable store for file blocks (e.g., S3).

Simplicity Audit: We use S3 for storage to avoid managing a complex distributed file system ourselves. We use a standard SQL database for metadata to ensure ACID properties for file-tree operations.

Architecture Decision Rationale:

Why this architecture: Separating metadata from blocks allows us to scale them independently. Metadata is small but high-frequency; blocks are large but handled as immutable objects.

Functional Satisfaction: Chunking handles large files and sync. Notifications handle multi-device updates.

Non-functional Satisfaction: S3 is designed for 99.999999999% (11 nines) durability. Sharded SQL provides the necessary consistency and scale for metadata.

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery: Use CDN for downloading popular shared files to reduce egress costs and latency.

Load Balancing: L7 Load Balancer (NGINX/AWS ALB) for SSL termination and routing based on path (/metadata vs /blocks).

API Gateway: Handles rate limiting per user and authentication via JWT.

Service

Metadata Service:

Stateless service.

API Schema:

GET /v1/metadata/{file_id}: Returns file version and block list.

POST /v1/metadata/update: Updates file metadata and increments version.

Concurrency: Uses optimistic locking (version column) to detect update conflicts.
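
The optimistic locking check can be expressed as a single conditional UPDATE. The sketch below uses SQLite from the Python standard library as a stand-in for PostgreSQL so it runs anywhere; the trimmed-down files table and the update_file helper are assumptions for illustration.

```python
import sqlite3


def update_file(conn: sqlite3.Connection, file_id: str,
                expected_version: int, new_checksum: str) -> bool:
    """Apply the update only if the row still has the version the client read.

    Returns False when another device updated the file first (a conflict).
    """
    cur = conn.execute(
        "UPDATE files SET checksum = ?, version = version + 1 "
        "WHERE file_id = ? AND version = ?",
        (new_checksum, file_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files (file_id TEXT PRIMARY KEY, "
        "version INTEGER NOT NULL, checksum TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO files VALUES ('f1', 1, 'aaa')")
    conn.commit()
    print(update_file(conn, "f1", 1, "bbb"))  # True: version 1 -> 2
    print(update_file(conn, "f1", 1, "ccc"))  # False: stale version, conflict
```

When the update is rejected, the client pulls the latest metadata and either retries on top of it or falls back to the conflict-resolution policy.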

Block Service:

Handles block uploads. When a client uploads a block, the service calculates its SHA-256 hash. If a block with that hash is already recorded in the block index, it returns success without writing the block to S3 again (Deduplication).
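
A minimal sketch of the deduplication check, assuming blocks are keyed by their SHA-256 digest. The BlockStore class is hypothetical and uses an in-memory dict in place of S3; the writes counter exists only to make the effect visible.

```python
import hashlib


class BlockStore:
    """Content-addressed block store; a dict stands in for object storage."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}
        self.writes = 0  # number of blocks actually written to storage

    def has(self, digest: str) -> bool:
        """Cheap existence check a client could call before sending bytes."""
        return digest in self._objects

    def put(self, data: bytes) -> str:
        """Store a block under its hash; skip the write if it already exists."""
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self._objects:
            self._objects[digest] = data
            self.writes += 1
        return digest

    def get(self, digest: str) -> bytes:
        return self._objects[digest]


if __name__ == "__main__":
    store = BlockStore()
    first = store.put(b"same block contents")
    second = store.put(b"same block contents")  # duplicate upload
    print(first == second, store.writes)        # True 1
```

Exposing a has-style check lets a client send the hash first and skip the upload entirely when the block is already stored, which saves bandwidth as well as storage.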

Notification Service:

Uses HTTP Long Polling. Clients hold a request open. When the Metadata Service publishes a message to the Sync Queue, the Notification Service picks it up and completes the client's held request with the "update" signal; the client then pulls the latest metadata and opens a new long-poll request.
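
The long-polling behaviour can be sketched in-process with a condition variable. This is an illustration of the wait/notify logic only, with no HTTP layer: LongPollHub and its method names are hypothetical, and a real deployment would call publish from the queue consumer.

```python
import threading
from typing import Optional


class LongPollHub:
    """Holds waiting clients until a newer version is published or a timeout hits."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._versions: dict[str, int] = {}  # user_id -> latest known version

    def publish(self, user_id: str, version: int) -> None:
        """Called when a sync event arrives; wakes all held requests."""
        with self._cond:
            if version > self._versions.get(user_id, 0):
                self._versions[user_id] = version
                self._cond.notify_all()

    def wait_for_change(self, user_id: str, since_version: int,
                        timeout: float) -> Optional[int]:
        """Block until the user's version exceeds since_version.

        Returns the new version, or None on timeout (client should re-poll).
        """
        with self._cond:
            changed = self._cond.wait_for(
                lambda: self._versions.get(user_id, 0) > since_version,
                timeout=timeout,
            )
            return self._versions[user_id] if changed else None


if __name__ == "__main__":
    hub = LongPollHub()
    # Simulate a metadata update arriving 0.1s after the client starts waiting.
    threading.Timer(0.1, hub.publish, args=("user-123", 5)).start()
    print(hub.wait_for_change("user-123", since_version=4, timeout=5.0))  # 5
```

Returning None on timeout keeps connections from being held indefinitely; the client simply issues the next long-poll request.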

Storage

Access Pattern: Metadata is read-heavy (sync checks) and write-moderate (file changes). Block storage is write-once, read-many.

Database Table Design (Relational):

Files Table: file_id (PK), namespace_id, file_name, version, checksum, is_directory.

Blocks Table: block_id (PK), file_id (FK), block_order, block_hash, size.

Namespaces Table: namespace_id (PK), owner_id, shared_with (JSON/List).

Technical Selection: PostgreSQL with Citus or manual sharding by namespace_id.

Distribution Logic: Shard by owner_id or namespace_id to ensure all files for a user/shared folder live on the same shard, allowing atomic updates for directory moves.
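
A minimal sketch of routing by namespace_id, assuming simple hash-modulo placement. NUM_SHARDS and shard_for are hypothetical; Citus or a lookup table would handle placement differently.

```python
import hashlib

NUM_SHARDS = 16  # hypothetical shard count


def shard_for(namespace_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a namespace to a shard using a stable hash.

    SHA-256 is used instead of Python's built-in hash(), which is salted
    per process and would route the same id differently across servers.
    """
    digest = hashlib.sha256(namespace_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    # Every file in namespace "abc" resolves to the same shard.
    print(shard_for("abc") == shard_for("abc"))  # True
```

Hash-modulo placement remaps most namespaces when the shard count changes, so a growing system would likely prefer consistent hashing or a directory table that maps namespaces to shards.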

Cache

Purpose: Reduce load on Metadata DB for frequent "poll for changes" requests.

Key-Value Schema:

Key: user_latest_version:{user_id}, Value: version_number.

TTL: 24 hours.

Technical Selection: Redis.

Failure Handling: If Redis is down, fall back to Metadata DB.
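
The read path with fallback can be sketched as below. The VersionCache class is a stand-in for Redis (no real client calls are used, and the 24-hour TTL is omitted), and latest_version and load_from_db are hypothetical names.

```python
from typing import Callable, Optional


class CacheUnavailable(Exception):
    """Raised by the stand-in cache when it is down."""


class VersionCache:
    """In-memory stand-in for Redis with a switch to simulate an outage."""

    def __init__(self) -> None:
        self._data: dict[str, int] = {}
        self.available = True

    def get(self, key: str) -> Optional[int]:
        if not self.available:
            raise CacheUnavailable()
        return self._data.get(key)

    def set(self, key: str, value: int) -> None:
        if not self.available:
            raise CacheUnavailable()
        self._data[key] = value


def latest_version(user_id: str, cache: VersionCache,
                   load_from_db: Callable[[str], int]) -> int:
    """Read-through lookup: try the cache, fall back to the Metadata DB."""
    key = f"user_latest_version:{user_id}"
    try:
        cached = cache.get(key)
        if cached is not None:
            return cached
    except CacheUnavailable:
        return load_from_db(user_id)  # cache down: serve from the database
    version = load_from_db(user_id)   # cache miss: load and populate
    try:
        cache.set(key, version)
    except CacheUnavailable:
        pass                          # best effort; the DB result is still valid
    return version


if __name__ == "__main__":
    cache = VersionCache()
    print(latest_version("123", cache, lambda uid: 42))  # 42 (miss, from DB)
    cache.available = False
    print(latest_version("123", cache, lambda uid: 43))  # 43 (cache down, from DB)
```

Because metadata is strongly consistent, the cached version must also be updated or invalidated on every write; otherwise a poll could miss a change until the TTL expires.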

Messaging

Purpose: Decouple metadata updates from notification delivery.

Event Schema: { "user_id": "123", "namespace_id": "abc", "type": "FILE_UPDATE" }.

Technical Selection: Kafka. High throughput and allows replaying sync events if the Notification Service restarts.
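
The event schema can be captured as a small typed record with JSON serialization. This sketch does not use a Kafka client; the SyncEvent class is hypothetical, and keying messages by namespace_id is an assumption made here so that events for one namespace stay in order within a partition.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SyncEvent:
    user_id: str
    namespace_id: str
    type: str

    def key(self) -> bytes:
        """Message key: same namespace -> same partition -> ordered delivery."""
        return self.namespace_id.encode("utf-8")

    def value(self) -> bytes:
        """Message payload as UTF-8 JSON matching the event schema."""
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")

    @classmethod
    def from_value(cls, raw: bytes) -> "SyncEvent":
        """Decode a payload on the Notification Service side."""
        return cls(**json.loads(raw))


if __name__ == "__main__":
    event = SyncEvent(user_id="123", namespace_id="abc", type="FILE_UPDATE")
    print(event.value().decode("utf-8"))
    print(SyncEvent.from_value(event.value()) == event)  # True
```

The event carries no file contents or block lists; it only tells clients that something changed, and they fetch the details from the Metadata Service.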

Wrap Up

Advanced Topics

Trade-offs (Consistency vs Availability): We choose Consistency (CP in CAP) for the metadata. A user would rather wait for a sync than see a corrupted file tree or lose data due to a write conflict.

Reliability: All blocks are stored in S3, which (in the Standard storage class) stores data redundantly across at least 3 Availability Zones. Database backups are taken daily, with WAL (Write-Ahead Log) segments archived for point-in-time recovery (PITR).

Bottleneck Analysis: The Metadata DB is the primary bottleneck. As the number of files grows, we must shard the database. Sharding by namespace_id spreads users evenly across shards, although a single very large or very active shared namespace can still become a hot spot.

Security: Client-side encryption is an alternative but makes "Search" and "Server-side previews" impossible. For MVP, we use server-side encryption with AWS KMS.

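Optimization (Delta Sync): With fixed-size (4MB) chunking, overwriting 1 byte in place in a 1GB file only requires uploading the one 4MB block that changed. Inserting or deleting bytes shifts every later fixed-size boundary, so variable-size (Rsync-like, content-defined) chunking is needed to keep the upload small in that case.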
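
The difference between fixed-size and variable-size chunking shows up when a byte is inserted rather than overwritten. The sketch below uses a gear-style rolling hash to pick chunk boundaries; this is a swapped-in technique for illustration and not the rsync algorithm itself. The function names and the ~1KB average chunk size are assumptions chosen to keep the demo fast.

```python
import hashlib
import random

_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]  # fixed random value per byte
MASK64 = (1 << 64) - 1


def fixed_chunks(data: bytes, size: int) -> list[bytes]:
    """Cut every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_defined_chunks(data: bytes, avg_bits: int = 10) -> list[bytes]:
    """Cut where the top `avg_bits` bits of a rolling hash are all zero.

    The hash depends only on the last 64 bytes, so boundaries follow the
    content and survive insertions earlier in the file.
    """
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & MASK64
        if h >> (64 - avg_bits) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def new_chunk_count(old: list[bytes], new: list[bytes]) -> int:
    """Number of chunks in `new` whose hash is not already known from `old`."""
    known = {hashlib.sha256(c).digest() for c in old}
    return sum(hashlib.sha256(c).digest() not in known for c in new)


if __name__ == "__main__":
    data = bytes(_rng.getrandbits(8) for _ in range(64 * 1024))
    edited = b"X" + data  # insert one byte at the front
    print("fixed-size chunks to upload:",
          new_chunk_count(fixed_chunks(data, 1024), fixed_chunks(edited, 1024)))
    print("content-defined chunks to upload:",
          new_chunk_count(content_defined_chunks(data), content_defined_chunks(edited)))
```

With fixed-size chunks, the insertion shifts every boundary and all chunks look new; with content-defined boundaries, only the chunks near the edit change. Production chunkers also enforce minimum and maximum chunk sizes, which this sketch omits.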