The Question

Scalable Cloud File Storage and Synchronization System

Design a high-performance, scalable file storage service similar to Google Drive or Dropbox. The system must support millions of users, handle files up to 5GB in size, and ensure cross-device synchronization. Key focus areas should include efficient handling of large file uploads over unreliable networks, a robust metadata management strategy that ensures data integrity for shared folders, and a notification mechanism for real-time sync updates. Discuss your choices for storage durability, consistency models for metadata, and how you would optimize for both storage costs and global latency.

PostgreSQL

Redis

SQS

Kubernetes

CDN

AES-256

TLS

SHA-256

WebSockets

Questions & Insights

Clarifying Questions

What is the scale of the system? (Assumption: 10M DAU, 100M total registered users. We need to support petabytes of data).

What is the maximum file size? (Assumption: 5GB per file).

What are the core features for the MVP? (Assumption: File upload, download, basic file sharing/permissions, and file versioning).

Is real-time collaboration (like Google Docs) in scope? (Assumption: No. Only file-level storage and syncing for the MVP).

What are the consistency requirements? (Assumption: Strong consistency for metadata—users shouldn't see old file names after an update—and eventual consistency for file content propagation across regions).

Thinking Process

Core Bottleneck: Handling large file uploads over unreliable networks and managing massive metadata growth.

Progressive Logic:

How do we handle large files without timing out? (Answer: Chunking and Resumable Uploads).

How do we store and query millions of file metadata records efficiently? (Answer: Sharded Relational DB for consistency).

How do we notify clients of updates for syncing? (Answer: Long Polling or WebSockets).

How do we ensure high availability and durability? (Answer: Multi-AZ Object Storage and Metadata replication).

Bonus Points

Delta Sync: Instead of uploading the whole file, use a rolling checksum algorithm (like rsync) to upload only modified blocks.

Content-Addressable Storage: Use SHA-256 hashes of chunks as keys in S3 to achieve global de-duplication across all users, saving massive storage costs.

Pre-signed URLs: Offload the heavy lifting of data transfer from the application servers to the Storage Layer (S3) directly to save compute costs and improve latency.

Vector Clock Versioning: Use vector clocks to detect concurrent offline edits, then apply a conflict resolution strategy (for example, keeping both versions) so merge conflicts are handled gracefully.
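The Delta Sync item above depends on a checksum that can slide across a file one byte at a time without rescanning the whole window. The sketch below shows one such rsync-style weak checksum. The names weak_checksum, roll and find_block are illustrative, and the direct byte comparison stands in for the strong hash a real implementation would use to confirm a match.

```python
from typing import Tuple

MOD = 1 << 16  # both checksum components are kept modulo 2^16


def weak_checksum(block: bytes) -> Tuple[int, int]:
    """Compute the (a, b) pair from scratch for one block."""
    n = len(block)
    a = sum(block) % MOD
    # The first byte is weighted n, the last byte is weighted 1.
    b = sum((n - i) * byte for i, byte in enumerate(block)) % MOD
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, block_len: int) -> Tuple[int, int]:
    """Slide the window one byte: drop out_byte on the left, add in_byte on the right."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - block_len * out_byte + a) % MOD
    return a, b


def find_block(data: bytes, target: bytes) -> int:
    """Return the first offset in data where target occurs, or -1."""
    n = len(target)
    if n == 0 or len(data) < n:
        return -1
    want = weak_checksum(target)
    a, b = weak_checksum(data[:n])
    for start in range(len(data) - n + 1):
        # The weak checksum can collide, so a candidate match is confirmed.
        if (a, b) == want and data[start:start + n] == target:
            return start
        if start + n < len(data):
            a, b = roll(a, b, data[start], data[start + n], n)
    return -1
```

A client would compute checksums for the blocks the server already holds, scan the edited file with find_block-style matching, and upload only the regions that match nothing.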

Design Breakdown

Functional Requirements

Core Use Cases:

Users can upload and download files from any device.

Users can share files/folders with other users (view/edit permissions).

Automatic syncing of files across multiple devices.

View file revision history (versioning).

Scope Control:

In-scope: File storage, sharing, syncing, versioning.

Out-of-scope: Real-time concurrent editing (Google Docs style), full-text search within document content, sophisticated trash recovery.

Non-Functional Requirements

Scale: Support 10M DAU and 100PB+ of total storage (the estimate below puts the upper bound near 1,000 PB before de-duplication).

Latency: Fast upload/download (close to the client's line rate); metadata updates < 200ms.

Availability & Reliability: 99.99% availability; 99.999999999% (11 9s) durability for files.

Consistency: Strong consistency for file metadata and permissions.

Fault Tolerance: No single point of failure; ability to resume uploads after network interruption.

Security: Data encryption at rest (AES-256) and in transit (TLS).

Estimation

Traffic Estimation:

10M DAU. 2 files uploaded per user per day = 20M uploads/day.

Avg QPS: 20M / 86,400 ≈ 230 Write QPS.

Peak QPS (3x): ~700 Write QPS.

Read QPS (Downloads/Metadata sync): Usually 10x writes, so 10 * 700 peak Write QPS ≈ 7,000 QPS.

Storage Estimation:

100M users * 10GB avg = 1,000 PB (1 Exabyte).

Metadata: 100M users * 500 files * 1KB/metadata = 50TB.

Bandwidth Estimation:

Upload: 20M files * 5MB avg = 100TB/day ≈ 1.15 GB/s.
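The arithmetic in this section can be re-derived with a short script, which makes it easy to vary the assumptions. This is a sketch using decimal units (1 TB = 1,000,000 MB). The function and parameter names are illustrative, and the defaults are the assumptions stated above.

```python
SECONDS_PER_DAY = 86_400


def estimate(dau=10_000_000, registered=100_000_000, uploads_per_user=2,
             peak_factor=3, read_to_write=10, avg_user_gb=10,
             files_per_user=500, metadata_kb=1, avg_file_mb=5) -> dict:
    """Back-of-the-envelope capacity numbers from the stated assumptions."""
    uploads_per_day = dau * uploads_per_user
    avg_write_qps = uploads_per_day / SECONDS_PER_DAY
    peak_write_qps = avg_write_qps * peak_factor
    return {
        "uploads_per_day": uploads_per_day,
        "avg_write_qps": avg_write_qps,
        "peak_write_qps": peak_write_qps,
        # Reads are taken as a multiple of the peak write rate.
        "read_qps": peak_write_qps * read_to_write,
        "storage_pb": registered * avg_user_gb / 1_000_000,  # GB -> PB
        "metadata_tb": registered * files_per_user * metadata_kb / 1_000_000_000,  # KB -> TB
        "upload_tb_per_day": uploads_per_day * avg_file_mb / 1_000_000,  # MB -> TB
        "upload_gb_per_s": uploads_per_day * avg_file_mb / 1_000 / SECONDS_PER_DAY,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value:,.2f}")
```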

Blueprint

Concise Summary: A microservices architecture centered around an Object Store for bulk data and a Relational Database for ACID-compliant metadata management, using a "Chunking" strategy for reliable file transfers.

Major Components:

Load Balancer: Distributes traffic to API Gateways.

API Gateway: Handles Auth, Rate Limiting, and Request Routing.

File Service: Manages file upload/download logic and generates pre-signed URLs.

Metadata Service: Manages file hierarchies, versions, and permissions.

Object Storage: Persistently stores file chunks (e.g., AWS S3).

Notification Service: Pushes updates to clients for synchronization.

Simplicity Audit: This design avoids complex peer-to-peer syncing or custom storage engines, relying on industry-standard Object Storage for durability.

Architecture Decision Rationale:

Why this?: Chunking is necessary for 5GB files to allow resumability. S3 is designed for 11 nines of durability, which matches the durability target. SQL is chosen for metadata because file/folder structures and permissions require strict consistency and relational integrity.
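Chunking also pairs naturally with the content-addressable idea from Bonus Points: if each chunk is stored under its SHA-256 hash, identical chunks are stored once. The sketch below uses an in-memory dict as a stand-in for the object store. ChunkStore, store_file, load_file and the 4 MiB chunk size are assumptions for illustration.

```python
import hashlib
from typing import Dict, List

CHUNK_SIZE = 4 * 1024 * 1024  # assumed chunk size; the design does not fix one


class ChunkStore:
    """In-memory stand-in for an object store keyed by chunk hash."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, chunk: bytes) -> str:
        key = hashlib.sha256(chunk).hexdigest()
        # An identical chunk maps to the same key, so it is stored only once.
        self._blobs.setdefault(key, chunk)
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def __len__(self) -> int:
        return len(self._blobs)


def store_file(store: ChunkStore, data: bytes, chunk_size: int = CHUNK_SIZE) -> List[str]:
    """Split data into fixed-size chunks and return the ordered list of keys."""
    return [store.put(data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)]


def load_file(store: ChunkStore, manifest: List[str]) -> bytes:
    """Reassemble a file from its ordered chunk keys."""
    return b"".join(store.get(key) for key in manifest)
```

The ordered key list (the manifest) is what a File_Versions row would point to. Two versions that share most chunks then cost little extra storage.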

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery & Traffic Routing:

CDN: Used for downloading popular public files to reduce origin load.

Global Accelerator: Uses AWS Global Accelerator or similar to route upload traffic over the provider's private backbone rather than the public internet.

Security & Perimeter:

WAF/DDoS: Protects against volumetric attacks.

SSL Termination: Performed at the Load Balancer level.

Service

Topology & Scaling: Stateless microservices deployed in Kubernetes across multiple Availability Zones. Scaling is based on CPU and Request Count.

API Schema Design:

POST /v1/files/upload: Initiates chunked upload. Returns upload_id.

PUT /v1/files/upload/{id}/{chunk_index}: Uploads a specific chunk.

GET /v1/files/download/{file_id}: Returns a pre-signed S3 URL for direct download.

GET /v1/metadata/{file_id}: Fetches file details.
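The download endpoint above hands back a pre-signed URL rather than the bytes. The idea can be sketched with an HMAC over the path and an expiry time. This is a simplified stand-in and not S3's actual signing scheme. SECRET, sign_url and verify_url are hypothetical names.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical key shared by the signer and the storage layer


def _signature(path: str, expires_at: int, secret: bytes) -> str:
    payload = f"{path}\n{expires_at}".encode("utf-8")
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def sign_url(path: str, expires_at: int, secret: bytes = SECRET) -> str:
    """Return path plus an expiry and a signature covering both."""
    query = urlencode({"expires": expires_at, "sig": _signature(path, expires_at, secret)})
    return f"{path}?{query}"


def verify_url(url: str, now: int, secret: bytes = SECRET) -> bool:
    """Accept the URL only if the signature matches and it has not expired."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    expected = _signature(parsed.path, expires_at, secret)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(sig, expected) and now <= expires_at
```

With a scheme like this, the service that authorizes a request only signs a short string, and the storage layer serves the bytes to any caller that presents a valid, unexpired URL.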

Resilience & Reliability:

Resumable Uploads: If chunk 5 of 10 fails, the client only retries chunk 5 using the upload_id.

Circuit Breakers: Implemented between the API Gateway and downstream services to prevent cascading failures.
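The resumable upload behaviour described above can be modelled as a session that records which chunk indexes have arrived. This is a minimal in-memory sketch. UploadSession and its methods are illustrative names, and a real service would persist this state rather than hold it in process memory.

```python
import uuid
from typing import Dict, List


class UploadSession:
    """Tracks received chunks for one upload_id."""

    def __init__(self, total_chunks: int) -> None:
        if total_chunks <= 0:
            raise ValueError("total_chunks must be positive")
        self.upload_id = uuid.uuid4().hex
        self.total_chunks = total_chunks
        self._received: Dict[int, bytes] = {}

    def put_chunk(self, index: int, data: bytes) -> None:
        """Store one chunk. Re-sending the same index is safe (idempotent)."""
        if not 0 <= index < self.total_chunks:
            raise IndexError(f"chunk index {index} out of range")
        self._received[index] = data

    def missing(self) -> List[int]:
        """Chunk indexes the client still has to send, in order."""
        return [i for i in range(self.total_chunks) if i not in self._received]

    def is_complete(self) -> bool:
        return len(self._received) == self.total_chunks

    def assemble(self) -> bytes:
        """Concatenate all chunks in index order once the upload is complete."""
        if not self.is_complete():
            raise RuntimeError(f"missing chunks: {self.missing()}")
        return b"".join(self._received[i] for i in range(self.total_chunks))
```

After a network interruption the client asks for missing() and resends only those indexes, which is the "retry chunk 5 only" behaviour.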

Storage

Access Pattern: Metadata is read-heavy (syncing). File data is write-heavy (uploads).

Database Table Design (PostgreSQL):

Files: file_id (PK), name, owner_id, parent_folder_id, is_directory, checksum.

File_Versions: version_id, file_id, s3_path, size, created_at.

User_Permissions: file_id, user_id, permission_type (Read/Write).
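The three tables above can be prototyped quickly. The sketch below uses Python's built-in sqlite3 as a stand-in for PostgreSQL, so column types and constraints are assumptions. It also shows a move as a single transactional update of parent_folder_id. move_file is a hypothetical helper, and cycle checks are omitted.

```python
import sqlite3

SCHEMA = """
CREATE TABLE files (
    file_id          INTEGER PRIMARY KEY,
    name             TEXT NOT NULL,
    owner_id         INTEGER NOT NULL,
    parent_folder_id INTEGER REFERENCES files(file_id),
    is_directory     INTEGER NOT NULL DEFAULT 0,
    checksum         TEXT
);
CREATE TABLE file_versions (
    version_id INTEGER PRIMARY KEY,
    file_id    INTEGER NOT NULL REFERENCES files(file_id),
    s3_path    TEXT NOT NULL,
    size       INTEGER NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE user_permissions (
    file_id         INTEGER NOT NULL REFERENCES files(file_id),
    user_id         INTEGER NOT NULL,
    permission_type TEXT NOT NULL CHECK (permission_type IN ('Read', 'Write')),
    PRIMARY KEY (file_id, user_id)
);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def move_file(conn: sqlite3.Connection, file_id: int, new_parent_id: int) -> None:
    """Re-parent a file or folder; the connection context manager commits or rolls back."""
    with conn:
        row = conn.execute(
            "SELECT is_directory FROM files WHERE file_id = ?", (new_parent_id,)
        ).fetchone()
        if row is None or not row[0]:
            raise ValueError("target is not a folder")
        conn.execute(
            "UPDATE files SET parent_folder_id = ? WHERE file_id = ?",
            (new_parent_id, file_id),
        )
```

Because children reference their parent by ID, moving a folder changes one row no matter how many files sit beneath it.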

Technical Selection: PostgreSQL with logical sharding by owner_id. We need ACID for file moves/renames to prevent orphaned metadata.

Distribution Logic: Sharding by owner_id ensures all files for a single user live on the same shard, making "My Drive" queries extremely fast.
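Routing by owner_id can be sketched as a stable hash mapped onto a fixed number of logical shards. NUM_SHARDS and shard_for_owner are assumptions for illustration. The design does not specify a shard count or hash function.

```python
import hashlib

NUM_SHARDS = 64  # hypothetical number of logical shards


def shard_for_owner(owner_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map an owner to a logical shard using a hash that is stable across processes."""
    # Python's built-in hash() is salted per process for strings, so a
    # cryptographic digest is used to get the same answer everywhere.
    digest = hashlib.sha256(owner_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Every query for one user's files lands on shard_for_owner(owner_id). One common refinement is to keep many logical shards and map them onto fewer physical hosts, so shards can move between hosts without rehashing owners.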

Cache

Purpose & Justification: Reduce load on Metadata DB for frequently accessed file details and session data.

Key-Value Schema:

Key: file_meta:{file_id}, Value: JSON blob of metadata. TTL: 1 hour.

Key: user_root:{user_id}, Value: List of top-level file IDs.

Failure Handling: Cache-aside pattern. If Redis is down, system falls back to the DB (graceful degradation).
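The cache-aside read path with fallback can be sketched as follows. InMemoryCache stands in for Redis, CacheUnavailable models a connection failure, and load_from_db is a hypothetical callback for the metadata query. The key format and one-hour TTL follow the schema above.

```python
import json
from typing import Callable, Dict, Optional


class CacheUnavailable(Exception):
    """Raised by the cache client when the cache cannot be reached."""


class InMemoryCache:
    """Tiny stand-in for a Redis client; the TTL is accepted but not enforced."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.available = True

    def get(self, key: str) -> Optional[str]:
        if not self.available:
            raise CacheUnavailable(key)
        return self.data.get(key)

    def set(self, key: str, value: str, ttl_seconds: int) -> None:
        if not self.available:
            raise CacheUnavailable(key)
        self.data[key] = value


def get_file_meta(file_id: str, cache: InMemoryCache,
                  load_from_db: Callable[[str], dict]) -> dict:
    key = f"file_meta:{file_id}"
    try:
        cached = cache.get(key)
        if cached is not None:
            return json.loads(cached)  # cache hit
    except CacheUnavailable:
        return load_from_db(file_id)  # degrade gracefully: serve from the DB
    meta = load_from_db(file_id)  # cache miss
    try:
        cache.set(key, json.dumps(meta), ttl_seconds=3600)
    except CacheUnavailable:
        pass  # populating the cache is best-effort
    return meta
```

Writes would update the DB first and then delete the cached key, so the next read repopulates it.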

Messaging

Purpose & Decoupling: Used for asynchronous post-processing tasks like thumbnail generation, virus scanning, and search indexing.

Event Schema: { "file_id": "123", "action": "UPLOAD_COMPLETE", "s3_path": "..." }.

Technical Selection: AWS SQS or Kafka. SQS is preferred for MVP simplicity and managed scaling.

Data Processing

Processing Model: Worker pool consumes messages from SQS.

Processing DAG:

1. Receive message -> 2. Virus Scan -> 3. Generate Thumbnail -> 4. Update Search Index -> 5. Mark Metadata as "Processed/Ready".

Technical Selection: Go/Python Workers running as K8s Jobs or Lambda functions.
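The processing steps above form a linear pipeline that a worker runs once per message. The sketch below uses queue.Queue as a stand-in for SQS and plain callables for each step. process_event, drain and the status strings are illustrative, and retries or dead-letter handling are left out.

```python
import json
import queue
from typing import Callable, Dict, List, Tuple

Step = Callable[[dict], None]


def process_event(event: dict, steps: List[Tuple[str, Step]],
                  status: Dict[str, str]) -> None:
    """Run each step in order; stop at the first failure and record where it failed."""
    for name, step in steps:
        try:
            step(event)
        except Exception:
            status[event["file_id"]] = f"FAILED:{name}"
            return
    status[event["file_id"]] = "READY"


def drain(messages: "queue.Queue[str]", steps: List[Tuple[str, Step]],
          status: Dict[str, str]) -> None:
    """Consume JSON messages until the queue is empty."""
    while True:
        try:
            raw = messages.get_nowait()
        except queue.Empty:
            return
        process_event(json.loads(raw), steps, status)
```

Stopping at the first failed step means a file that fails the virus scan never gets a thumbnail or a search index entry.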

Infrastructure (Optional)

Observability: Prometheus for metrics (Latency, Error Rates), ELK stack for logs.

Distributed Coordination: Not strictly needed for MVP; if required for locking, use Redis Redlock.

Wrap Up

Advanced Topics

Trade-offs: We chose SQL over NoSQL for metadata. While NoSQL stores are typically easier to scale horizontally, the hierarchical nature of folders and the need for atomic renames (moving a folder with 1000 files) make SQL's transaction support more valuable for an MVP.

Reliability: S3 supports cross-region replication (configured per bucket and asynchronous) for disaster recovery. If one region fails, DNS switches to the secondary region.

Bottleneck Analysis: The Metadata DB shard for a "Power User" (millions of files) might become a hotspot. Solution: Sub-sharding or further denormalization.

Security: Using Pre-signed URLs ensures the File Service never touches the actual bytes of the file, reducing the attack surface and processing overhead.

Distinguishing Insights: For mobile clients, use token-based Delta Sync at the metadata level (distinct from the block-level Delta Sync in Bonus Points). Instead of checking every file, the client sends the last "Sync Token" it holds, and the server returns only the changes made since that token, read from a changes table in the DB.
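The sync token exchange can be sketched as an append-only change log with a monotonically increasing sequence number, where the token is simply the last sequence the client has seen. ChangeLog, Change and the page limit are assumptions for illustration. In the real system the list would be the changes table.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Change:
    seq: int
    file_id: str
    action: str


class ChangeLog:
    """Append-only log of changes for one user, kept in memory."""

    def __init__(self) -> None:
        self._changes: List[Change] = []

    def record(self, file_id: str, action: str) -> int:
        """Append a change and return its sequence number."""
        seq = len(self._changes) + 1
        self._changes.append(Change(seq, file_id, action))
        return seq

    def changes_since(self, token: int, limit: int = 100) -> Tuple[List[Change], int]:
        """Return up to limit changes after token, plus the token to send next time."""
        page = [c for c in self._changes if c.seq > token][:limit]
        new_token = page[-1].seq if page else token
        return page, new_token
```

A client that starts from token 0 and keeps calling changes_since with the returned token catches up page by page, then receives empty pages until something changes.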