The Question

Scalable Resumable File Upload System

Design a robust frontend system for uploading large files (up to several gigabytes) with real-time progress tracking. The system must handle network interruptions gracefully via resumable uploads, support multiple concurrent file transfers without blocking the main UI thread, and provide a performant queue management interface. Explain your strategies for memory management, file integrity verification, and how you would handle state synchronization for a large number of concurrent tasks.

Tech stack: React, Zustand, Web Workers, IndexedDB, XMLHttpRequest, Blob API, SparkMD5

Questions & Insights

Clarifying Questions

What is the maximum file size, and which file types are supported? Assumption: Up to 5 GB per file, any binary format. Files this large require chunked uploads.

Should the system support resuming after a network failure or a browser refresh? Assumption: Yes. Upload metadata is persisted locally so an interrupted upload can resume.

Are multiple concurrent uploads in scope? Assumption: Yes, with a configurable concurrency limit (e.g., 3-6 simultaneous uploads) that keeps requests within the browser's per-origin connection limit (six under HTTP/1.1 in major browsers).

Is file integrity a priority? Assumption: Yes. The client computes hashes (MD5 or SHA-256) so the server can verify each chunk.

Crash Strategy

Progress Granularity: A single XHR request does report upload progress, but if the connection drops, every byte sent so far is lost and the upload restarts from zero. Chunked uploads (slicing the file) give granular progress and let a failed transfer resume from the last acknowledged chunk.

Queue Management: How do we manage 50+ files without freezing the UI? A centralized upload queue caps the number of active transfers and schedules the rest, while the UI only renders queue state.

Memory Management: How do we handle multi-GB files without out-of-memory (OOM) errors? Blob.slice() returns a new Blob that references a byte range of the file without reading it, so bytes are loaded only when a chunk is hashed or sent, never the whole file at once.
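Chunk state also yields progress that survives a dropped connection. A minimal sketch, assuming hypothetical per-chunk bookkeeping (`ChunkState`, `fileProgress` are illustrative names): acknowledged chunks count in full, and in-flight chunks count only the bytes their upload progress events have reported.

```typescript
// Hypothetical per-chunk bookkeeping kept by the upload manager.
interface ChunkState {
  size: number; // bytes in this chunk
  acked: boolean; // server confirmed receipt
  inFlightLoaded: number; // bytes reported by xhr.upload.onprogress so far
}

// Returns a fraction in [0, 1]. If a request drops, only that chunk's
// inFlightLoaded is reset, so progress falls back to the last acknowledged
// chunk instead of to zero.
function fileProgress(chunks: ChunkState[]): number {
  let total = 0;
  let done = 0;
  for (const c of chunks) {
    total += c.size;
    done += c.acked ? c.size : Math.min(c.inFlightLoaded, c.size);
  }
  return total === 0 ? 1 : done / total; // a zero-byte file counts as complete
}
```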

The Core Flow:

Initialize: Request an upload ID and check for existing progress.

Process: Slice file and hash chunks (in Web Workers).

Transmit: Upload chunks concurrently with retry logic.

Finalize: Notify the server to merge the chunks and verify the full-file hash.
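The retry logic in the Transmit step can be sketched as a generic wrapper with exponential backoff. This is a minimal sketch; `withRetry` and its default delays are hypothetical, not taken from any library.

```typescript
// Hypothetical helper: retries an async operation with exponential backoff.
// The delay before retry n (0-based) is baseDelayMs * 2^n, capped at maxDelayMs.
async function withRetry<T>(
  op: (attempt: number) => Promise<T>,
  retries = 3,
  baseDelayMs = 500,
  maxDelayMs = 8000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await op(attempt);
    } catch (err) {
      lastError = err;
      if (attempt === retries) break; // out of attempts: rethrow below
      const delay = Math.min(baseDelayMs * 2 ** attempt, maxDelayMs);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```

A production version would usually add random jitter, so many clients recovering from the same outage do not retry in lockstep, and would skip retries for non-retryable 4xx responses.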

Elite Bonus Points

Web Workers for Hashing: Use SparkMD5 inside a Worker to prevent UI jank during large file processing.

Tus Protocol Alignment: Design the API interaction to follow tus (tus.io), the open protocol for resumable file uploads.
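A sketch of the two requests a tus client needs in order to resume, assuming a tus 1.0.0 server and an upload URL obtained from an earlier creation request. Note that tus tracks a byte offset rather than a chunk index, so aligning with it means persisting offsets. The header names and the 204 response follow the tus specification; the function names are illustrative.

```typescript
const TUS_VERSION = "1.0.0";

// HEAD asks the server how many bytes it already has (Upload-Offset).
async function getTusOffset(uploadUrl: string, fetchFn: typeof fetch = fetch): Promise<number> {
  const res = await fetchFn(uploadUrl, {
    method: "HEAD",
    headers: { "Tus-Resumable": TUS_VERSION },
  });
  const offset = res.headers.get("Upload-Offset");
  if (!res.ok || offset === null) throw new Error(`Cannot resume: HTTP ${res.status}`);
  return Number(offset);
}

// PATCH appends bytes at exactly `offset`; the server answers 204 No Content
// with the new Upload-Offset, which becomes the offset for the next chunk.
async function patchTusChunk(
  uploadUrl: string,
  chunk: Blob,
  offset: number,
  fetchFn: typeof fetch = fetch,
): Promise<number> {
  const res = await fetchFn(uploadUrl, {
    method: "PATCH",
    headers: {
      "Tus-Resumable": TUS_VERSION,
      "Upload-Offset": String(offset),
      "Content-Type": "application/offset+octet-stream",
    },
    body: chunk,
  });
  if (res.status !== 204) throw new Error(`PATCH failed: HTTP ${res.status}`);
  return Number(res.headers.get("Upload-Offset"));
}
```

The optional `fetchFn` parameter exists only so the requests can be exercised without a real server.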

Network Awareness: Use the Network Information API (navigator.connection, available only in Chromium-based browsers) to adjust chunk size from effectiveType, or to pause uploads when saveData is true.
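A sketch of such a policy, assuming the illustrative thresholds below (they are not measured values). Because `navigator.connection` is missing in Firefox and Safari, every field is optional and the function falls back to the default 5 MB chunk size.

```typescript
// Subset of the Network Information API; all fields may be absent.
interface ConnectionHints {
  saveData?: boolean;
  effectiveType?: "slow-2g" | "2g" | "3g" | "4g";
}

const MB = 1024 * 1024;

// Hypothetical policy: null means "pause uploads", otherwise a chunk size in bytes.
function chooseChunkSize(conn: ConnectionHints | undefined): number | null {
  if (conn?.saveData) return null; // user opted into data saving: defer uploads
  switch (conn?.effectiveType) {
    case "slow-2g":
    case "2g":
      return 1 * MB; // assumed: small chunks lose less work per failure
    case "3g":
      return 2 * MB; // assumed middle ground
    default:
      return 5 * MB; // 4g or unknown: the default chunk size
  }
}
```

In the browser this would be called as `chooseChunkSize((navigator as any).connection)` and re-evaluated on the connection's `change` event.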

Headless Logic: Decouple the upload state machine into a framework-agnostic core, making it testable without a DOM.
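A framework-agnostic core for the statuses listed in the Application Layer could be a transition table plus a pure function, which needs no DOM or React to test. The event names are assumptions, as is treating Cancel as removing the task from the queue rather than as a separate status.

```typescript
type Status = "IDLE" | "HASHING" | "UPLOADING" | "PAUSED" | "COMPLETED" | "ERROR";
type UploadEvent = "START" | "HASHED" | "PAUSE" | "RESUME" | "SUCCEED" | "FAIL" | "RETRY";

// Allowed transitions; any event not listed for a status is ignored.
const transitions: Record<Status, Partial<Record<UploadEvent, Status>>> = {
  IDLE: { START: "HASHING" },
  HASHING: { HASHED: "UPLOADING", FAIL: "ERROR" },
  UPLOADING: { PAUSE: "PAUSED", SUCCEED: "COMPLETED", FAIL: "ERROR" },
  PAUSED: { RESUME: "UPLOADING" },
  ERROR: { RETRY: "IDLE" }, // re-queue; the coordinator starts it again
  COMPLETED: {},
};

// Pure transition function: same input, same output, no side effects.
function transition(state: Status, event: UploadEvent): Status {
  return transitions[state][event] ?? state;
}
```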

Design Breakdown

Requirements

Functional Requirements:

Select/Drag-and-drop multiple files.

Real-time progress bar (per file and aggregate).

Pause, Resume, and Cancel actions.

Error handling with "Retry" capability.

Non-Functional Requirements:

Performance: Zero UI blocking during hashing/slicing; low memory footprint.

Reliability: Resumable from the last successful chunk after a crash.

Scalability: Support a queue of 100+ files efficiently.

Responsiveness: Mobile-friendly progress tracking.

Design Summary

Concise Summary: A chunk-based upload system managed by a centralized queue store, utilizing Web Workers for non-blocking file hashing and XHR for granular progress tracking.

Major Components:

Upload Manager: Orchestrates the queue, concurrency, and global progress state.

Chunking Service: Handles File.slice() logic and creates payload units.

Integrity Hasher: A Web Worker-based service that generates unique identifiers for resumability.

Persistence Store: IndexedDB (or LocalStorage for small metadata) tracks chunk manifests so uploads can resume after a refresh, a crash, or a loss of connectivity.

CUJ (Critical User Journey) Walkthrough: User drops 3 files -> Upload Manager adds them to the queue -> Hasher generates IDs -> Manager starts the first N files -> Chunking Service produces slices that are sent to the server -> UI reflects progress via the Queue Store.

Simplicity Audit: This is the simplest robust architecture. While single-stream fetch is easier, it lacks reliable "Resume" and "Progress" for large files on unstable networks, which are core requirements for a "system."

Architecture Decision Rationale:

Why this?: Chunking is the industry standard (S3, Dropbox) for reliability. Using a centralized store (Zustand/Redux) ensures the UI stays synced with background upload tasks.

Requirement Satisfaction: Meets all functional needs; Web Workers ensure the "Performance" requirement is met by offloading CPU-heavy hashing.


Architecture Deep Dive

Presentation Layer

Component Hierarchy: The App Shell provides the context. The Upload Manager Feature is the smart container that connects the Upload Queue Store to the UI. It renders a list of File Item components which are dumb components receiving status and percentage.

Interaction Layer: Supports DragEvent for file drops and standard <input type="file">. Buttons trigger actions (pause/resume) which dispatch commands to the Concurrency Coordinator.

Rendering Layer: For long lists of uploads, we use List Virtualization (e.g., react-window) to ensure the DOM remains performant. Progress bars are optimized using transform: scaleX() to avoid layout reflows during frequent updates.
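Upload progress events can fire frequently for every active chunk, so the scaleX optimization pairs well with batching. A sketch, assuming hypothetical names: the batcher coalesces events into at most one store update per scheduled frame (in the browser, `requestAnimationFrame` would be passed as `schedule`), and the style helper produces the scaleX transform for a bar's fill element.

```typescript
type ProgressMap = Map<string, number>;

// Collects progress events and flushes them in one batch per scheduled frame.
function createProgressBatcher(
  flush: (updates: ProgressMap) => void,
  schedule: (cb: () => void) => unknown,
) {
  let pending: ProgressMap = new Map();
  let scheduled = false;
  return (id: string, fraction: number): void => {
    pending.set(id, fraction); // later events for the same file overwrite earlier ones
    if (scheduled) return;
    scheduled = true;
    schedule(() => {
      const batch = pending;
      pending = new Map();
      scheduled = false;
      flush(batch);
    });
  };
}

// scaleX on a full-width fill element does not trigger layout (reflow).
function progressFillStyle(fraction: number): { transform: string; transformOrigin: string } {
  const clamped = Math.min(Math.max(fraction, 0), 1);
  return { transform: `scaleX(${clamped})`, transformOrigin: "left" };
}
```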

UI Frameworks: React for componentization, Tailwind CSS for styling, and Headless UI for accessible modals/dialogs.

Application Layer

Data Fetching Layer: fetch exposes no upload-progress events, so chunks are sent with XMLHttpRequest (XHR): xhr.upload.onprogress reports bytes sent for each chunk, and xhr.abort() cancels an in-flight chunk on Pause or Cancel.
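A minimal sketch of one chunk request over XHR, returning a promise plus an abort handle. The `XhrLike` interface exists only so a fake can be injected in tests; in the browser, pass `() => new XMLHttpRequest()`. The `X-Chunk-Index` header is a hypothetical way to identify the chunk on the `PATCH /uploads/:id` endpoint.

```typescript
// Structural subset of XMLHttpRequest used below.
interface XhrLike {
  open(method: string, url: string): void;
  setRequestHeader(name: string, value: string): void;
  send(body: Blob): void;
  abort(): void;
  readonly status: number;
  upload: { onprogress: ((this: any, ev: any) => any) | null };
  onload: ((this: any, ev: any) => any) | null;
  onerror: ((this: any, ev: any) => any) | null;
  onabort: ((this: any, ev: any) => any) | null;
}

function uploadChunk(
  makeXhr: () => XhrLike,
  url: string,
  chunk: Blob,
  index: number,
  onProgress: (loaded: number) => void,
): { done: Promise<void>; abort: () => void } {
  const xhr = makeXhr();
  const done = new Promise<void>((resolve, reject) => {
    xhr.open("PATCH", url);
    xhr.setRequestHeader("X-Chunk-Index", String(index)); // hypothetical header
    xhr.upload.onprogress = (e) => onProgress(e.loaded); // register before send()
    // onload fires for any HTTP status, so the status must be checked here.
    xhr.onload = () =>
      xhr.status >= 200 && xhr.status < 300
        ? resolve()
        : reject(new Error(`Chunk ${index} failed: HTTP ${xhr.status}`));
    xhr.onerror = () => reject(new Error(`Chunk ${index}: network error`));
    xhr.onabort = () => reject(new Error(`Chunk ${index}: aborted`));
    xhr.send(chunk);
  });
  return { done, abort: () => xhr.abort() };
}
```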

State Management Layer: A centralized Upload Queue Store (using Zustand or Redux) tracks every file's status (IDLE, HASHING, UPLOADING, PAUSED, COMPLETED, ERROR).
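To keep state synchronization cheap with many concurrent tasks, the store can be normalized by ID so each progress update replaces only one task object. A sketch, assuming the hypothetical shape below; with Zustand, `updateTask` would run inside a `set((state) => ...)` callback, and each row would select only its own task.

```typescript
type Status = "IDLE" | "HASHING" | "UPLOADING" | "PAUSED" | "COMPLETED" | "ERROR";

interface UploadTask {
  id: string;
  name: string;
  status: Status;
  progress: number; // 0..1
}

// Normalized state: tasks keyed by id plus an ordered id list for rendering.
interface QueueState {
  ids: string[];
  byId: Record<string, UploadTask>;
}

// Immutable single-task update. Only the changed task gets a new object; the
// id list and all other task objects keep their references, so memoized rows
// for other files see no change and skip re-rendering.
function updateTask(state: QueueState, id: string, patch: Partial<UploadTask>): QueueState {
  const current = state.byId[id];
  if (!current) return state; // unknown id: no-op, same reference
  return { ids: state.ids, byId: { ...state.byId, [id]: { ...current, ...patch } } };
}
```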

Concurrency Coordinator: A simple semaphore-based logic that monitors the queue and ensures only MAX_CONCURRENT_UPLOADS (e.g., 3) are in the active state.
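The semaphore can be sketched as a fixed number of "lanes" that pull from a shared index, so a new task starts only when a running one settles. This is an illustrative helper (`runWithConcurrency` is not a library function); results use the same shape as `Promise.allSettled`, so one failed upload does not stop the others.

```typescript
async function runWithConcurrency<T>(
  tasks: Array<() => Promise<T>>,
  limit: number,
): Promise<PromiseSettledResult<T>[]> {
  const results: PromiseSettledResult<T>[] = new Array(tasks.length);
  let next = 0;
  // Each lane takes the next unstarted task; reading and incrementing `next`
  // happens synchronously, so two lanes never claim the same index.
  async function lane(): Promise<void> {
    while (next < tasks.length) {
      const i = next++;
      try {
        results[i] = { status: "fulfilled", value: await tasks[i]() };
      } catch (reason) {
        results[i] = { status: "rejected", reason };
      }
    }
  }
  await Promise.all(Array.from({ length: Math.min(limit, tasks.length) }, () => lane()));
  return results;
}
```

The same limiter works at file level (MAX_CONCURRENT_UPLOADS files) or chunk level, depending on which tasks are passed in.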

Domain Layer

Business Rules: Validates file size and type before processing. Implements the rule that a file is only "Complete" when the server returns 200 or 201 for the final merge request.

Integrity Hasher: A dedicated Web Worker reads the file in chunks using FileReaderSync to generate an MD5 hash. This hash serves as the upload_id for resumability.
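The worker's hashing loop is sketched below with a swapped-in, dependency-free technique: instead of SparkMD5's incremental whole-file MD5, it computes a SHA-256 digest per chunk with Web Crypto (`crypto.subtle.digest`, available in workers) and derives the file ID as SHA-256 over the concatenated chunk digests (a hash list). This is an assumption-labeled alternative, not the MD5 scheme described above; the per-chunk digests double as integrity checks the server can verify.

```typescript
function toHex(bytes: Uint8Array): string {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}

// Hash-list sketch. Only one chunk's bytes are in memory at a time.
async function hashChunks(
  file: Blob,
  chunkSize: number,
): Promise<{ chunkHashes: string[]; fileId: string }> {
  const digests: Uint8Array[] = [];
  for (let start = 0; start < file.size; start += chunkSize) {
    const bytes = await file.slice(start, start + chunkSize).arrayBuffer();
    digests.push(new Uint8Array(await crypto.subtle.digest("SHA-256", bytes)));
  }
  const joined = new Uint8Array(digests.length * 32); // SHA-256 digests are 32 bytes
  digests.forEach((d, i) => joined.set(d, i * 32));
  const fileDigest = await crypto.subtle.digest("SHA-256", joined);
  return { chunkHashes: digests.map(toHex), fileId: toHex(new Uint8Array(fileDigest)) };
}
```

Inside a worker, the message handler would call `hashChunks` on the received `File` and post the result back, keeping the main thread free.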

Chunking Logic: Slices the File (Blob) into fixed sizes (e.g., 5MB). Each chunk is treated as an independent transactional unit.
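A sketch of the chunking logic as a lazy generator, assuming the 5 MB fixed size from the text. The optional `fromIndex` lets a resumed upload start at the first chunk the server has not acknowledged; the names are illustrative.

```typescript
interface Chunk {
  index: number;
  start: number;
  end: number; // exclusive
  blob: Blob;
}

const CHUNK_SIZE = 5 * 1024 * 1024; // 5 MB

// Yields one chunk descriptor at a time. Blob.slice() only records a byte
// range, so memory use does not grow with file size.
function* chunksOf(file: Blob, chunkSize = CHUNK_SIZE, fromIndex = 0): Generator<Chunk> {
  for (let index = fromIndex; index * chunkSize < file.size; index++) {
    const start = index * chunkSize;
    const end = Math.min(start + chunkSize, file.size);
    yield { index, start, end, blob: file.slice(start, end) };
  }
}
```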

Infrastructure Layer

API / Network: Standard RESTful endpoints: POST /uploads (init), PATCH /uploads/:id (upload chunk), and POST /uploads/:id/finish (merge).
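A sketch of the end-to-end flow against these three endpoints, assuming a hypothetical `UploadApi` client whose init response lists the chunk indices the server already holds. Chunks go out sequentially for readability; in the full design the Concurrency Coordinator would bound parallel sends, and each `putChunk` would be wrapped in retry logic.

```typescript
// Hypothetical client; method names and response shapes are assumptions.
interface UploadApi {
  init(fileHash: string, size: number): Promise<{ uploadId: string; receivedChunks: number[] }>; // POST /uploads
  putChunk(uploadId: string, index: number, chunk: Blob): Promise<void>; // PATCH /uploads/:id
  finish(uploadId: string, fileHash: string): Promise<void>; // POST /uploads/:id/finish
}

async function uploadFile(api: UploadApi, file: Blob, fileHash: string, chunkSize: number): Promise<string> {
  const { uploadId, receivedChunks } = await api.init(fileHash, file.size);
  const alreadyOnServer = new Set(receivedChunks);
  const total = Math.ceil(file.size / chunkSize);
  for (let i = 0; i < total; i++) {
    if (alreadyOnServer.has(i)) continue; // resume: skip acknowledged chunks
    const start = i * chunkSize;
    await api.putChunk(uploadId, i, file.slice(start, start + chunkSize));
  }
  await api.finish(uploadId, fileHash); // server merges and verifies the hash
  return uploadId;
}
```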

Storage: Uses IndexedDB to store the mapping of File Hash -> Last Successful Chunk Index. This allows the user to refresh the page, re-select the same file, and resume exactly where they left off.

Wrap Up


Trade-offs: Chunking adds complexity to the backend (merging files) and frontend (state management), but it is necessary for large file reliability. If we only supported <10MB files, a simple multipart/form-data upload would be more YAGNI-compliant.

Optimization: For very fast networks, we can implement Dynamic Chunk Sizing—increasing chunk size if the speed is high to reduce HTTP overhead.
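One way to implement Dynamic Chunk Sizing is to size each chunk so a request takes roughly a target duration at the last measured throughput. The target duration, the bounds, and the 2x growth cap below are assumptions. Because chunk sizes then vary, resume state should be recorded as byte offsets rather than chunk indices.

```typescript
const MIN_CHUNK = 1 * 1024 * 1024; // 1 MB floor (assumed)
const MAX_CHUNK = 50 * 1024 * 1024; // 50 MB ceiling (assumed)
const TARGET_MS = 3000; // aim for about 3 s per chunk request (assumed)

// Sizes the next chunk from the previous chunk's throughput. Growth is capped
// at 2x per step to damp noisy measurements; shrinking is immediate.
function nextChunkSize(lastSize: number, lastDurationMs: number): number {
  if (lastDurationMs <= 0) return lastSize; // no usable measurement
  const bytesPerMs = lastSize / lastDurationMs;
  const ideal = bytesPerMs * TARGET_MS;
  const bounded = Math.min(ideal, lastSize * 2);
  return Math.round(Math.min(MAX_CHUNK, Math.max(MIN_CHUNK, bounded)));
}
```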

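Security: All pre-signed URLs or upload tokens must have short TTLs. Client-side hashes let the server detect chunks corrupted in transit by recomputing and comparing them; they are an integrity check, not a security control, since a client can send any hash it likes.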