The Question
DesignScalable GIF Collection and Sharing System
Design a system that allows users to organize digital assets (GIFs) into personal collections. Users must be able to manage their collections (CRUD), add/remove assets, and share these collections with other users via unique links. Focus on data modeling for high-read shared content, access control for privacy, and handling viral traffic patterns.
PostgreSQL
Redis
REST
JWT
Docker
Kubernetes
CDN
Questions & Insights
Clarifying Questions
Scale and Usage: What is the expected scale in terms of Daily Active Users (DAU) and the average number of collections per user?
Content Hosting: Does the system need to host/upload GIF files, or are we storing metadata and external URLs (e.g., Giphy/Tenor links)?
Collaboration: Is "sharing" read-only (viewing a link) or collaborative (multiple users adding/removing GIFs from the same collection)?
Privacy: Are there different visibility levels (e.g., Public, Private, Shared-with-Link)?
Assumptions for MVP:
Scale: 10M DAU, ~100M total collections.
Content: The system stores GIF metadata (URLs, dimensions, tags) rather than hosting raw blobs.
Sharing: Read-only sharing via a unique ID/link. Only the owner can modify.
Privacy: Simple binary state: Private (owner only) or Shared (anyone with the link).
Thinking Process
Core Bottleneck: Efficiently managing the many-to-many relationship between Collections and GIFs while maintaining low latency for feed/viewing operations.
Data Modeling: Using a relational database for strong consistency on collection modifications to prevent "ghost" items or sync issues.
Sharing Logic: Implementing a robust Access Control List (ACL) or a simple visibility flag with unique tokens for non-public sharing.
Read Optimization: Utilizing a caching layer for "viral" or frequently accessed shared collections to reduce database load.
Bonus Points
Denormalized Counters: Using atomic increments in Redis or a separate counter table to track collection views/likes without locking the main collection table.
Bloom Filters: Using Bloom Filters at the Edge to quickly check if a GIF is already in a user's "Favorites" collection without hitting the DB.
Soft Deletes: Implementing a soft-delete mechanism for collections to allow for "Undo" functionality and better data recovery.
Global Secondary Indexes: If scaling to NoSQL (e.g., DynamoDB), discussing partition key strategies (UserID) vs. Global Secondary Indexes (CollectionID) to avoid hot partitions.
Design Breakdown
Functional Requirements
Users can create, update, and delete collections.
Users can add/remove GIFs to/from their collections.
Users can view their list of collections.
Users can share a collection link with others.
Shared links allow other users to view the collection content.
Non-Functional Requirements
Availability: High availability for viewing shared collections (99.9%).
Latency: Under 200ms for adding a GIF; under 100ms for viewing a collection.
Consistency: Strong consistency for collection edits (User must see their change immediately).
Scalability: Handle sudden spikes in traffic for viral collections.
Estimation
DAU: 10 Million.
Reads (Viewing): 50 per user/day = 500M Daily Requests (~6,000 QPS).
Writes (Adding/Creating): 5 per user/day = 50M Daily Requests (~600 QPS).
Storage: 100M collections 1KB/metadata = 100GB. 1B GIF-Collection mappings 100B = 100GB. Total DB size ~200-300GB (Easily fits in modern RDBMS).
Blueprint
Concise Summary: A microservices-based architecture centered around a Collection Service that manages GIF metadata and user-created groupings in a relational database, optimized with a cache for high-read shared content.
Major Components:
API Gateway: Handles authentication, rate limiting, and request routing to internal services.
Collection Service: Core business logic for CRUD operations on collections and GIF mappings.
User Service: Manages user profiles and authentication state.
Metadata DB (PostgreSQL): Persists collections, GIF links, and their relationships.
Cache (Redis): Stores frequently accessed collection data to minimize DB roundtrips.
Simplicity Audit: This design avoids complex message queues or stream processing because the write volume is manageable for a standard RDBMS, and the functional requirements do not yet demand asynchronous background tasks.
Architecture Decision Rationale:
Why this architecture is the best for this problem?: Relational DBs excel at managing the complex relationships and ownership constraints required for "collections."
Functional Requirement Satisfaction: Meets all CRUD and sharing needs through simple RESTful endpoints and visibility flags.
Non-functional Requirement Satisfaction: Caching ensures high availability and low latency for shared links, while the stateless service layer allows for horizontal scaling.
High Level Architecture
Sub-system Deep Dive
Edge (Optional)
Content Delivery & Traffic Routing:
Global DNS with latency-based routing.
CDN (e.g., CloudFront) for static assets (UI) and caching the external GIF thumbnails if necessary.
Security & Perimeter:
API Gateway: Validates JWT tokens from the User Service.
Rate Limiting: 100 write requests per minute per User ID to prevent spam/scraping.
SSL/TLS: Termination at the Gateway level.
Service
Topology & Scaling:
Stateless Services: Deployed as Docker containers on K8s across multiple Availability Zones.
Scaling: Horizontal scaling based on CPU utilization and request count.
API Schema Design:
POST /v1/collections: Create a new collection.GET /v1/collections/{id}: Retrieve collection details (Shared/Private check).POST /v1/collections/{id}/items: Add a GIF to a collection.DELETE /v1/collections/{id}/items/{gif_id}: Remove a GIF.PATCH /v1/collections/{id}: Update visibility (Public/Private).Resilience & Reliability:
Retries with exponential backoff for DB connections.
Circuit breaker for the Redis cache to fallback to DB if Redis is down.
Security:
Ownership Check: Every write request validates
user_id against the owner_id in the DB.Storage
Access Pattern: Read-heavy. Most queries are
Fetch all GIFs for Collection X or Fetch all Collections for User Y.Database Table Design:
Users:
id (PK), username, email, created_at.Collections:
id (PK), owner_id (FK), name, description, visibility (Enum), share_token, created_at.GIFs:
id (PK), external_url, width, height, provider (e.g., Giphy).Collection_Items:
collection_id (FK), gif_id (FK), added_at, sort_order.Technical Selection: PostgreSQL.
Rationale: Strong ACID guarantees for managing collection integrity. Handles relational joins efficiently. Support for JSONB if we need to store flexible GIF metadata.
Distribution Logic:
Initial MVP: Single primary with read replicas.
Future: Shard by
owner_id to ensure all of a user's collection data resides on the same shard.Cache
Purpose & Justification: Reduces read latency for "hot" collections and shared links that may go viral.
Key-Value Schema:
Key:
coll:{collection_id}.Value: JSON blob containing collection metadata and the list of GIF objects.
TTL: 10 minutes for public/shared collections; shorter or no cache for private ones to ensure consistency.
Technical Selection: Redis.
Rationale: High-performance in-memory storage with native support for list/set types if needed.
Failure Handling: If Redis fails, the system bypasses the cache and queries the PostgreSQL Read Replica.
Wrap Up
Advanced Topics
Trade-offs: We chose Strong Consistency over Eventual Consistency for the write path. While this might slightly increase write latency compared to a NoSQL approach, it prevents user frustration (e.g., adding a GIF and not seeing it appear immediately).
Reliability: Use of Multi-AZ database deployment ensures that if one zone goes down, the standby takes over with minimal RPO/RTO.
Bottleneck Analysis:
Hot Shards: If one user has a collection with millions of views, the DB could be pressured. The Redis cache specifically mitigates this.
Pagination: As collections grow,
GET /collections/{id}/items must support cursor-based pagination to avoid massive payloads.Security: Shareable links use a non-guessable
share_token (UUID) instead of an auto-incrementing integer ID to prevent enumeration attacks.