The Question
ML Design: Large-Scale Visual Discovery & Feed Recommendation System
Design a high-scale recommendation system for a visual discovery platform where users engage with items through saves and closeups. The system must handle a corpus of billions of items, prioritize graph-based relationships between items and collections, and maintain sub-150ms latency for a personalized home feed.
Key technologies: PinSage, DCN-v2, ANN, FAISS, MMoE, Kafka, Flink, Spark, Tecton, ViT
Questions & Insights
Clarifying Questions
Business Goal: Is the primary North Star metric engagement (Saves/Repins) or consumption (Closeups/Click-throughs)?
Assumption: The goal is to maximize "Pinner Value," weighted towards Saves (high intent) and Closeups (engagement).
Constraints & Scale: What is the scale of the user base and the item corpus?
Assumption: 500M+ Monthly Active Users (MAU), 10B+ Pins, and a P99 latency budget of 150ms for the home feed.
Data Freshness: How quickly should user actions (e.g., just saved a "Modern Home" pin) reflect in the feed?
Assumption: Near real-time (seconds) for intra-session updates to maintain relevance.
Cold Start: How do we handle new Pins (fresh content) vs. new users?
Assumption: We need a mechanism to prioritize exploration for new Pins to gather initial engagement statistics.
Thinking Process
The Funnel Approach: With 10B items, we cannot rank everything. I must design a multi-stage funnel: Retrieval (Candidate Generation) → Ranking (Scoring) → Re-ranking (Business logic/Diversity).
Visual-First Strategy: Pinterest is visual, so embeddings are first-class citizens. I need to leverage PinSage (a Graph Convolutional Network) or similar visual-graph embeddings to capture the relationships between Pins and Boards.
Exploitation vs. Exploration: A pure engagement model will lead to filter bubbles. I need to incorporate a "freshness" or "interest expansion" component in the retrieval layer.
Latency Bottleneck: Feature retrieval and the final ranking model are the main latency sinks. I’ll use a Feature Store for low-latency lookups and a two-tower model for the first stage.
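The funnel described above can be sketched end to end in a few lines. This is a minimal, in-memory illustration under stated assumptions: the `Pin` record, the stage functions, the per-category cap, and the toy corpus are all hypothetical; retrieval uses an exact scan where production would query an ANN index, and `heavy_score` stands in for the ranking model.

```python
from dataclasses import dataclass


@dataclass
class Pin:
    pin_id: int
    category: str
    embedding: list  # hypothetical precomputed Pin embedding


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def retrieve(user_emb, corpus, k):
    # Stage 1: cheap embedding similarity (an ANN index replaces this exact scan at scale).
    return sorted(corpus, key=lambda p: dot(user_emb, p.embedding), reverse=True)[:k]


def rank(user_emb, candidates, heavy_score):
    # Stage 2: the expensive model runs only on the retrieved candidates.
    return sorted(candidates, key=lambda p: heavy_score(user_emb, p), reverse=True)


def rerank(ranked, max_per_category, n):
    # Stage 3: business logic; here a simple per-category cap for diversity.
    counts, feed = {}, []
    for p in ranked:
        if counts.get(p.category, 0) < max_per_category:
            feed.append(p)
            counts[p.category] = counts.get(p.category, 0) + 1
            if len(feed) == n:
                break
    return feed


# Toy usage: 10 Pins alternating between two categories.
corpus = [Pin(i, "home" if i % 2 else "food", [i / 10, 1 - i / 10]) for i in range(10)]
user = [1.0, 0.0]
candidates = retrieve(user, corpus, k=6)
ranked = rank(user, candidates, heavy_score=lambda u, p: dot(u, p.embedding))
feed = rerank(ranked, max_per_category=2, n=4)
print([p.pin_id for p in feed])
```

Each stage narrows the set the next, more expensive stage has to touch, which is what keeps the total latency bounded.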
Elite Bonus Points
Pin-Board Graph Embeddings: Utilizing the bipartite graph of Pins and Boards. If two Pins frequently appear on the same Board, they are semantically related even if visually different.
Calibration of Multi-objective Scores: Combining $P(\text{click})$ and $P(\text{save})$ requires calibrating each head (e.g., with Platt scaling) so that the weighted sum $w_1 \cdot P(\text{click}) + w_2 \cdot P(\text{save})$ adds true probabilities on a common scale rather than arbitrary model scores.
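Handling Positional Bias: Using the display position as a feature during training, then setting it to a fixed default value at inference, so the position effect is absorbed by that feature instead of being learned as item relevance (otherwise the model simply learns that "items at the top get clicked more").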
Embedding Versioning: Implementing a "warm-start" strategy for embeddings to prevent catastrophic forgetting when the embedding model is updated.
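The calibration step can be sketched as follows, assuming each task head emits a raw logit and a held-out set of (logit, label) pairs is available per task. `fit_platt` and `blended_score` are hypothetical names, the toy data is illustrative, and the blend weights are placeholders; a library routine such as scikit-learn's sigmoid calibration would normally replace the hand-written fit.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_platt(logits, labels, lr=0.1, steps=2000):
    """Platt scaling: fit p = sigmoid(a * s + b) on held-out (logit, label) pairs.

    Uses plain batch gradient descent on log loss; a and b are the only parameters.
    """
    a, b, n = 1.0, 0.0, len(logits)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(logits, labels):
            err = sigmoid(a * s + b) - y  # derivative of log loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def blended_score(click_logit, save_logit, click_cal, save_cal, w_click=1.0, w_save=3.0):
    # Calibrate each head separately, then take the weighted sum of probabilities.
    # w_click and w_save are hypothetical placeholder weights, not tuned values.
    p_click = sigmoid(click_cal[0] * click_logit + click_cal[1])
    p_save = sigmoid(save_cal[0] * save_logit + save_cal[1])
    return w_click * p_click + w_save * p_save


# Toy held-out data for the save head (raw logits and observed save labels).
save_logits = [-2, -2, -1, -1, 0, 0, 1, 1, 2, 2]
save_labels = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
save_cal = fit_platt(save_logits, save_labels)
print(save_cal, blended_score(0.5, 1.0, click_cal=(1.0, 0.0), save_cal=save_cal))
```

Fitting one small calibrator per head keeps the ranker itself unchanged; only the mapping from logit to probability is adjusted before blending.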
Design Breakdown
Requirements
Product Goal: Deliver a personalized, inspiring home feed that maximizes long-term user retention through Saves and Closeups.
Success Metrics:
Online: Save Rate (Primary), Closeup Rate (Secondary), Time Spent, DAU/MAU.
Offline: NDCG@K, Hit Rate, AUC-ROC for binary actions.
Guardrail: P99 Latency < 150ms, Diversity score (e.g., 1 minus the mean pairwise similarity of items in the feed), Ad-load balance.
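Both the offline ranking metric and the diversity guardrail reduce to short formulas. A minimal sketch, assuming graded relevance labels (a hypothetical 2 for a save, 1 for a closeup, 0 otherwise) and one common definition of diversity as 1 minus the mean pairwise cosine similarity of the Pins in a slate; the ideal ordering here is computed from the same slate, a simplification of full-corpus NDCG.

```python
import math
from itertools import combinations


def dcg_at_k(relevances, k):
    # Relevance at 0-based rank i is discounted by log2(i + 2).
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    # Ideal DCG is computed from the same slate, sorted by relevance.
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    return num / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def diversity(embeddings):
    # 1 minus the mean pairwise cosine similarity of the Pins in one feed slate.
    pairs = list(combinations(embeddings, 2))
    return 1.0 - sum(cosine(a, b) for a, b in pairs) / len(pairs)


# Hypothetical graded relevance: 2 = save, 1 = closeup, 0 = no action.
print(ndcg_at_k([0, 2, 1, 0], k=4))
print(diversity([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))
```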
System Constraints:
High throughput (100k+ QPS).
Billions of items in the searchable index.
Real-time updates for user interest profiles.
ML Problem Framing
ML Task Type: Multi-stage Ranking (Retrieval + Heavy Ranking).
Prediction Target: Multi-task learning to predict $\text{logit}_{\text{save}}$ and $\text{logit}_{\text{closeup}}$.
Inputs:
User: Historical saves (last 100), search history, interests (taxonomy), demographics.
Item (Pin): PinSage embeddings, visual features (ViT/CNN), text (title/description), engagement stats (CTR, Save-rate).
Context: Device, time of day, country, session depth.
ML Challenges: Extreme class imbalance (Saves are rare), Feedback loops (users only see what we recommend), and Data sparsity for new Pins.
Design Summary & MVP
Concise Summary: A two-stage recommender system utilizing Approximate Nearest Neighbor (ANN) search for retrieval and a Deep Neural Network (DNN) for ranking.
Model Architecture & Selection:
Baseline: Collaborative Filtering (Matrix Factorization) + Logistic Regression.
Target Model: Retrieval: Two-Tower (User/Pin) Model with PinSage embeddings. Ranking: Deep & Cross Network (DCN-V2) for explicit feature interaction.
Simplicity Audit: We avoid Reinforcement Learning (RL) for the MVP. A supervised Multi-Task Learning (MTL) approach with a fixed weighted sum for the final score is a robust, widely used industry pattern.
Architecture Decision Rationale: The two-tower design allows pre-computing Pin embeddings offline, so serving needs only one user-tower pass plus a fast ANN lookup. DCN-V2 captures explicit feature interactions (e.g., User_Interests × Pin_Category) efficiently.
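The split between offline and online work in a two-tower model can be sketched as below. Assumptions: each tower is a single hypothetical linear layer with toy weights (real towers are deep networks over the features listed under Inputs), and `heapq.nlargest` performs an exact top-k scan where production would query an ANN index such as FAISS.

```python
import heapq


def tower(features, weights):
    # One linear layer: embedding[j] = sum_i features[i] * weights[i][j].
    dim = len(weights[0])
    return [sum(f * row[j] for f, row in zip(features, weights)) for j in range(dim)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Hypothetical toy weights for the two towers.
PIN_W = [[1.0, 0.0], [0.0, 1.0]]
USER_W = [[0.5, 0.5], [1.0, -1.0]]

# Offline: every Pin embedding is computed once and stored in the index.
pin_features = {"p1": [1.0, 0.0], "p2": [0.0, 1.0], "p3": [0.7, 0.7]}
pin_index = {pid: tower(f, PIN_W) for pid, f in pin_features.items()}


def retrieve(user_features, k):
    # Online: a single user-tower pass, then top-k by dot product over the index.
    u = tower(user_features, USER_W)
    return heapq.nlargest(k, pin_index, key=lambda pid: dot(u, pin_index[pid]))


print(retrieve([1.0, 0.0], k=1))  # p3 scores 0.7 vs 0.5 for p1 and p2
```

Because the two towers only meet at the final dot product, Pin embeddings never depend on the user and can be indexed ahead of time.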
System Architecture
Pipeline Deep Dive
Data Pipeline
Data Source: Application logs (clicks, saves, long-press), Pin metadata (image/text), Board structure.
Data Ingestion: Kafka for real-time engagement streams. Airflow for scheduled batch ingestion of metadata.
Data Storage: S3 (Data Lake) for raw logs. BigQuery/Snowflake for structured warehouse queries.
Data Quality: De-duplication of event IDs, Schema enforcement using Protobuf, and automated null-check alerts on the ingestion DAGs.
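De-duplication by event ID matters because at-least-once delivery, the common default for Kafka consumers, can replay events. A minimal sketch, assuming each event carries a unique `event_id` field; the `EventDeduplicator` class and its capacity are hypothetical, and in production this state would live in the stream processor rather than in a Python dict.

```python
from collections import OrderedDict


class EventDeduplicator:
    """Remembers the most recent `capacity` event IDs and rejects repeats.

    Bounded memory: once full, the least recently seen ID is evicted, so a duplicate
    arriving after that point is no longer caught.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.seen = OrderedDict()

    def is_new(self, event_id):
        if event_id in self.seen:
            self.seen.move_to_end(event_id)  # refresh recency of a repeated ID
            return False
        self.seen[event_id] = None
        if len(self.seen) > self.capacity:
            self.seen.popitem(last=False)  # evict the least recently seen ID
        return True


events = [{"event_id": "a"}, {"event_id": "b"}, {"event_id": "a"}, {"event_id": "c"}]
dedup = EventDeduplicator(capacity=1000)
clean = [e for e in events if dedup.is_new(e["event_id"])]
print([e["event_id"] for e in clean])  # ['a', 'b', 'c']
```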
Feature Pipeline
User Features: Long-term interests (30-day aggregations), Short-term session features (last 5 pins clicked).
Pin Features: PinSage embeddings (captured from Pin-Board graph), visual embeddings (from a pre-trained ViT), and historical engagement (CTR, Save-rate per category).
Online/Offline Consistency: Using a Feature Store (e.g., Tecton) to ensure the same transformation logic is used during training (offline) and inference (online), preventing "Training-Serving Skew."
Temporal Correctness: Point-in-time joins in the offline pipeline to ensure the model doesn't see "future" saves during training.
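A point-in-time join can be sketched as an as-of lookup: for each labeled impression, take the most recent feature value recorded before the impression's timestamp. This is a minimal in-memory illustration with hypothetical tuple layouts; a feature store such as Tecton or a warehouse as-of join would do this at scale. The strict "before" comparison is a deliberately conservative choice so that a feature updated by the labeled event itself never leaks into training.

```python
import bisect
from collections import defaultdict


def point_in_time_join(examples, feature_log):
    """Attach to each training example the latest feature value logged strictly before it.

    examples:    list of (user_id, event_ts, label)
    feature_log: list of (user_id, feature_ts, value)
    """
    history = defaultdict(list)
    for user_id, ts, value in sorted(feature_log, key=lambda r: r[1]):
        history[user_id].append((ts, value))
    timestamps = {u: [ts for ts, _ in rows] for u, rows in history.items()}

    joined = []
    for user_id, event_ts, label in examples:
        # bisect_left - 1 = last feature row with feature_ts < event_ts (no future data).
        i = bisect.bisect_left(timestamps.get(user_id, []), event_ts) - 1
        value = history[user_id][i][1] if i >= 0 else None  # None: no feature existed yet
        joined.append((user_id, event_ts, value, label))
    return joined


feature_log = [("u1", 10, 0.2), ("u1", 20, 0.5), ("u2", 5, 0.9)]
examples = [("u1", 15, 1), ("u1", 20, 0), ("u1", 25, 1), ("u1", 5, 0), ("u3", 30, 1)]
rows = point_in_time_join(examples, feature_log)
```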
Model Architecture
Retrieval (Candidate Generation):
Multi-channel:
PinSage ANN: Find Pins similar to User's recent saves in embedding space.
User-Item CF: Pins saved by users with similar profiles.
Category/Trending: To address cold start and ensure freshness.
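The retrieval channels above must be merged into one candidate set before ranking. A minimal sketch, assuming each channel returns a best-first list of Pin IDs; the round-robin interleaving, the per-channel quotas, and the channel names are hypothetical choices, not a documented policy of this system.

```python
def merge_channels(channels, quotas, total):
    """Round-robin merge of ranked candidate lists with per-channel quotas and de-duplication.

    channels: dict channel_name -> list of pin_ids, best first
    quotas:   dict channel_name -> max pins taken from that channel
    """
    merged, seen = [], set()
    taken = {name: 0 for name in channels}
    cursors = {name: 0 for name in channels}
    while len(merged) < total:
        progressed = False
        for name, pins in channels.items():
            if taken[name] >= quotas[name]:
                continue
            # Skip pins already contributed by another channel.
            while cursors[name] < len(pins) and pins[cursors[name]] in seen:
                cursors[name] += 1
            if cursors[name] < len(pins):
                pin = pins[cursors[name]]
                cursors[name] += 1
                merged.append(pin)
                seen.add(pin)
                taken[name] += 1
                progressed = True
                if len(merged) == total:
                    break
        if not progressed:
            break  # every channel is exhausted or at its quota
    return merged


channels = {
    "pinsage_ann": ["a", "b", "c", "d"],
    "user_cf": ["b", "e", "f"],
    "category_trending": ["g", "a", "h"],
}
quotas = {"pinsage_ann": 2, "user_cf": 2, "category_trending": 2}
print(merge_channels(channels, quotas, total=10))  # ['a', 'b', 'g', 'c', 'e', 'h']
```

Quotas guarantee the freshness channel a share of the slate even when the embedding channels score higher, which is what lets new Pins collect initial engagement.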
Ranking (DCN-V2):
Input: Concatenated embeddings of User, Pin, Context + Cross-features.
Cross-Layers: Explicitly model user-item ($u \times i$) interactions (e.g., $\text{User}_{\text{Style}} \times \text{Pin}_{\text{Color}}$).
Multi-gate Mixture of Experts (MMoE): To handle multiple objectives (e.g., Save vs. Closeup) by sharing a pool of expert networks across tasks, with a task-specific gating network and tower for each objective.
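The two ranking components can be illustrated with their core formulas. A DCN-V2 cross layer computes $x_{l+1} = x_0 \odot (W_l x_l + b_l) + x_l$, and the MMoE gate for task $k$ mixes the shared experts as $\sum_i \text{softmax}(W_{g,k} x)_i \, f_i(x)$. The sketch below uses plain Python lists with hypothetical toy weights and lambda experts; a real model would be a trained PyTorch module.

```python
import math


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def cross_layer(x0, xl, W, b):
    # DCN-V2 cross layer: x_{l+1} = x0 * (W @ xl + b) + xl, with * elementwise.
    return [a * (wx + bias) + x for a, wx, bias, x in zip(x0, matvec(W, xl), b, xl)]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def mmoe_mix(x, experts, gate_W):
    # One task's view of the shared experts: a gate-weighted sum of expert outputs.
    outputs = [expert(x) for expert in experts]
    gate = softmax(matvec(gate_W, x))  # one weight per expert, summing to 1
    return [sum(g * out[j] for g, out in zip(gate, outputs)) for j in range(len(outputs[0]))]


# Toy usage: one cross layer, then two task-specific gates over the same two experts.
x0 = [1.0, 2.0]
x1 = cross_layer(x0, x0, W=[[1.0, 0.0], [0.0, 1.0]], b=[0.0, 0.0])
experts = [lambda x: [x[0], 0.0], lambda x: [0.0, x[1]]]
save_repr = mmoe_mix(x1, experts, gate_W=[[0.0, 0.0], [0.0, 0.0]])
closeup_repr = mmoe_mix(x1, experts, gate_W=[[1.0, 0.0], [0.0, 0.0]])
print(x1, save_repr, closeup_repr)
```

The two tasks see the same experts but weight them differently, which is how MMoE lets Save and Closeup share capacity without forcing identical representations.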
Training Pipeline
Labeling: Positive label = Save/Closeup. Negative label = Impression without action (randomly sampled from the same session to avoid easy negatives).
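Class Imbalance: Downsampling negative impressions (then correcting the predicted probabilities for the sampling rate so they stay calibrated) or using Focal Loss to focus on rare "Save" events.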
Infrastructure: Distributed PyTorch on GPU clusters. Using a time-based split (e.g., Train on Monday-Saturday, Test on Sunday) to evaluate real-world performance.
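Both imbalance remedies are short. A minimal sketch: `focal_loss` follows the standard binary focal-loss form with the commonly used defaults γ = 2 and α = 0.25 (placeholders here, not tuned for Saves), and `correct_for_downsampling` maps a probability learned on downsampled negatives back to the true base rate, which keeps a weighted sum of objective probabilities meaningful. Function names are illustrative.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one example; p is the predicted probability of a save."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # keep log() finite
    if y == 1:
        return -alpha * (1.0 - p) ** gamma * math.log(p)
    return -(1.0 - alpha) * p ** gamma * math.log(1.0 - p)


def correct_for_downsampling(p_sampled, neg_keep_rate):
    """Undo negative downsampling: negatives kept at rate w inflate the odds by 1/w."""
    return p_sampled / (p_sampled + (1.0 - p_sampled) / neg_keep_rate)


# An easy negative (p = 0.05) contributes far less loss than a hard one (p = 0.9).
print(focal_loss(0.05, 0), focal_loss(0.9, 0))
# A model trained with 10% of negatives predicts 0.5; the corrected probability is 1/11.
print(correct_for_downsampling(0.5, neg_keep_rate=0.1))
```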
Serving Pipeline
Pattern: Two-stage serving.
Retrieval Service: Queries an ANN index (e.g., Milvus, or a service built on the FAISS library) using the User Embedding.
Scoring Service: Ranks the top 500-1000 candidates using the DCN-V2 model.
Latency Optimization: Model quantization (FP16/INT8), caching top-k results for popular user segments, and parallelizing feature lookups.
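Post-training INT8 quantization maps each float to an 8-bit integer plus a shared scale factor, cutting memory and bandwidth by roughly 4× compared with FP32. A minimal sketch of symmetric per-tensor quantization, assuming plain Python lists; in practice framework tooling (for example PyTorch's quantization APIs or TensorRT) would be used instead of this hand-rolled version.

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization: q = round(v / scale), scale = max|v| / 127."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q, scale):
    return [qi * scale for qi in q]


weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
print(q, restored)
```

The rounding error per value is at most half a quantization step (`scale / 2`), which is why the ranking quality usually needs to be re-checked offline after quantizing.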
Evaluation Pipeline
Offline: AUC for binary classification; NDCG for ranking quality.
Online: A/B test comparing the new model against the production baseline. Monitor "Save-Rate" and "Time-to-first-save."
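Whether the treatment's Save-Rate differs from the baseline's can be checked with a two-proportion z-test. A minimal sketch, assuming impressions are independent; because impressions from the same user are correlated when randomization is per user, a production analysis would typically use user-level metrics, the delta method, or a bootstrap instead. Counts in the example are illustrative.

```python
import math


def two_proportion_ztest(saves_a, imps_a, saves_b, imps_b):
    """Two-sided z-test for a difference in save rate between control (a) and treatment (b)."""
    p_a, p_b = saves_a / imps_a, saves_b / imps_b
    pooled = (saves_a + saves_b) / (imps_a + imps_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / imps_a + 1 / imps_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


z, p = two_proportion_ztest(saves_a=100, imps_a=10_000, saves_b=150, imps_b=10_000)
print(z, p)
```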
Monitoring Pipeline
Model Drift: Monitor the distribution of predicted scores (PSI - Population Stability Index).
Feature Drift: Monitor the mean/std of input features in real-time.
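Delayed Feedback: Saves may arrive minutes after the impression, so an impression should only be labeled negative after a fixed attribution window has elapsed (or late positives must be corrected for during training).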
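PSI compares the binned distribution of current predicted scores against a baseline window: $\text{PSI} = \sum_b (a_b - e_b) \ln(a_b / e_b)$, where $e_b$ and $a_b$ are the baseline and current fractions in bin $b$. A minimal sketch, assuming scores are probabilities in [0, 1] and using equal-width bins (quantile bins on the baseline are a common alternative); a commonly cited rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift.

```python
import math


def psi(expected_scores, actual_scores, n_bins=10, eps=1e-6):
    """Population Stability Index between baseline and current score samples in [0, 1]."""

    def bin_fractions(scores):
        counts = [0] * n_bins
        for s in scores:
            counts[min(int(s * n_bins), n_bins - 1)] += 1  # a score of 1.0 lands in the last bin
        return [max(c / len(scores), eps) for c in counts]  # eps floor avoids log(0)

    e, a = bin_fractions(expected_scores), bin_fractions(actual_scores)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


baseline = [i / 100 for i in range(100)]
shifted = [min(s + 0.5, 1.0) for s in baseline]
print(psi(baseline, baseline), psi(baseline, shifted))
```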
Wrap Up
Final Evaluation
Cold Start: Handled by a "Freshness Booster" heuristic in the retrieval stage and visual-similarity retrieval (using embeddings of new pins).
Engagement vs. Revenue: Multi-objective optimization allows balancing engagement with potential ad-revenue Pins.
Trade-offs: We choose a complex DNN over a linear model to capture visual nuances, accepting a higher per-candidate latency cost, which we mitigate by using ANN retrieval to shrink the candidate set to the top 500-1000 Pins before ranking.
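The "Freshness Booster" heuristic from the cold-start item can be sketched as a multiplicative boost that decays with a Pin's age and switches off once the Pin has gathered enough impressions. Every parameter here (`beta`, `half_life_hours`, `min_impressions`) and the exponential half-life decay are hypothetical choices for illustration, not values from the design.

```python
def freshness_boost(score, age_hours, impressions, beta=0.5, half_life_hours=24.0, min_impressions=1000):
    """Boost the retrieval score of new, under-exposed Pins.

    Pins that already have `min_impressions` impressions are returned unchanged,
    so exploration traffic goes to content that still lacks engagement statistics.
    """
    if impressions >= min_impressions:
        return score
    decay = 0.5 ** (age_hours / half_life_hours)  # 1.0 at creation, 0.5 after one half-life
    return score * (1.0 + beta * decay)


print(freshness_boost(1.0, age_hours=0, impressions=0))     # 1.5
print(freshness_boost(1.0, age_hours=24, impressions=0))    # 1.25
print(freshness_boost(1.0, age_hours=0, impressions=5000))  # 1.0
```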