The Question
ML Design: Personalized Restaurant Recommendation and Ranking System
Design a high-scale recommendation and ranking system for a global food delivery platform. The system must handle over 100 million users and a million restaurants, delivering personalized results within a 200ms latency budget. Focus on the end-to-end ML lifecycle, specifically addressing geo-spatial constraints, real-time context (like weather and ETA), multi-stage ranking (retrieval vs. ranking), and the handling of the extreme feedback loop inherent in food ordering behavior.
Two-Tower
DeepFM
MMoE
FAISS
H3
XGBoost
LightGBM
Kafka
Flink
Spark
Redis
Questions & Insights
Clarifying Questions
Clarifying Questions & Constraints:
Business Goal: Is the primary objective to maximize the number of orders (Conversion Rate), the total order value (GMV), or user retention? Assumption: Maximize Conversion Rate (Orders per Session) while ensuring a minimum delivery efficiency.
Constraints & Scale: What is the scale of the system? Assumption: 100M Monthly Active Users (MAU), 1M+ active restaurants, and a P99 latency budget of 200ms for the entire ranking stack.
Data Freshness: How quickly must the system respond to user actions (e.g., clicking a sushi place)? Assumption: Real-time session features are critical for capturing current "cravings."
Edge Cases: How do we handle "Cold Start" for new restaurants or users in new cities? Assumption: Use content-based features and geographic popularity as fallbacks.
Assumptions:
The corpus of restaurants reachable by a user is relatively small (~500-2000) based on delivery radius.
We have access to historical order logs, user location, and restaurant metadata.
Thinking Process
Identify the Funnel: Since we have 1M+ restaurants but only ~1,000 are relevant to a specific user's location, the system must first apply Geo-filtering and then a two-stage approach: Retrieval (narrowing down to ~200 candidates) and Ranking (precise ordering of those candidates).
Context is King: Food preferences are highly temporal (Breakfast vs. Late Night) and situational (Weather, Day of week). Features must reflect this.
Optimization Objective: A simple CTR model isn't enough; a user might click but not order if the price or ETA is too high. I need to model the probability of an order.
Scalability: With high QPS, the ranking model must be efficient. I'll start with LightGBM or a simple Deep Neural Network (DNN) before moving to more complex architectures.
Elite Bonus Points
Multi-Objective Optimization: Using MMoE (Multi-gate Mixture-of-Experts) to simultaneously optimize for Click-Through Rate (CTR) and Conversion Rate (CVR) while balancing business constraints like delivery partner availability.
Position Bias Correction: Implementing a shallow position tower or weight-based debiasing (e.g., inverse-propensity weighting) so that restaurants are not ranked highly simply because they historically appeared at the top of the list.
ETA as a Feature vs. Constraint: Integrating real-time delivery estimates into the ranker. If the ETA is >45 mins, the conversion probability drops non-linearly.
Calibration: Ensuring predicted order probabilities match empirical order rates, which is crucial for downstream pricing or incentive logic.
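A minimal sketch of how the calibration point above could be checked offline, assuming a list of predicted order probabilities and matching 0/1 order labels. The function names (`calibration_table`, `expected_calibration_error`) and the equal-width binning are illustrative choices, not part of the original design.

```python
def calibration_table(probs, labels, n_bins=10):
    """Group predictions into equal-width bins and compare the mean
    predicted probability with the observed order rate in each bin."""
    bins = [{"count": 0, "pred_sum": 0.0, "label_sum": 0} for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx]["count"] += 1
        bins[idx]["pred_sum"] += p
        bins[idx]["label_sum"] += y
    # One row per non-empty bin: (mean prediction, observed rate, count).
    return [
        (b["pred_sum"] / b["count"], b["label_sum"] / b["count"], b["count"])
        for b in bins
        if b["count"] > 0
    ]


def expected_calibration_error(probs, labels, n_bins=10):
    """Count-weighted average gap between predicted and observed rates."""
    total = len(probs)
    if total == 0:
        return 0.0
    return sum(
        count / total * abs(mean_pred - observed)
        for mean_pred, observed, count in calibration_table(probs, labels, n_bins)
    )
```

A value near zero suggests predictions can be used directly as probabilities; a large value suggests a recalibration step may be needed before scores feed pricing or incentive logic.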
Design Breakdown
Requirements
Product Goal: Provide personalized restaurant recommendations to users to minimize "search time" and maximize "order conversion."
Success Metrics:
Online Metrics: Conversion Rate (Orders / Sessions), Gross Merchandise Value (GMV), Time to Order.
Offline Metrics: NDCG (Normalized Discounted Cumulative Gain), AUC-ROC (for binary order prediction).
Guardrail Metrics: P99 Latency (<200ms), Restaurant churn/fairness (ensuring new merchants get impressions).
System Constraints: 100M users, peak 50k QPS, high availability (99.99%).
Data Availability: User order history, restaurant menu/category, real-time location, weather, and traffic data.
ML Problem Framing
ML Task Type: Ranking / Binary Classification.
Prediction Target: P(Order | User, Restaurant, Context).
Inputs:
User: Historical cuisines, price sensitivity, habitual ordering times.
Item (Restaurant): Average rating, cuisine type, price tier, historical popularity in the neighborhood.
Context: Current time, location, weather (e.g., more soup orders in rain), device type.
Outputs: A ranked list of restaurant IDs.
ML Challenges: Highly skewed labels (most sessions don't end in an order), extreme spatial constraints, and high sensitivity to ETA.
Design Summary & MVP
Concise Summary: A two-stage ranking system where a Geo-spatial filter and a fast Retrieval model (Two-Tower) generate candidates, followed by a high-precision Ranking model (LightGBM or DeepFM) that incorporates real-time context and ETA.
Model Architecture & Selection:
Baseline Model: Logistic Regression or a Heuristic (Popularity + Distance).
Target Model: Two-Tower Neural Network for retrieval; DeepFM or LightGBM for ranking.
Choice Rationale: Two-Tower allows restaurant embeddings to be pre-computed offline, so serving only needs one user embedding per request plus a fast similarity search. DeepFM and LightGBM both capture non-linear interactions between features such as "Cuisine Type" and "Time of Day" effectively.
Simplicity Audit: The MVP avoids Reinforcement Learning or Graph Neural Networks, focusing instead on robust feature engineering and a tried-and-tested two-stage funnel.
System Architecture
Pipeline Deep Dive
Data Pipeline
Data Source: Mobile app events (clicks, scrolls, cart adds), backend order transactions, and third-party weather/traffic APIs.
Data Ingestion: Kafka for real-time events; Airflow for batch ingestion of historical order data from the main Postgres/Spanner DBs.
Data Storage: Data Lake (S3) for raw logs. Delta Lake/Iceberg for ACID compliance on feature tables.
Data Processing: Spark for large-scale joins of user/restaurant history. Flink for low-latency sessionization (e.g., "user clicked 3 Italian places in the last 5 minutes").
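The sessionization feature mentioned above ("user clicked 3 Italian places in the last 5 minutes") boils down to a keyed count over a trailing time window. The following is a plain-Python sketch of that logic only, not Flink code; the class name `SlidingWindowCounter` is hypothetical, and it assumes events arrive in timestamp order.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events per key over a trailing time window (in seconds).

    Assumes timestamps are non-decreasing, which a real stream
    processor would have to enforce or handle with watermarks.
    """

    def __init__(self, window_seconds=300.0):
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, key), oldest first
        self._counts = {}       # key -> number of events inside the window

    def _evict(self, now):
        # Drop events that have fallen out of the trailing window.
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, old_key = self._events.popleft()
            self._counts[old_key] -= 1
            if self._counts[old_key] == 0:
                del self._counts[old_key]

    def add(self, timestamp, key):
        self._evict(timestamp)
        self._events.append((timestamp, key))
        self._counts[key] = self._counts.get(key, 0) + 1

    def count(self, key, now):
        self._evict(now)
        return self._counts.get(key, 0)
```

In practice one such counter would be kept per user session, with the cuisine as the key, and the resulting count written to the online feature store.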
Feature Pipeline
Feature Definition:
User: Embedding of last 20 orders, average spend, preferred delivery time.
Item: Restaurant ID, cuisine vector, rating, "Busy" status.
Context: H3 Geo-index, time-of-day (encoded as Sine/Cosine), rain/snow flags.
Offline Feature Pipeline: Weekly aggregation of restaurant popularity and user preferences stored in S3/BigQuery.
Online Feature Pipeline: Flink jobs calculating "User-Restaurant Distance" and "Current ETA" using real-time courier positions.
Feature Store: Use Redis for the online store (latency) and Hive for the offline store (volume).
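The sine/cosine time-of-day encoding listed under the Context features exists so that 23:00 and 01:00 end up close together, which a raw hour value would not achieve. A small sketch, with hypothetical helper names:

```python
import math
from typing import Tuple


def encode_time_of_day(hour: float) -> Tuple[float, float]:
    """Map an hour in [0, 24) onto the unit circle as (sin, cos)."""
    angle = 2.0 * math.pi * (hour % 24.0) / 24.0
    return math.sin(angle), math.cos(angle)


def cyclic_distance(hour_a: float, hour_b: float) -> float:
    """Euclidean distance between two encoded hours."""
    sin_a, cos_a = encode_time_of_day(hour_a)
    sin_b, cos_b = encode_time_of_day(hour_b)
    return math.hypot(sin_a - sin_b, cos_a - cos_b)
```

With this encoding, `cyclic_distance(23, 1)` is smaller than `cyclic_distance(23, 12)`, matching the intuition that late night and just after midnight are similar ordering contexts.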
Model Architecture
Retrieval (Candidate Generation):
Method: Two-Tower Architecture.
Logic: User Tower and Item Tower produce 128-d embeddings. At serving, perform an Approximate Nearest Neighbor (ANN) search using FAISS or ScaNN within the geographically filtered set of restaurants.
Ranking (Score & Order):
Choice: DeepFM (Deep Factorization Machine).
Rationale: It captures both low-order (e.g., Cuisine=Pizza AND Time=LateNight) and high-order feature interactions automatically.
Optimization: Use Quantization (FP16/INT8) for the ranking model to keep inference under 50ms.
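To make the retrieval step above concrete, here is a sketch of scoring a user embedding against only the geo-filtered restaurants. It deliberately swaps the ANN index (FAISS or ScaNN) for an exact brute-force dot product, which may be acceptable when the filtered set is in the assumed ~500-2000 range; a real system would likely use vectorized math or an index. The function names and the dictionary-based embedding store are assumptions for illustration.

```python
import heapq


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def retrieve_top_k(user_embedding, item_embeddings, candidate_ids, k):
    """Return the k candidate restaurant IDs with the highest dot-product
    score against the user embedding.

    item_embeddings: dict of restaurant_id -> pre-computed Item Tower output.
    candidate_ids:   IDs that already passed the geographic filter.
    """
    scored = (
        (dot(user_embedding, item_embeddings[rid]), rid)
        for rid in candidate_ids
        if rid in item_embeddings  # skip restaurants without an embedding yet
    )
    return [rid for _, rid in heapq.nlargest(k, scored)]
```

Restaurants outside `candidate_ids` are never scored, so a highly similar but unreachable restaurant cannot leak into the candidate list.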
Training Pipeline
Label Construction: Positive = Order Completed. Negative = Restaurant shown but not clicked, OR clicked but not ordered.
Data Splitting: Time-based split. Use the first 27 days for training and the last 3 days for validation to simulate real-world "future" prediction.
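Imbalance Handling: Downsample the "Seen but not clicked" class to reduce the imbalance, and give each retained example an importance weight equal to the inverse of its sampling rate during training so that predicted probabilities stay on the original scale.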
Retraining: Weekly full retrains + Daily incremental updates to capture shifting food trends (e.g., a new "viral" dish).
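The time-based split and the negative downsampling described above can be sketched together as follows. The row schema (`day`, `clicked`, `ordered`) and the function names are assumptions made for this example; the weight is the inverse of the keep rate, so the weighted count of kept negatives matches the original count in expectation.

```python
import random


def time_split(rows, train_days=27):
    """Rows from days [0, train_days) train the model; later days validate it."""
    train = [r for r in rows if r["day"] < train_days]
    valid = [r for r in rows if r["day"] >= train_days]
    return train, valid


def downsample_unclicked(rows, keep_rate, seed=0):
    """Keep every clicked impression; keep unclicked impressions with
    probability keep_rate and weight them by 1 / keep_rate."""
    rng = random.Random(seed)
    sampled = []
    for row in rows:
        if row["clicked"]:
            # Positive (ordered) or hard negative (clicked, not ordered).
            sampled.append({**row, "label": row["ordered"], "weight": 1.0})
        elif rng.random() < keep_rate:
            # Easy negative: shown but never clicked.
            sampled.append({**row, "label": 0, "weight": 1.0 / keep_rate})
    return sampled
```

Downsampling is applied to the training portion only; the validation days keep their natural label distribution so offline metrics reflect what the model would see in production.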
Serving Pipeline
Serving Pattern: Request-Response via a microservice.
Multi-Stage Ranking:
Filtering: Hard constraints (Is the restaurant open? Is the user in range?).
Retrieval: Two-tower ANN search (Top 200).
Ranking: DeepFM scoring (Top 50).
Re-Ranking: Business logic (Promotions, Diversification to ensure cuisine variety).
Reliability: Fallback to "Popularity in Neighborhood" if the ranking service times out.
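The four serving stages and the timeout fallback above can be sketched as a single function. Everything here is illustrative: the restaurant fields (`is_open`, `delivery_radius_km`, `cuisine`), the scoring callables, and the per-cuisine cap used for diversification are assumptions, and the haversine distance stands in for whatever geo-index the production filter would use.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def recommend(user, restaurants, retrieval_score, ranking_score, fallback_score,
              retrieve_k=200, rank_k=50, max_per_cuisine=3):
    # 1. Filtering: hard constraints (open, user within delivery range).
    eligible = [
        r for r in restaurants
        if r["is_open"]
        and haversine_km(user["lat"], user["lon"], r["lat"], r["lon"]) <= r["delivery_radius_km"]
    ]
    # 2. Retrieval: cheap score, keep the top retrieve_k.
    candidates = sorted(eligible, key=lambda r: retrieval_score(user, r), reverse=True)[:retrieve_k]
    # 3. Ranking: precise score; fall back to popularity if the ranker times out.
    try:
        ranked = sorted(candidates, key=lambda r: ranking_score(user, r), reverse=True)
    except TimeoutError:
        ranked = sorted(candidates, key=fallback_score, reverse=True)
    ranked = ranked[:rank_k]
    # 4. Re-ranking: push restaurants beyond the per-cuisine cap to the end.
    seen, head, tail = {}, [], []
    for r in ranked:
        n = seen.get(r["cuisine"], 0)
        (head if n < max_per_cuisine else tail).append(r)
        seen[r["cuisine"]] = n + 1
    return [r["id"] for r in head + tail]
```

The fallback reuses the already retrieved candidates, so a ranker timeout degrades ordering quality without violating the hard constraints from stage 1.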
Evaluation Pipeline
Offline: AUC for binary classification; NDCG@10 for ranking quality.
Online: A/B testing against a "control" model. Primary metric: Orders per Session. Secondary: Order Cancel Rate, which would rise if the model starts pushing high-ETA restaurants.
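For reference, NDCG@10 as used in the offline evaluation above can be computed as below. This sketch uses linear gain, which matches the common exponential-gain variant when relevance labels are binary (ordered or not ordered); the function names are illustrative.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k items, in ranked order."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG normalized by the DCG of the ideal (descending) ordering.

    relevances: labels of the shown restaurants in the order the model
    ranked them, e.g. 1 if the user ordered from that restaurant, else 0.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

A session where the ordered restaurant was ranked first scores 1.0; ranking it second scores 1 / log2(3), roughly 0.63.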
Monitoring Pipeline
Data Monitoring: Track "Feature Integrity." If the "User Location" feature starts coming in as null, trigger an alert immediately.
Model Monitoring: Monitor the Prediction Mean. If the average P(Order) moves significantly away from the observed order rate, the model may be stale or an upstream feature may be broken.
Drift: Use Population Stability Index (PSI) to detect distribution shifts in user behavior (e.g., during holidays).
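A minimal sketch of the PSI computation mentioned above, assuming a single numeric feature and a fixed set of bin edges chosen from the baseline distribution. The function name and the `eps` floor (which avoids taking the log of zero for empty bins) are choices made for this example.

```python
import bisect
import math


def psi(expected, actual, bin_edges, eps=1e-6):
    """Population Stability Index between a baseline sample (expected)
    and a recent sample (actual) of the same feature.

    bin_edges: sorted cut points; they define len(bin_edges) + 1 bins.
    """
    def proportions(values):
        counts = [0] * (len(bin_edges) + 1)
        for v in values:
            counts[bisect.bisect_right(bin_edges, v)] += 1
        total = max(len(values), 1)
        # Floor each proportion at eps so the log below is always defined.
        return [max(c / total, eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Identical distributions give a PSI of 0. A commonly quoted rule of thumb treats values above roughly 0.25 as a significant shift, though the alert threshold should be tuned per feature.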
Wrap Up
Final Evaluation
Feedback Loop: Order outcomes (success/fail) are joined back to the features logged at serving time in near real-time, and the resulting examples feed the daily incremental updates and weekly retrains.
Trade-offs:
Accuracy vs. Latency: A massive Transformer might be more accurate but would likely exceed the 200ms budget at this QPS. DeepFM offers a good balance between the two.
Personalization vs. Exploration: Dedicate 5% of the list to "new" restaurants to mitigate the merchant cold-start problem and to collect feedback on items the model would otherwise never show.
Distinguishing Insights: In food delivery, the "Re-order" signal is incredibly strong. A "Frequently Ordered" feature often provides a 10%+ lift in AUC, which is more than switching to a more complex deep learning architecture typically yields.
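The exploration allocation from the Personalization vs. Exploration trade-off above could be implemented as a small post-processing step on the ranked list. This is one possible sketch: the function name, the random slot placement, and the choice to keep the list length fixed by dropping the lowest-ranked items are all assumptions.

```python
import random


def inject_exploration(ranked_ids, new_ids, explore_fraction=0.05, seed=None):
    """Replace the lowest-ranked slots with new restaurants placed at
    random positions, keeping the list length unchanged."""
    if not ranked_ids:
        return []
    rng = random.Random(seed)
    already_ranked = set(ranked_ids)
    pool = [rid for rid in new_ids if rid not in already_ranked]
    # At least one exploration slot, otherwise explore_fraction of the list.
    n_slots = max(1, int(len(ranked_ids) * explore_fraction))
    picks = rng.sample(pool, min(n_slots, len(pool)))
    # Drop as many tail items as we are about to insert.
    result = list(ranked_ids[: len(ranked_ids) - len(picks)])
    for rid in picks:
        # Any position is allowed, including the very top of the list.
        result.insert(rng.randint(0, len(result)), rid)
    return result
```

Logging which slots were exploration slots makes it possible to exclude them from, or reweight them in, the position-bias correction later.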