The Question

Personalized Restaurant Recommendation and Ranking System

Design a high-scale recommendation and ranking system for a global food delivery platform. The system must handle over 100 million users and a million restaurants, delivering personalized results within a 200ms latency budget. Focus on the end-to-end ML lifecycle, specifically addressing geo-spatial constraints, real-time context (like weather and ETA), multi-stage ranking (retrieval vs. ranking), and the handling of the extreme feedback loop inherent in food ordering behavior.

Tags: Two-Tower, DeepFM, MMoE, FAISS, XGBoost, LightGBM, Kafka, Flink, Spark, Redis

Questions & Insights

Clarifying Questions

Clarifying Questions & Constraints:

Business Goal: Is the primary objective to maximize the number of orders (Conversion Rate), the total order value (GMV), or user retention? Assumption: Maximize Conversion Rate (Orders per Session) while ensuring a minimum delivery efficiency.

Constraints & Scale: What is the scale of the system? Assumption: 100M Monthly Active Users (MAU), 1M+ active restaurants, and a P99 latency budget of 200ms for the entire ranking stack.

Data Freshness: How quickly must the system respond to user actions (e.g., clicking a sushi place)? Assumption: Real-time session features are critical for capturing current "cravings."

Edge Cases: How do we handle "Cold Start" for new restaurants or users in new cities? Assumption: Use content-based features and geographic popularity as fallbacks.

Assumptions:

The corpus of restaurants reachable by a user is relatively small (~500-2000) based on delivery radius.

We have access to historical order logs, user location, and restaurant metadata.

Thinking Process

Identify the Funnel: Since we have 1M+ restaurants but only ~1,000 are relevant to a specific user's location, the system must first use Geo-filtering followed by a two-stage approach: Retrieval (narrowing down to ~200 candidates) and Ranking (precise ordering of those candidates, keeping the top 50).

Context is King: Food preferences are highly temporal (Breakfast vs. Late Night) and situational (Weather, Day of week). Features must reflect this.

Optimization Objective: A simple CTR model isn't enough; a user might click but not order if the price or ETA is too high. I need to model the probability of an order.

Scalability: With high QPS, the ranking model must be efficient. I'll start with a LightGBM or a simple Deep Neural Network (DNN) before moving to more complex architectures.

Elite Bonus Points

Multi-Objective Optimization: Using MMoE (Multi-gate Mixture-of-Experts) to simultaneously optimize for Click-Through Rate (CTR) and Conversion Rate (CVR) while balancing business constraints like delivery partner availability.

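Position Bias Correction: Implementing a shallow tower (a small side network that receives position features during training and is dropped at serving) or inverse-propensity weighting of training examples, so that restaurants are not ranked highly simply because they were at the top of the list historically.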

ETA as a Feature vs. Constraint: Integrating real-time delivery estimates into the ranker. If the ETA is >45 mins, the conversion probability drops non-linearly.

Calibration: Ensuring predicted order probabilities match empirical order rates, which is crucial for downstream pricing or incentive logic.
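One way to check the calibration point above is a reliability table: bucket predictions, then compare the mean predicted probability with the observed order rate in each bucket. The following is a minimal sketch, assuming predictions and binary order labels are already available as plain lists; the function names and the five-bin default are illustrative choices, not part of the original design.

```python
from typing import List, Tuple


def calibration_table(
    preds: List[float], labels: List[int], n_bins: int = 5
) -> List[Tuple[float, float, int]]:
    """Return (mean predicted prob, empirical order rate, count) per non-empty bin."""
    sums = [0.0] * n_bins
    hits = [0] * n_bins
    counts = [0] * n_bins
    for p, y in zip(preds, labels):
        # Equal-width bins over [0, 1]; p == 1.0 falls into the last bin.
        b = min(int(p * n_bins), n_bins - 1)
        sums[b] += p
        hits[b] += y
        counts[b] += 1
    return [
        (sums[b] / counts[b], hits[b] / counts[b], counts[b])
        for b in range(n_bins)
        if counts[b] > 0
    ]


def expected_calibration_error(
    preds: List[float], labels: List[int], n_bins: int = 5
) -> float:
    """Count-weighted average gap between predicted and observed rates."""
    n = len(preds)
    table = calibration_table(preds, labels, n_bins)
    return sum(count / n * abs(mean_p - rate) for mean_p, rate, count in table)
```

A well-calibrated model shows small gaps in every bin; a large gap concentrated in the high-probability bins would matter most for incentive logic that acts on confident predictions.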

Design Breakdown

Requirements

Product Goal: Provide personalized restaurant recommendations to users to minimize "search time" and maximize "order conversion."

Success Metrics:

Online Metrics: Conversion Rate (Orders / Sessions), Gross Merchandise Value (GMV), Time to Order.

Offline Metrics: NDCG (Normalized Discounted Cumulative Gain), AUC-ROC (for binary order prediction).

Guardrail Metrics: P99 Latency (<200ms), Restaurant churn/fairness (ensuring new merchants get impressions).

System Constraints: 100M users, peak 50k QPS, high availability (99.99%).

Data Availability: User order history, restaurant menu/category, real-time location, weather, and traffic data.

ML Problem Framing

ML Task Type: Ranking / Binary Classification.

Prediction Target: $P(\text{Order} \mid \text{User}, \text{Restaurant}, \text{Context})$

Inputs:

User: Historical cuisines, price sensitivity, habitual ordering times.

Item (Restaurant): Average rating, cuisine type, price tier, historical popularity in the neighborhood.

Context: Current time, location, weather (e.g., more soup orders in rain), device type.

Outputs: A ranked list of restaurant IDs.

ML Challenges: Highly skewed labels (most sessions don't end in an order), extreme spatial constraints, and high sensitivity to ETA.

Design Summary & MVP

Concise Summary: A two-stage ranking system where a Geo-spatial filter and a fast Retrieval model (Two-Tower) generate candidates, followed by a high-precision Ranking model (a gradient-boosted tree model such as LightGBM, or DeepFM) that incorporates real-time context and ETA.

Model Architecture & Selection:

Baseline Model: Logistic Regression or a Heuristic (Popularity + Distance).

Target Model: Two-Tower Neural Network for retrieval; DeepFM or LightGBM for ranking.

Choice Rationale: Two-tower allows pre-computing user/item embeddings for fast retrieval. DeepFM/LightGBM captures complex non-linear interactions between "Cuisine Type" and "Time of Day" effectively.

Simplicity Audit: The MVP avoids Reinforcement Learning or Graph Neural Networks, focusing instead on robust feature engineering and a tried-and-tested two-stage funnel.

System Architecture

Pipeline Deep Dive

Data Pipeline

Data Source: Mobile app events (clicks, scrolls, cart adds), backend order transactions, and third-party weather/traffic APIs.

Data Ingestion: Kafka for real-time events; Airflow for batch ingestion of historical order data from the main Postgres/Spanner DBs.

Data Storage: Data Lake (S3) for raw logs. Delta Lake/Iceberg for ACID compliance on feature tables.

Data Processing: Spark for large-scale joins of user/restaurant history. Flink for low-latency sessionization (e.g., "user clicked 3 Italian places in the last 5 minutes").
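The sessionization example above ("user clicked 3 Italian places in the last 5 minutes") amounts to a per-user sliding-window count. The sketch below shows that logic in plain Python rather than Flink, purely to illustrate the computation; the class name, method names, and the assumption that events arrive in timestamp order are all hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class SessionCuisineCounter:
    """Sliding-window click counts per cuisine for a single user.

    Assumes clicks are added in non-decreasing timestamp order.
    """

    def __init__(self, window_seconds: float = 300.0) -> None:
        self.window_seconds = window_seconds
        self.events: Deque[Tuple[float, str]] = deque()

    def add_click(self, ts: float, cuisine: str) -> None:
        self.events.append((ts, cuisine))

    def counts(self, now: float) -> Dict[str, int]:
        # Evict clicks that have fallen out of the window.
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            self.events.popleft()
        result: Dict[str, int] = {}
        for _, cuisine in self.events:
            result[cuisine] = result.get(cuisine, 0) + 1
        return result
```

In a streaming engine the same idea would typically be keyed by user ID, with the resulting counts written to the online feature store.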

Feature Pipeline

Feature Definition:

User: Embedding of last 20 orders, average spend, preferred delivery time.

Item: Restaurant ID, cuisine vector, rating, "Busy" status.

Context: H3 Geo-index, time-of-day (encoded as Sine/Cosine), rain/snow flags.

Offline Feature Pipeline: Weekly aggregation of restaurant popularity and user preferences stored in S3/BigQuery.

Online Feature Pipeline: Flink jobs calculating "User-Restaurant Distance" and "Current ETA" using real-time courier positions.

Feature Store: Use Redis for the online store (low-latency lookups at serving time) and Hive for the offline store (large historical volume for training).
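The sine/cosine time-of-day encoding listed under the Context features maps the hour onto a circle, so that 23:30 and 00:30 end up close together instead of at opposite ends of a 0-24 scale. A minimal sketch, assuming the hour is given as a float in local time (the function name is illustrative):

```python
import math
from typing import Tuple


def encode_time_of_day(hour: float) -> Tuple[float, float]:
    """Map an hour in [0, 24) to a point (sin, cos) on the unit circle."""
    angle = 2.0 * math.pi * (hour % 24.0) / 24.0
    return math.sin(angle), math.cos(angle)


def encoded_distance(hour_a: float, hour_b: float) -> float:
    """Euclidean distance between two encoded hours."""
    sa, ca = encode_time_of_day(hour_a)
    sb, cb = encode_time_of_day(hour_b)
    return math.hypot(sa - sb, ca - cb)
```

The same trick applies to day-of-week with a period of 7.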

Model Architecture

Retrieval (Candidate Generation):

Method: Two-Tower Architecture.

Logic: User Tower and Item Tower produce 128-d embeddings. At serving, perform an Approximate Nearest Neighbor (ANN) search using FAISS or ScaNN within the geographically filtered set of restaurants.
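To illustrate the serving-time step, the sketch below scores a user embedding against the item embeddings of the geo-filtered restaurants and keeps the top k by dot product. Note that it swaps in an exact scan for the ANN index: with only ~500-2000 reachable restaurants per user, a brute-force pass is a plausible stand-in for FAISS or ScaNN in an example. The function names and the dictionary-based embedding store are assumptions.

```python
import heapq
from typing import Dict, Iterable, List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def retrieve_top_k(
    user_emb: Sequence[float],
    item_embs: Dict[str, Sequence[float]],
    reachable_ids: Iterable[str],
    k: int = 200,
) -> List[Tuple[str, float]]:
    """Exact top-k by dot product, restricted to the geo-filtered IDs."""
    scored = (
        (rid, dot(user_emb, item_embs[rid]))
        for rid in reachable_ids
        if rid in item_embs  # skip restaurants without a precomputed embedding
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])
```

Item embeddings can be precomputed offline, while the user embedding is computed per request (or cached per session).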

Ranking (Score & Order):

Choice: DeepFM (Deep Factorization Machine).

Rationale: Its FM component captures low-order (pairwise) interactions (e.g., Cuisine=Pizza AND Time=LateNight), while its deep component learns high-order feature interactions, all without manual feature crossing.

Optimization: Use Quantization (FP16/INT8) for the ranking model to keep inference under 50ms.

Training Pipeline

Label Construction: Positive = Order Completed. Negative = Restaurant shown but not clicked, OR clicked but not ordered.

Data Splitting: Time-based split. Use the first 27 days for training and the last 3 days for validation to simulate real-world "future" prediction.

Imbalance Handling: Downsample the "Seen but not clicked" class to balance the dataset, and weight each kept negative by the inverse of its sampling rate during training so that predicted probabilities stay calibrated.

Retraining: Weekly full retrains + Daily incremental updates to capture shifting food trends (e.g., a new "viral" dish).
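The splitting and imbalance steps above can be sketched as follows. This is a minimal illustration, assuming each training row is a dict with a `day` index (0-29) and a binary `label`; the field names, function names, and the 10% keep rate in the usage note are hypothetical. It also shows the standard correction for mapping a score back to the original scale when a model is trained on downsampled negatives without weights. Use either the weights or the correction, not both.

```python
import random
from typing import Dict, List, Tuple

Row = Dict[str, float]


def time_split(rows: List[Row], cutoff_day: int) -> Tuple[List[Row], List[Row]]:
    """Train on days before cutoff_day, validate on cutoff_day and later."""
    train = [r for r in rows if r["day"] < cutoff_day]
    valid = [r for r in rows if r["day"] >= cutoff_day]
    return train, valid


def downsample_negatives(rows: List[Row], keep_rate: float, seed: int = 0) -> List[Row]:
    """Keep all positives; keep each negative with probability keep_rate.

    Kept negatives get weight 1 / keep_rate so the weighted data
    still reflects the original class balance.
    """
    rng = random.Random(seed)
    sampled = []
    for r in rows:
        if r["label"] == 1:
            sampled.append({**r, "weight": 1.0})
        elif rng.random() < keep_rate:
            sampled.append({**r, "weight": 1.0 / keep_rate})
    return sampled


def recalibrate(p_sampled: float, keep_rate: float) -> float:
    """Undo negative downsampling for a model trained WITHOUT weights."""
    return p_sampled / (p_sampled + (1.0 - p_sampled) / keep_rate)
```

For the 27/3 split described above, `time_split(rows, cutoff_day=27)` keeps days 0-26 for training. Only the training portion should be downsampled; validation data keeps the real class balance so offline metrics stay meaningful.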

Serving Pipeline

Serving Pattern: Request-Response via a microservice.

Multi-Stage Ranking:

Filtering: Hard constraints (Is the restaurant open? Is the user in range?).

Retrieval: Two-tower ANN search (Top 200).

Ranking: DeepFM scoring (Top 50).

Re-Ranking: Business logic (Promotions, Diversification to ensure cuisine variety).

Reliability: Fallback to "Popularity in Neighborhood" if the ranking service times out.
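The serving stages and the timeout fallback above can be sketched together. This is an illustrative outline only: candidates are assumed to be dicts carrying precomputed `retrieval_score` and `popularity` fields, `rank_fn` stands in for the DeepFM scoring call, and the re-ranking stage (promotions, diversification) is left out for brevity. All names and the 50 ms default timeout are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable, Dict, List

Candidate = Dict[str, object]

# Shared pool so a timed-out ranking call does not block the request thread.
_EXECUTOR = ThreadPoolExecutor(max_workers=4)


def recommend(
    candidates: List[Candidate],
    rank_fn: Callable[[List[Candidate]], List[Candidate]],
    timeout_s: float = 0.05,
    retrieve_n: int = 200,
    top_n: int = 50,
) -> List[str]:
    # Stage 1: hard filters (open and deliverable).
    eligible = [c for c in candidates if c["is_open"] and c["in_range"]]

    # Stage 2: retrieval, here a sort on a precomputed two-tower score.
    retrieved = sorted(
        eligible, key=lambda c: c["retrieval_score"], reverse=True
    )[:retrieve_n]

    # Stage 3: ranking with a deadline; fall back to neighborhood popularity.
    future = _EXECUTOR.submit(rank_fn, retrieved)
    try:
        ranked = future.result(timeout=timeout_s)
    except FutureTimeout:
        ranked = sorted(retrieved, key=lambda c: c["popularity"], reverse=True)

    return [str(c["id"]) for c in ranked[:top_n]]
```

The fallback reuses the already-retrieved candidates, so a ranking timeout degrades personalization but still respects the hard constraints.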

Evaluation Pipeline

Offline: AUC for binary classification; NDCG@10 for ranking quality.

Online: A/B testing against a "control" model. Primary metric: Orders per Session. Secondary: Order Cancel Rate, which catches a model that wins orders by pushing high-ETA restaurants that users later cancel.
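For reference, NDCG@10 on a single session can be computed as below. This sketch uses linear gain with binary relevance (1 if the restaurant at that position was ordered, 0 otherwise), which is one common convention and an assumption here; the function names are illustrative.

```python
import math
from typing import List


def dcg_at_k(relevances: List[float], k: int) -> float:
    """Discounted cumulative gain over the first k positions."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: List[float], k: int = 10) -> float:
    """DCG normalized by the DCG of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

The offline metric is then the mean of `ndcg_at_k` over validation sessions; sessions with no order contribute 0 under this definition and are sometimes excluded instead.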

Monitoring Pipeline

Data Monitoring: Track "Feature Integrity." If the "User Location" feature starts coming in as null, trigger an alert immediately.

Model Monitoring: Monitor the Prediction Mean. If the average $P(\text{order})$ drops significantly, the model may be stale.

Drift: Use Population Stability Index (PSI) to detect distribution shifts in user behavior (e.g., during holidays).
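PSI compares the binned distribution of a feature (or of model scores) between a reference window and a recent window. A minimal sketch, assuming equal-width bins derived from the reference sample and a small floor to avoid division by zero; the function name, bin count, and floor value are illustrative choices.

```python
import math
from typing import List, Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    n_bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI = sum over bins of (a_i - e_i) * ln(a_i / e_i)."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant reference

    def bin_fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            b = int((v - lo) / width)
            counts[min(max(b, 0), n_bins - 1)] += 1  # clamp out-of-range values
        # Floor empty bins so the log ratio stays finite.
        return [max(c / len(values), eps) for c in counts]

    e = bin_fractions(expected)
    a = bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A commonly cited rule of thumb treats values above roughly 0.25 as a significant shift, though the alert threshold should be tuned per feature. Expected seasonal shifts such as holidays may warrant a separate reference window rather than an alert.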

Wrap Up

Final Evaluation

Feedback Loop: Order outcomes (success/fail) are joined back to the features logged at serving time in near real-time, producing fresh training examples for the daily incremental updates.

Trade-offs:

Accuracy vs. Latency: A massive Transformer might be more accurate but would likely exceed the 200ms budget at this scale. DeepFM offers a good balance between accuracy and inference cost.

Personalization vs. Exploration: Dedicate 5% of the list to "new" restaurants to solve the merchant cold-start problem.
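The exploration budget above can be sketched as a post-ranking step that reserves a fraction of list positions for new restaurants. The slot placement (evenly spaced through the list) and the uniform random choice among new restaurants are assumptions made for illustration; the function name and parameters are hypothetical.

```python
import random
from typing import List, Optional


def mix_in_exploration(
    ranked: List[str],
    new_pool: List[str],
    explore_frac: float = 0.05,
    seed: Optional[int] = None,
) -> List[str]:
    """Replace the tail of the ranked list with new restaurants,
    inserted at evenly spaced positions. List length is unchanged."""
    rng = random.Random(seed)
    already_shown = set(ranked)
    fresh = [r for r in new_pool if r not in already_shown]
    n_slots = min(round(len(ranked) * explore_frac), len(fresh))
    picks = rng.sample(fresh, n_slots)

    # Drop the lowest-ranked items to make room for the exploration slots.
    result = ranked[: len(ranked) - n_slots]
    step = len(ranked) // (n_slots + 1)
    for i, pick in enumerate(picks, start=1):
        result.insert(i * step, pick)
    return result
```

Logging which positions were exploration slots matters: those impressions are close to randomized, which makes them useful for estimating position bias and for training on new merchants.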

Distinguishing Insights: In food delivery, the "Re-order" signal is incredibly strong. A simple "Frequently Ordered" feature can provide a 10%+ lift in AUC, often more than switching to a more complex deep learning architecture would.
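A re-order feature of this kind is cheap to compute from order history. The sketch below derives per-candidate counts for one user; the feature names (`order_count`, `order_share`, `is_reorder`) and the flat list of past restaurant IDs are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List


def reorder_features(
    order_history: List[str], candidate_ids: List[str]
) -> Dict[str, Dict[str, float]]:
    """Per-candidate re-order features for one user.

    order_history: restaurant IDs of the user's past orders (repeats allowed).
    """
    counts = Counter(order_history)
    total = len(order_history)
    features = {}
    for rid in candidate_ids:
        n = counts[rid]  # Counter returns 0 for unseen restaurants
        features[rid] = {
            "order_count": float(n),
            "order_share": n / total if total else 0.0,
            "is_reorder": 1.0 if n > 0 else 0.0,
        }
    return features
```

Variants worth testing include recency-weighted counts and counts restricted to the current daypart, since a breakfast favorite says little about a late-night order.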