The Question
ML Design

Scalable Similar Listings Recommendation System

Design a high-scale 'Similar Listings' recommendation engine for a global vacation rental platform with 10M+ properties. The system must surface relevant, available alternatives in real-time (<150ms P99) while a user browses a Listing Detail Page. Focus on the end-to-end ML lifecycle: from multi-modal feature engineering (text, images, metadata) and handling unique inventory availability constraints, to a two-stage retrieval and ranking architecture. Discuss strategies for addressing listing cold-start, training/serving data consistency, and how to optimize for long-term business metrics like booking conversion rate.
LightGBM
Item2Vec
HNSW
FAISS
Spark
Kafka
Redis
Word2Vec
Vision Transformer
Questions & Insights

Clarifying Questions

Clarifying Questions & Constraints:
Business Goal: Is the primary North Star metric Bookings (conversion) or Listing Detail Page (LDP) views (engagement)? Answer: Conversion (Bookings).
Constraints & Scale: What is the scale of listings and traffic? Answer: 10M active listings, 100M monthly active users (MAU), 5k QPS at peak.
Latency Budget: What is the P99 latency requirement? Answer: 150ms for the entire recommendation component.
Freshness: How quickly must new listings appear in "Similar Listings"? Answer: Within 1 hour (Cold start is critical).
Availability: Should we only show listings available for the user's specific dates? Answer: Yes, availability is a hard constraint.
Assumptions:
A corpus of 10M listings.
P99 latency requirement of 150ms.
We have access to rich metadata (amenities, price, location) and high-quality images.

Thinking Process

Identify the Bottleneck: Similarity in vacation rentals is multi-faceted. It's not just "looks like this house," but "serves the same travel intent" (price point, location proximity, and group size).
Retrieval vs. Ranking: With 10M items, scoring every listing with a single model on each request is infeasible within the latency budget. I need a Two-Stage approach: 1) Fast Retrieval (Approximate Nearest Neighbors) and 2) Precise Ranking (Pointwise/Pairwise re-ranking).
The "Availability" Problem: Unlike E-commerce, inventory is unique and time-bound. A "similar" house that is booked for the user's dates is a dead end. Filtering must happen either during retrieval or immediately after.
Scaling the Solution: Leverage listing embeddings (Item2Vec or Content-based) for retrieval to handle the scale and use a GBDT or lightweight MLP for ranking to meet the 150ms budget.

Elite Bonus Points

Multi-modal Embeddings: Using a late-fusion approach to combine image embeddings (from a pre-trained Vision Transformer) with text embeddings (listing descriptions) to capture "vibe" similarity.
Availability-Aware Retrieval: Discussing the trade-off between "Post-filtering" (retrieving 100, filtering down to 10) vs "In-index filtering" (using HNSW with metadata filters) to prevent empty result sets.
Exploration/Exploitation (E&E): Implementing a small epsilon-greedy shuffle to prevent "rich-get-richer" effects and collect data on new, high-potential listings.
Session-Based Personalization: Adjusting "similarity" based on the user's current session (e.g., if they just looked at 3 beach houses, prioritize coastal similarity over price similarity).
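
The epsilon-greedy shuffle from the Exploration/Exploitation point can be sketched roughly as below. This is a minimal illustration, not a production policy: the function name `epsilon_greedy_slate`, the `explore_ids` pool (for example, new listings with little click data), and the default `epsilon=0.1` are all assumptions made for the example.

```python
import random


def epsilon_greedy_slate(ranked_ids, explore_ids, k=10, epsilon=0.1, rng=None):
    """Build a k-item slate, filling each slot from the exploration pool
    with probability epsilon and from the ranker's order otherwise."""
    rng = rng or random.Random()
    exploit = list(ranked_ids)   # best-first order produced by the ranker
    explore = list(explore_ids)  # hypothetical pool of under-exposed listings
    rng.shuffle(explore)

    slate, seen = [], set()
    while len(slate) < k and (exploit or explore):
        # Explore only if the pool is non-empty; fall back to it when the
        # ranked list is exhausted.
        use_explore = bool(explore) and (not exploit or rng.random() < epsilon)
        source = explore if use_explore else exploit
        candidate = source.pop(0)
        if candidate not in seen:  # never show the same listing twice
            seen.add(candidate)
            slate.append(candidate)
    return slate
```

Logging which slots were exploration slots would be needed to use the collected clicks without bias; that bookkeeping is left out here.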
Design Breakdown

Requirements

Product Goal: Surface 6-12 "Similar Listings" on the Listing Detail Page (LDP) to help users find alternatives and increase booking conversion.
Success Metrics:
Online Metrics: Booking Conversion Rate (CVR), CTR on recommendations, Average Daily Rate (ADR).
Offline Metrics: Recall@K (for retrieval), NDCG, LogLoss/AUC (for ranking).
Guardrail Metrics: P99 Latency, Listing Diversity (to avoid showing 10 identical units in the same building).
System Constraints: 10M items, 5k QPS, <150ms latency.
Data Availability: Listing metadata (price, rooms, location), User clickstream, historical booking logs, listing images.
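
One simple way to enforce the Listing Diversity guardrail is a per-building cap applied to the ranked list. The sketch below assumes a hypothetical `building_of` mapping from listing ID to building ID and an illustrative cap of 2; neither comes from the design above.

```python
from collections import Counter


def cap_per_building(ranked_ids, building_of, max_per_building=2, k=10):
    """Walk the ranked list in order and keep at most max_per_building
    listings from any one building, stopping once k are selected."""
    counts = Counter()
    selected = []
    for listing_id in ranked_ids:
        # Listings with no known building form their own one-item group.
        group = building_of.get(listing_id, ("solo", listing_id))
        if counts[group] >= max_per_building:
            continue
        counts[group] += 1
        selected.append(listing_id)
        if len(selected) == k:
            break
    return selected
```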

ML Problem Framing

ML Task Type: Two-stage recommendation (Retrieval + Ranking).
Prediction Target: $P(\text{Book} \mid \text{User}, \text{Context}, \text{Anchor Item}, \text{Candidate Item})$.
Inputs:
User: (Optional for MVP) Historical preferences, search filters.
Item (Anchor): Current listing's price, location (GeoHash), amenities, category (e.g., "Tiny Home").
Candidate Items: Features of potential similar listings.
Outputs: A ranked list of Listing IDs.
ML Challenges: Cold start for new listings, extreme data sparsity (most users don't book often), and the "Availability" hard constraint.

Design Summary & MVP

Concise Summary: A two-stage system using Approximate Nearest Neighbors (ANN) on listing embeddings for retrieval, followed by a LightGBM ranker that incorporates real-time availability and price delta features.
Model Architecture & Selection:
Baseline Model: Heuristic-based: "Top 10 listings in the same city within +/- 20% price range."
Target Model: Retrieval: Two-Tower model or Item2Vec embeddings stored in a Vector DB (Milvus/Pinecone). Ranking: LightGBM (Gradient Boosted Decision Trees) for fast, interpretable, and high-performance ranking.
Choice Rationale: GBDTs are a strong fit for tabular features (price, room count) and are cheap to serve compared with deep ranking models, while ANN enables sub-linear search over 10M listings.
ML Life Cycle Summary: Raw logs are processed via Spark; embeddings are generated offline; ANN index is updated hourly; LightGBM ranks the top 100 candidates online.
Simplicity Audit: Avoids complex Graph Neural Networks (GNNs) or real-time Transformers for the MVP, focusing on robust embeddings and efficient GBDT ranking.
Architecture Decision Rationale: This architecture balances the need for semantic similarity (embeddings) with the need for hard-constraint logic (price/location) and low-latency serving.

System Architecture

Pipeline Deep Dive

Data Pipeline

Data Source: Clickstream (listing views, "save to wishlist"), Transaction logs (bookings), Listing Metadata (updated via CDC from production DB).
Data Ingestion: Kafka for real-time events; Airflow for orchestrating batch ingestion from Listing DB.
Data Storage: S3 for the Data Lake (Parquet format for efficiency). Partitioning by date and region.
Data Processing: Spark for heavy-duty joins (User-Item interactions) and sessionization.
Data Quality: De-duplication of events, schema validation (Great Expectations), and checking for "orphan listings" (listings with no metadata).
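
The sessionization step could look roughly like the following. It is written in plain Python for readability rather than Spark, and the 30-minute inactivity gap, the event tuple layout, and the function name are assumptions made for this sketch.

```python
from collections import defaultdict

SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity threshold


def sessionize(events, gap=SESSION_GAP_SECONDS):
    """Group (user_id, unix_ts, listing_id) view events into per-user
    sessions, returned as lists of listing IDs in view order."""
    by_user = defaultdict(list)
    for user_id, ts, listing_id in events:
        by_user[user_id].append((ts, listing_id))

    sessions = []
    for views in by_user.values():
        views.sort(key=lambda v: v[0])
        current, last_ts = [], None
        for ts, listing_id in views:
            # A long pause closes the current session.
            if last_ts is not None and ts - last_ts > gap:
                if len(current) > 1:
                    sessions.append(current)
                current = []
            # Drop consecutive repeats (page refreshes).
            if not current or current[-1] != listing_id:
                current.append(listing_id)
            last_ts = ts
        if len(current) > 1:  # single-view sessions carry no co-view signal
            sessions.append(current)
    return sessions
```

Each returned session can then serve as one "sentence" for Item2Vec training.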

Feature Pipeline

Feature Definition:
Item (Static): Price, No. of bedrooms, Latitude/Longitude, Amenities (WiFi, Pool, etc.).
Item (Dynamic): 7-day CTR, Booking rate, availability calendar.
Context: Date of stay, number of guests.
Feature Engineering:
Geohashing: Convert Lat/Long to Geohashes of varying precision for proximity matching.
Price Bucketing: Normalize price relative to the median of the city.
Online Feature Store: A feature store (e.g., Feast or Tecton) with Redis as the online store, holding real-time listing counters (e.g., "times viewed in last hour").
Training/Serving Skew: Use a single Feature Definition library for both Spark (offline) and the Online Service to ensure feature consistency.
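
As an illustration of shared feature definitions, the two engineered features above can be written as pure functions that both the offline job and the online service import. The function names are hypothetical; the geohash routine follows the standard public encoding (interleaved longitude/latitude bits, base-32 alphabet).

```python
import statistics

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, precision=6):
    """Encode a latitude/longitude pair as a geohash of the given length."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, value = [], 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        value = (value << 1) | int(bit)
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


def price_ratio_to_city_median(price, city_prices):
    """Normalize a nightly price by the median price in the same city."""
    return price / statistics.median(city_prices)
```

Shorter geohashes are prefixes of longer ones, so coarser proximity buckets come from truncating the same string.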

Model Architecture

Problem Formulation: Pointwise ranking: Predict the probability of a booking for a candidate item given the anchor item.
Retrieval Architecture: Item2Vec. Treat a user's session of viewed listings as a "sentence" and listings as "words." Train Word2Vec to learn listing embeddings.
Why? Captures co-occurrence (people who look at X also look at Y).
Ranking Architecture: LightGBM.
Features: Embedding cosine similarity, Haversine distance between anchor and candidate, price difference, star rating difference.
Model Complexity: Item2Vec (128d embeddings) + LightGBM (500 trees). Both are lightweight enough to score the ~50-100 retrieved candidates within the 150ms P99 budget.
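
The anchor/candidate features listed for the ranker can be computed as in this sketch. The dictionary keys (`embedding`, `lat`, `lon`, `price`, `rating`) and the use of a relative price difference are assumptions for illustration, not a fixed schema.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def pair_features(anchor, candidate):
    """Feature row for one (anchor, candidate) pair fed to the ranker."""
    return {
        "emb_cosine": cosine_similarity(anchor["embedding"], candidate["embedding"]),
        "distance_km": haversine_km(anchor["lat"], anchor["lon"],
                                    candidate["lat"], candidate["lon"]),
        # Relative difference keeps the feature comparable across markets.
        "price_rel_diff": (candidate["price"] - anchor["price"]) / anchor["price"],
        "rating_diff": candidate["rating"] - anchor["rating"],
    }
```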

Training Pipeline

Dataset Construction:
Positive Labels: Bookings.
Negative Labels: Sampled from listings shown but not clicked, or random listings from the same city.
Data Splitting: Time-based split. Train on months 1-5, validate on month 6. Avoid random splits for time-ordered recommendation data, since they leak future interactions into training and inflate offline metrics.
Retraining Strategy: Daily batch retraining of the LightGBM model to capture latest trends. Hourly incremental updates to the Vector DB for new listings (using content-based embedding fallback for cold start).
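
A rough sketch of the dataset construction and time-based split follows. The impression record layout (`anchor_id`, `shown_ids`, `clicked_ids`, `booked_id`, `ts`) and the ratio of four negatives per positive are assumptions for this example only.

```python
import random


def build_examples(impression, neg_per_pos=4, rng=None):
    """Turn one logged impression into (anchor, candidate, label) rows:
    the booked listing is the positive, and negatives are sampled from
    listings that were shown but not clicked."""
    rng = rng or random.Random()
    booked = impression.get("booked_id")
    if booked is None:
        return []  # no booking, so no positive label for this impression
    anchor = impression["anchor_id"]
    clicked = set(impression.get("clicked_ids", ()))
    pool = [c for c in impression["shown_ids"]
            if c not in clicked and c != booked]
    rows = [(anchor, booked, 1)]
    for negative in rng.sample(pool, min(neg_per_pos, len(pool))):
        rows.append((anchor, negative, 0))
    return rows


def time_split(impressions, cutoff_ts):
    """Everything before cutoff_ts trains; everything at or after validates."""
    train = [imp for imp in impressions if imp["ts"] < cutoff_ts]
    valid = [imp for imp in impressions if imp["ts"] >= cutoff_ts]
    return train, valid
```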

Serving Pipeline

Serving Pattern:
Trigger: User visits Listing A.
Retrieval: Fetch 100 most similar listing IDs from Vector DB using Embedding(A).
Filter: Query Availability Service (Redis) to remove listings booked for the user's dates.
Rank: Batch predict $P(\text{Book})$ for the remaining ~50 listings using LightGBM.
Serve: Return top 10.
Latency Optimization: Serve retrieval from an HNSW index (e.g., built with FAISS). LightGBM batch prediction is multi-threaded via OpenMP.
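
The retrieve, filter, rank, serve flow above can be sketched end to end as follows. Several stand-ins are used and should be read as assumptions: brute-force dot-product search replaces the ANN index, an in-memory dict of booked nights replaces the Availability Service, and `score_fn` is a placeholder for the LightGBM model.

```python
import heapq
from datetime import date, timedelta


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def retrieve(anchor_id, embeddings, k=100):
    """Brute-force top-k by dot product (stand-in for an ANN lookup).
    Assumes embeddings are L2-normalized, so dot product equals cosine."""
    anchor_vec = embeddings[anchor_id]
    scored = ((dot(anchor_vec, vec), lid)
              for lid, vec in embeddings.items() if lid != anchor_id)
    return [lid for _, lid in heapq.nlargest(k, scored)]


def is_available(booked_nights, check_in, check_out):
    """True if none of the nights in [check_in, check_out) is booked."""
    nights = {check_in + timedelta(days=i)
              for i in range((check_out - check_in).days)}
    return booked_nights.isdisjoint(nights)


def similar_listings(anchor_id, embeddings, booked, check_in, check_out,
                     score_fn, k_retrieve=100, k_serve=10):
    candidates = retrieve(anchor_id, embeddings, k_retrieve)
    available = [c for c in candidates
                 if is_available(booked.get(c, set()), check_in, check_out)]
    ranked = sorted(available, key=lambda c: score_fn(anchor_id, c), reverse=True)
    return ranked[:k_serve]
```

Because filtering happens after retrieval here, a heavily booked market can leave fewer than `k_serve` results, which is the post-filtering trade-off noted earlier.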

Evaluation Pipeline

Offline: Replay historical sessions and check how highly the listings the user actually booked are ranked. Metrics: Recall@20 (retrieval) and NDCG@10 (ranking).
Online: A/B Testing.
Control: Heuristic baseline (same city, within +/- 20% price range).
Treatment: ML-based Retrieval + Ranking.
Metric: Booking Conversion Rate (Primary), Click-through Rate (Secondary).
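
For the offline metrics, a minimal sketch with binary relevance (a candidate is relevant if the user booked it) might look like this. The function names are illustrative.

```python
import math


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant listings that appear in the top k."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


def ndcg_at_k(ranked_ids, relevant_ids, k):
    """NDCG with binary gains: each relevant hit at 0-based position i
    contributes 1 / log2(i + 2), normalized by the ideal ordering."""
    relevant = set(relevant_ids)
    dcg = sum(1.0 / math.log2(i + 2)
              for i, lid in enumerate(ranked_ids[:k]) if lid in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / ideal if ideal else 0.0
```

Both functions score a single request; averaging over all replayed sessions gives the reported Recall@20 and NDCG@10.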

Monitoring Pipeline

System Monitoring: Prometheus/Grafana for QPS, latency, and 5xx errors.
Model Monitoring: Track the distribution of the output scores (prediction drift). If the average $P(\text{Book})$ drops significantly, alert the team.
Feature Monitoring: Monitor for missing values in critical features like price or location.
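
One way to turn the prediction-drift check into something measurable is a relative mean-drop alert plus the Population Stability Index (PSI) over score buckets. This is a sketch under assumptions: scores are probabilities in [0, 1], ten equal-width buckets are used, and the 20% drop threshold is an illustrative choice rather than a value from the design.

```python
import math


def score_histogram(scores, bins=10, eps=1e-6):
    """Share of scores per equal-width bucket on [0, 1], floored at eps."""
    counts = [0] * bins
    for s in scores:
        counts[min(int(s * bins), bins - 1)] += 1
    total = len(scores)
    return [max(c / total, eps) for c in counts]


def psi(baseline_scores, current_scores, bins=10):
    """Population Stability Index between two score samples."""
    expected = score_histogram(baseline_scores, bins)
    actual = score_histogram(current_scores, bins)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


def mean_drop_alert(baseline_scores, current_scores, rel_threshold=0.2):
    """True if the mean score fell by more than rel_threshold (relative)."""
    base = sum(baseline_scores) / len(baseline_scores)
    cur = sum(current_scores) / len(current_scores)
    return (base - cur) / base > rel_threshold
```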
Wrap Up

Final Evaluation

Observability: Use "Feature Importance" plots in LightGBM to ensure the model isn't over-relying on a single noisy feature.
Feedback Loop: Clicks/Bookings on the "Similar Listings" section are piped back into the training data daily.
Edge Cases:
New Listing: Use "Content-only" embeddings (average of image + text embeddings) until enough click data exists for Item2Vec.
Out-of-Stock: The Availability Filter is the most critical non-ML component.
Trade-offs: We trade off some accuracy (by not using a Deep Cross Network) for very low latency and better maintainability (LightGBM).
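
For the New Listing edge case, the "content-only" embedding (average of image and text embeddings) could be computed along these lines. The sketch assumes both encoders have already been projected to the same dimensionality, and it L2-normalizes each modality first so that neither dominates the average; both points are assumptions added for the example.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def content_embedding(image_vec, text_vec):
    """Late fusion for cold start: average the normalized image and text
    embeddings, then renormalize for cosine-based retrieval."""
    if len(image_vec) != len(text_vec):
        raise ValueError("image and text embeddings must share a dimension")
    image_unit = l2_normalize(image_vec)
    text_unit = l2_normalize(text_vec)
    fused = [(i + t) / 2 for i, t in zip(image_unit, text_unit)]
    return l2_normalize(fused)
```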