The Question
ML Design

Ad Click-Through Rate Prediction System

Design a high-scale advertising ranking system that predicts the probability of a user clicking on a specific ad creative. The system must handle high-cardinality data, ensure low-latency inference for real-time auctions, and maintain model freshness through a robust data and feature pipeline.
Topics: XGBoost/LightGBM, DCN, DeepFM, Logistic Regression, Embeddings
Questions & Insights

Clarifying Questions

Business Goal: Is the objective strictly to maximize Click-Through Rate (CTR), or to maximize revenue (e.g., eCPM = pCTR \times Bid \times 1000)? Assumption: We aim to maximize CTR to improve user experience and platform relevance, which indirectly drives long-term revenue.
Constraints & Scale:
DAU: 500M+ active users.
Ad Corpus: 10M+ active ads.
Traffic: 100k+ Queries Per Second (QPS) at peak.
Latency: End-to-end (Retrieval + Ranking + Re-ranking) < 100ms P99.
Edge Cases:
Cold Start: How do we handle new ads with no history? (Exploration vs. Exploitation).
Data Freshness: How quickly must a click be reflected in the model? (Within minutes for trend-based ads).
Delayed Feedback: Clicks usually arrive within seconds to minutes of the impression, but conversions (if relevant) can take days.
Assumptions: I assume we have access to historical logs of (User, Ad, Context, Outcome). I assume a two-stage architecture: Candidate Retrieval followed by a Deep Ranking model.

Thinking Process

The Bottleneck: In AdTech, the sheer volume of features (billions of parameters due to high-cardinality IDs) and the need for sub-100ms latency are the primary constraints.
Retrieval vs. Ranking: We cannot rank 10M ads in real-time. I need a multi-stage funnel: Retrieval (Filtering/ANN) -> Pre-Ranking (Lightweight) -> Deep Ranking (Heavyweight) -> Auction/Re-ranking.
Scaling the Model: To capture complex feature interactions (e.g., User Interest X Ad Category), I'll lean towards Deep & Cross Networks (DCN) or DeepFM to handle both low-order and high-order interactions efficiently.
The Data Loop: Feature freshness is king. I must implement a Lambda or Kappa architecture to ensure real-time counters (e.g., "how many times has this user clicked this category in the last 10 minutes") are available during inference.
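A quick back-of-envelope calculation with the scale figures above (100k QPS at peak, 10M active ads, and the top-500 candidate set that Retrieval hands to Ranking in the Serving Pipeline) shows why the funnel is needed. This is only arithmetic on the document's own assumptions, not a measured capacity plan.

```python
# Back-of-envelope scoring load, using the scale figures assumed in this design.
PEAK_QPS = 100_000          # peak ranking requests per second
AD_CORPUS = 10_000_000      # active ads
RANKING_CANDIDATES = 500    # candidates handed from Retrieval to Deep Ranking

def scores_per_second(qps: int, candidates_per_request: int) -> int:
    """Number of (user, ad) model evaluations needed each second."""
    return qps * candidates_per_request

full_corpus = scores_per_second(PEAK_QPS, AD_CORPUS)
funnel = scores_per_second(PEAK_QPS, RANKING_CANDIDATES)

print(f"Rank full corpus:  {full_corpus:.1e} scores/s")
print(f"Rank top-500 only: {funnel:.1e} scores/s")
print(f"Reduction factor:  {full_corpus // funnel:,}x")
```

Scoring the full corpus would need roughly 20,000 times more model evaluations per second than scoring 500 candidates per request, which is why the heavyweight ranker only ever sees the Retrieval output.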

Elite Bonus Points

Calibration for Bidding: CTR models often over/under-predict. In a second-price auction, we need the predicted probability (pCTR) to be well-calibrated (e.g., using Platt Scaling or Isotonic Regression) so that the pCTR \times Bid calculation is economically sound.
Handling Delayed Feedback: Impressions whose clicks (or conversions) have not arrived yet are provisionally labeled negative, which biases pCTR downward. Correct this bias with a "positive-unlabeled" (PU) learning formulation or with importance weighting of the provisional negatives.
Counterfactual Evaluation: Since we only see clicks for ads we actually showed, use Inverse Propensity Scoring (IPS) to evaluate how a new model would have performed on the ads it would have picked.
Embeddings Versioning: Implement a "Warm-Start" strategy for embeddings when deploying new model versions to prevent a "performance dip" while the new embedding layer converges.
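A minimal sketch of the IPS estimator behind Counterfactual Evaluation, assuming the serving stack logs, for every impression, the propensity (the probability with which the production policy chose the ad it showed). The record layout and the `new_policy_prob` callback are hypothetical; clipping the importance weights is a standard variance-reduction tweak, not something the text prescribes.

```python
from typing import Callable, List, Sequence, Tuple

# One logged impression: (context_id, shown_ad_id, clicked 0/1, logging propensity).
# Hypothetical record layout; the propensity must be recorded at serving time.
LoggedImpression = Tuple[str, str, int, float]

def ips_ctr_estimate(
    logs: Sequence[LoggedImpression],
    new_policy_prob: Callable[[str, str], float],
    max_weight: float = 100.0,
) -> float:
    """Clipped IPS estimate of the CTR a new policy would have achieved.

    Each logged click is reweighted by pi_new(ad | context) / pi_log(ad | context);
    weights above max_weight are clipped to reduce variance (at the cost of some bias).
    """
    total = 0.0
    for context, ad, clicked, propensity in logs:
        weight = min(new_policy_prob(context, ad) / propensity, max_weight)
        total += clicked * weight
    return total / len(logs)

# Toy log: the old policy showed adA or adB on each impression with probability 0.5.
logs: List[LoggedImpression] = [
    ("u1", "adA", 1, 0.5), ("u1", "adB", 0, 0.5),
    ("u2", "adA", 0, 0.5), ("u2", "adB", 1, 0.5),
]

def always_a(context: str, ad: str) -> float:
    """Hypothetical candidate policy that always shows adA."""
    return 1.0 if ad == "adA" else 0.0

print(ips_ctr_estimate(logs, always_a))  # 0.5
```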
Design Breakdown

Functional Reqs

Ad Selection: For a given user and context (search query or page content), return a list of ads ranked by predicted click probability (pCTR).
Real-time Scoring: The system must provide a score in real-time for the auction mechanism.
Ad Filtering: Exclude ads that the user has already seen too many times (Frequency Capping).
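Frequency Capping can be sketched as a sliding-window counter per (user, ad) pair. This in-memory version is only illustrative: in production the counts would live in the online feature store, and the cap and window values used below are hypothetical settings.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple

class FrequencyCapper:
    """Drops ads a user has already seen `max_impressions` times within `window_s` seconds."""

    def __init__(self, max_impressions: int = 3, window_s: float = 24 * 3600):
        self.max_impressions = max_impressions
        self.window_s = window_s
        # (user_id, ad_id) -> timestamps of recent impressions, oldest first
        self._seen: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)

    def _recent(self, user_id: str, ad_id: str, now: float) -> Deque[float]:
        times = self._seen[(user_id, ad_id)]
        while times and now - times[0] >= self.window_s:
            times.popleft()  # evict impressions that fell out of the window
        return times

    def record_impression(self, user_id: str, ad_id: str, now: float) -> None:
        self._recent(user_id, ad_id, now).append(now)

    def filter(self, user_id: str, ad_ids: List[str], now: float) -> List[str]:
        return [ad for ad in ad_ids
                if len(self._recent(user_id, ad, now)) < self.max_impressions]

capper = FrequencyCapper(max_impressions=2, window_s=60)
capper.record_impression("u1", "adA", now=0)
capper.record_impression("u1", "adA", now=10)
print(capper.filter("u1", ["adA", "adB"], now=30))  # ['adB']
print(capper.filter("u1", ["adA", "adB"], now=65))  # ['adA', 'adB']
```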

Non-Functional Reqs

Scalability: Support horizontal scaling to handle 100k QPS.
Availability: 99.99% uptime; Ad delivery is critical for revenue.
Latency: Scoring of top candidates must be < 20ms within the total 100ms budget.
Freshness: The model should be updated at least daily (the Training Pipeline below uses hourly incremental updates), with streaming feature updates every minute.

ML Problem Framing

ML Objective: Minimize the Logarithmic Loss (LogLoss) between the predicted pCTR and the actual binary outcome (Click=1, No-Click=0).
ML Category: Pointwise Binary Classification.
Input/Output/Label:
Input: User Features, Ad Features, Context Features (Device, Time, Geo).
Output: A scalar probability (pCTR) in [0, 1].
Label: 1 for a click, 0 for an impression without a click (after a specific attribution window).
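For concreteness, LogLoss averages the negative log-likelihood of the observed labels under the predicted pCTR. In this sketch the clipping constant `eps` is an assumption that keeps log() finite; the toy labels compare a constant prediction equal to the observed CTR (a useful baseline) with predictions that separate the click from the non-clicks.

```python
import math
from typing import Sequence

def log_loss(labels: Sequence[int], pctr: Sequence[float], eps: float = 1e-15) -> float:
    """Mean binary cross-entropy: -[y * log(p) + (1 - y) * log(1 - p)]."""
    total = 0.0
    for y, p in zip(labels, pctr):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() stays finite
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

labels = [1, 0, 0, 0]
print(round(log_loss(labels, [0.25] * 4), 4))            # 0.5623 (base-rate baseline)
print(round(log_loss(labels, [0.7, 0.1, 0.2, 0.1]), 4))  # 0.1976
```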

Data Prep & Features

Data Pipeline: Join Impression logs with Click logs using a request_id.
Feature Engineering:
Sparse Features: User ID, Ad ID, Creative ID, Publisher ID (use Hashing Trick to manage dimensionality).
Dense/Continuous: User age, historical CTR of the ad (Global CTR), time since last click.
Sequential Features: Last 10 ads the user clicked (processed via GRU or Attention).
Cross-Features: (User_Gender x Ad_Category), (User_Location x Ad_Language).
Feature Store: Use an online store (e.g., Redis) for serving real-time features and an offline store (e.g., Hive/S3) for training.
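The Hashing Trick and the cross features can share one mechanism: each raw value (or the concatenated pair of values for a cross) is hashed into a fixed number of buckets, and the bucket index becomes the embedding row. This sketch assumes `hashlib.md5` as the hash because it is stable across processes; the bucket count and feature names are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

NUM_BUCKETS = 2**20  # hypothetical number of embedding rows per feature

def hash_bucket(feature_name: str, value: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a raw categorical value to a stable bucket id in [0, num_buckets).

    The feature name is part of the key, so "123" as a user_id and "123" as an
    ad_id land in unrelated buckets. md5 is used only because it is stable across
    processes; Python's built-in hash() is salted per process for strings.
    """
    key = f"{feature_name}={value}".encode("utf-8")
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:8], "little") % num_buckets

def sparse_ids(features: Dict[str, str], crosses: List[Tuple[str, str]]) -> Dict[str, int]:
    """Hash every raw sparse feature, plus one hashed id per requested feature cross."""
    ids = {name: hash_bucket(name, value) for name, value in features.items()}
    for left, right in crosses:
        cross_name = f"{left}_x_{right}"
        ids[cross_name] = hash_bucket(cross_name, f"{features[left]}|{features[right]}")
    return ids

example = {"user_id": "u_123", "ad_id": "ad_987", "user_gender": "F", "ad_category": "travel"}
print(sparse_ids(example, crosses=[("user_gender", "ad_category")]))
```

Hash collisions are accepted in exchange for a fixed-size table whose memory does not grow with the number of distinct IDs.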

Model Architecture

Model Choice: Deep & Cross Network (DCN v2).
Cross Network: Explicitly learns bounded-degree feature interactions (e.g., x_1 \times x_2).
Deep Network (MLP): Learns highly non-linear interactions.
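Loss Function: Binary Cross-Entropy (LogLoss). AUC only measures ranking quality: it is unchanged by any monotone transformation of the scores, so a model can have a high AUC and still be badly miscalibrated. LogLoss is preferred because the auction needs accurate probability estimates, not just a relative ranking.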
Multi-Task Learning (Optional): If we also care about conversions, use MMoE (Multi-gate Mixture-of-Experts) to predict CTR and CVR (Conversion Rate) simultaneously.
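DCN v2's cross network stacks layers of the form x_{l+1} = x_0 \odot (W_l x_l + b_l) + x_l, where x_0 is the concatenated input (embeddings plus dense features) and \odot is element-wise multiplication. The pure-Python sketch below implements only that forward pass, with tiny hand-picked weights so the explicit x_1 \times x_2 term is visible; a real model would use a deep-learning framework and learned weights, and would combine the cross output with the MLP.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]

def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product W @ x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]

def cross_layer(x0: Vector, xl: Vector, w: Matrix, b: Vector) -> Vector:
    """One DCN v2 cross layer: x_{l+1} = x0 * (W @ xl + b) + xl, with * element-wise."""
    projected = [p + b_i for p, b_i in zip(matvec(w, xl), b)]
    return [x0_i * p_i + xl_i for x0_i, p_i, xl_i in zip(x0, projected, xl)]

def cross_network(x0: Vector, layers: List[Tuple[Matrix, Vector]]) -> Vector:
    """Stack cross layers; L layers model interactions up to degree L + 1."""
    x = x0
    for w, b in layers:
        x = cross_layer(x0, x, w, b)
    return x

# Toy input [x1, x2] = [1, 2] with a W that swaps the two features.
x0 = [1.0, 2.0]
swap = [[0.0, 1.0], [1.0, 0.0]]
print(cross_layer(x0, x0, swap, [0.0, 0.0]))  # [3.0, 4.0] = [x1*x2 + x1, x2*x1 + x2]
```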

Training & Serving

Optimization: Adam optimizer with weight decay. Use a decaying learning rate.
Negative Downsampling: Since clicks are rare (e.g., 1%), downsample the majority class (no-clicks) to speed up training, then apply a calibration correction: p' = \frac{p}{p + (1-p)/w}, where p is the model's prediction and w is the negative downsampling rate (the fraction of negatives kept, e.g., w = 0.1).
Online Serving: Export model to ONNX or TensorRT. Deploy on Triton Inference Server with model sharding (distributing the massive embedding table across multiple GPUs).
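The correction follows from the odds: keeping only a fraction w of the negatives inflates the model's predicted odds p / (1 - p) by a factor of 1/w, so multiplying the odds by w and converting back gives p' = w p / (w p + 1 - p) = \frac{p}{p + (1-p)/w}. A minimal sketch, with w = 0.1 used only as an example:

```python
def correct_for_negative_downsampling(p: float, w: float) -> float:
    """Map a pCTR from a model trained on downsampled negatives back to the true scale.

    p: prediction of the model trained with negatives kept at rate w (0 < w <= 1).
    Returns p / (p + (1 - p) / w), i.e. the predicted odds multiplied by w.
    """
    if not 0.0 < w <= 1.0:
        raise ValueError("w must be the fraction of negatives kept, in (0, 1]")
    return p / (p + (1.0 - p) / w)

# With 10% of negatives kept, a raw prediction of 0.5 (odds 1:1) corresponds to
# odds of 0.1, i.e. a true pCTR of 0.1 / 1.1 ≈ 0.0909.
print(round(correct_for_negative_downsampling(0.5, 0.1), 4))  # 0.0909
```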

System Architecture

Pipeline Deep Dive

Data Pipeline

Ingestion: Client-side SDKs send impression and click events to Kafka.
Joiner: A Flink job performs a temporal join. Impressions are buffered in state for 30 minutes. If a click with the same request_id arrives within that window, the impression is emitted as a positive; otherwise, it expires and is emitted as a negative.
Storage: Data is partitioned by date and hour in S3 in Parquet format for efficient columnar access.
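The Joiner's semantics can be simulated in a few lines: buffer each impression by request_id, label it positive if a click with the same request_id arrives within the window, and emit it as a negative once the window expires. This is a single-process illustration, not Flink code; the event tuples and the assumption that events arrive in timestamp order are simplifications (a real Flink job would rely on keyed state, timers, and watermarks).

```python
from typing import Dict, Iterable, List, Tuple

WINDOW_S = 30 * 60  # impressions wait up to 30 minutes for a click

def join_events(events: Iterable[Tuple[float, str, str]]) -> List[Tuple[str, int]]:
    """Label impressions from a time-ordered stream of (timestamp, kind, request_id).

    kind is "impression" or "click". Returns (request_id, label) pairs, where the
    label is 1 if a click arrived within WINDOW_S of the impression, else 0.
    """
    pending: Dict[str, float] = {}  # request_id -> impression timestamp (the "state")
    out: List[Tuple[str, int]] = []

    def expire(now: float) -> None:
        for rid, ts in list(pending.items()):
            if now - ts > WINDOW_S:
                del pending[rid]
                out.append((rid, 0))  # window closed without a click: negative

    for ts, kind, rid in events:
        expire(ts)
        if kind == "impression":
            pending[rid] = ts
        elif kind == "click" and rid in pending:
            del pending[rid]
            out.append((rid, 1))  # click joined in time: positive
        # Clicks with no pending impression (late or duplicate) are dropped.
    # End of input: flush leftovers as negatives. A streaming job would instead
    # wait for each impression's timer to fire.
    out.extend((rid, 0) for rid in pending)
    return out

events = [
    (0, "impression", "r1"), (10, "impression", "r2"),
    (60, "click", "r1"),    # 60 s after r1: joined as a positive
    (2000, "click", "r2"),  # 1990 s after r2: too late, r2 already expired
]
print(join_events(events))  # [('r1', 1), ('r2', 0)]
```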

Feature Pipeline

Extraction: We calculate historical CTR (Clicks/Impressions) over multiple windows (1h, 1d, 7d).
Transformation: Continuous features are z-score normalized, (x - \mu) / \sigma, using the mean and standard deviation computed on the training data. Sparse features are mapped to integer IDs for embedding lookups.
Consistency: The same transformation code is used in Spark (offline) and the inference service (online) to prevent training-serving skew.
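One way to enforce the Consistency point is to package every transformation (z-score statistics and sparse vocabularies) as a single artifact that is fit once offline, serialized, and loaded by both the Spark job and the inference service. The class below is a hypothetical single-machine sketch of that idea; the feature names and the reserved out-of-vocabulary ID 0 are assumptions.

```python
import json
import statistics
from typing import Dict, List

class FeatureTransform:
    """Fit-once, apply-everywhere transform to avoid training-serving skew."""

    def __init__(self, stats: Dict[str, List[float]], vocab: Dict[str, Dict[str, int]]):
        self.stats = stats  # dense feature -> [mean, std]
        self.vocab = vocab  # sparse feature -> {raw value: integer id}; 0 = unknown

    @classmethod
    def fit(cls, rows: List[dict], dense: List[str], sparse: List[str]) -> "FeatureTransform":
        stats: Dict[str, List[float]] = {}
        for f in dense:
            values = [float(r[f]) for r in rows]
            # Fall back to std = 1.0 for constant features to avoid dividing by zero.
            stats[f] = [statistics.fmean(values), statistics.pstdev(values) or 1.0]
        vocab = {f: {v: i + 1 for i, v in enumerate(sorted({str(r[f]) for r in rows}))}
                 for f in sparse}
        return cls(stats, vocab)

    def transform(self, row: dict) -> dict:
        out = {f: (float(row[f]) - mean) / std for f, (mean, std) in self.stats.items()}
        out.update({f: ids.get(str(row[f]), 0) for f, ids in self.vocab.items()})
        return out

    def to_json(self) -> str:  # the artifact shipped to both offline and online code
        return json.dumps({"stats": self.stats, "vocab": self.vocab})

    @classmethod
    def from_json(cls, blob: str) -> "FeatureTransform":
        data = json.loads(blob)
        return cls(data["stats"], data["vocab"])

rows = [{"age": 20, "device": "ios"}, {"age": 40, "device": "android"}]
offline = FeatureTransform.fit(rows, dense=["age"], sparse=["device"])
online = FeatureTransform.from_json(offline.to_json())  # e.g. loaded by the inference service
print(online.transform({"age": 30, "device": "web"}))  # {'age': 0.0, 'device': 0}
```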

Training Pipeline

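Strategy: Use Parameter Servers (or model-parallel embedding sharding) to split the 100GB+ embedding table across nodes, and data-parallel all-reduce (e.g., Horovod) for the dense layers. Horovod on its own replicates the full model on every worker, so it cannot hold the embedding table by itself.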
Incremental Training: Instead of training from scratch daily, use "Continuous Learning" to update weights with the latest data every hour, maintaining high model freshness.
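To see how a parameter server spreads a large embedding table, here is a toy in-process version: rows are assigned to shards by ID modulo the shard count, workers pull copies of only the rows that appear in their mini-batch, and sparse gradient updates are pushed back to the owning shard. The class names, the plain-SGD update, and the lazy random initialization are illustrative assumptions.

```python
import random
from typing import Dict, Iterable, List

class EmbeddingShard:
    """One parameter-server shard: owns a disjoint subset of embedding rows."""

    def __init__(self, dim: int, seed: int):
        self.dim = dim
        self.rows: Dict[int, List[float]] = {}
        self._rng = random.Random(seed)

    def pull(self, row_id: int) -> List[float]:
        if row_id not in self.rows:  # rows are created lazily on first access
            self.rows[row_id] = [self._rng.gauss(0.0, 0.01) for _ in range(self.dim)]
        return self.rows[row_id]

    def push(self, row_id: int, grad: List[float], lr: float) -> None:
        row = self.pull(row_id)
        for i, g in enumerate(grad):
            row[i] -= lr * g  # plain SGD for brevity

class ShardedEmbedding:
    """Client view of the table: routes each row id to shard (id % num_shards)."""

    def __init__(self, num_shards: int, dim: int):
        self.shards = [EmbeddingShard(dim, seed=s) for s in range(num_shards)]

    def _shard(self, row_id: int) -> EmbeddingShard:
        return self.shards[row_id % len(self.shards)]

    def lookup(self, row_ids: Iterable[int]) -> List[List[float]]:
        # Workers pull copies of only the rows in their mini-batch.
        return [list(self._shard(r).pull(r)) for r in row_ids]

    def apply_gradients(self, grads: Dict[int, List[float]], lr: float = 0.05) -> None:
        # Sparse update: only rows that appeared in the batch are touched.
        for row_id, grad in grads.items():
            self._shard(row_id).push(row_id, grad, lr)

table = ShardedEmbedding(num_shards=4, dim=3)
before = table.lookup([5, 42])
table.apply_gradients({5: [1.0, 1.0, 1.0]}, lr=0.1)
after = table.lookup([5, 42])
print([len(s.rows) for s in table.shards])  # rows per shard: [0, 1, 1, 0]
```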

Serving Pipeline

Retrieval: Use a Two-Tower Neural Network to generate embeddings for Users and Ads. Use HNSW (Hierarchical Navigable Small World) in a vector DB (like Milvus) to find the top 500 candidates.
Ranking: The DCN model scores these 500 candidates.
Calibration: Apply Isotonic Regression on the output of the Ranker to ensure the probability reflects the true empirical click rate.
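Isotonic calibration is usually fit with the pool-adjacent-violators (PAV) algorithm: sort held-out examples by raw score, then repeatedly merge neighboring blocks whose empirical click rate decreases, so the fitted mapping is monotone. The sketch below fits that step function and applies it by looking up the block that covers a new score; the piecewise-constant lookup (rather than interpolation) is a simplifying assumption.

```python
import bisect
from typing import Dict, List, Sequence, Tuple

def fit_isotonic(scores: Sequence[float], clicks: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Pool-adjacent-violators fit of a non-decreasing map from raw score to click rate.

    Returns (thresholds, rates): thresholds[i] is the largest raw score in block i,
    and rates[i] is that block's empirical click rate.
    """
    totals: Dict[float, List[float]] = {}  # score -> [clicks, impressions]
    for s, y in zip(scores, clicks):
        t = totals.setdefault(s, [0.0, 0.0])
        t[0] += y
        t[1] += 1
    blocks: List[List[float]] = []  # each block: [clicks, impressions, max_score]
    for s in sorted(totals):
        blocks.append([totals[s][0], totals[s][1], s])
        # Pool while the previous block's rate exceeds the newest block's rate.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            c, n, max_s = blocks.pop()
            blocks[-1][0] += c
            blocks[-1][1] += n
            blocks[-1][2] = max_s
    return [b[2] for b in blocks], [b[0] / b[1] for b in blocks]

def calibrate(score: float, thresholds: List[float], rates: List[float]) -> float:
    """Piecewise-constant lookup: rate of the first block whose max score >= score."""
    i = bisect.bisect_left(thresholds, score)
    return rates[min(i, len(rates) - 1)]

thresholds, rates = fit_isotonic([0.1, 0.2, 0.3, 0.4], [0, 1, 0, 1])
print(thresholds, rates)  # [0.1, 0.3, 0.4] [0.0, 0.5, 1.0]
print(calibrate(0.25, thresholds, rates))  # 0.5
```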

Evaluation Pipeline

A/B Testing: Hash the user_id into buckets (Control vs. Treatment).
Interleaving: For faster comparison, mix the results of Model A and Model B in a single list and track which model's items get more clicks.
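Team-draft interleaving is one common way to implement this; it is a named technique swapped in here because the text does not specify a variant. In each round the model with fewer picks so far (coin flip on ties) contributes its highest-ranked ad not yet in the list, and each click is credited to the model that contributed the clicked ad. The function names and the dictionary-based credit map are illustrative assumptions.

```python
import random
from typing import Dict, List, Sequence, Tuple

def team_draft_interleave(
    ranking_a: Sequence[str], ranking_b: Sequence[str], length: int, rng: random.Random
) -> Tuple[List[str], Dict[str, str]]:
    """Merge two rankings; returns the shown list and a map ad -> team ("A" or "B")."""
    merged: List[str] = []
    team: Dict[str, str] = {}
    picks = {"A": 0, "B": 0}
    sources = {"A": list(ranking_a), "B": list(ranking_b)}
    while len(merged) < length:
        # The team with fewer picks goes first; ties are broken by a coin flip.
        if picks["A"] < picks["B"] or (picks["A"] == picks["B"] and rng.random() < 0.5):
            order = ["A", "B"]
        else:
            order = ["B", "A"]
        for t in order:  # fall back to the other team if this one is exhausted
            candidates = [ad for ad in sources[t] if ad not in team]
            if candidates:
                merged.append(candidates[0])
                team[candidates[0]] = t
                picks[t] += 1
                break
        else:
            break  # both rankings exhausted
    return merged, team

def credit_clicks(clicked: Sequence[str], team: Dict[str, str]) -> Dict[str, int]:
    """Count clicks per team; the team with more clicks wins this impression."""
    wins = {"A": 0, "B": 0}
    for ad in clicked:
        if ad in team:
            wins[team[ad]] += 1
    return wins

rng = random.Random(7)
merged, team = team_draft_interleave(["a1", "a2", "a3"], ["b1", "a1", "b2"], length=4, rng=rng)
print(merged, credit_clicks(["a2"], team))
```

Aggregated over many impressions, the fraction of wins for each model gives the comparison signal.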

Monitoring Pipeline

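Label Leakage: Alert on suspiciously high offline metrics (e.g., AUC close to 1.0) or on predicted CTR that diverges sharply from observed CTR in production; both are often caused by training features that include future click information.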
Feature Drift: Calculate the Population Stability Index (PSI) for input features to detect if the distribution of incoming traffic has changed (e.g., a new app version changing feature formats).
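PSI compares the binned distribution of a feature in a reference window (e.g., training data) with the same feature in live traffic: PSI = \sum_i (a_i - e_i) \ln(a_i / e_i), where e_i and a_i are the reference and live proportions in bin i. The sketch below uses reference deciles as bin edges and a small epsilon for empty bins; both are assumptions, and the 0.25 threshold in the example is a commonly quoted rule of thumb rather than a fixed standard.

```python
import bisect
import math
from typing import List, Sequence

def psi(expected: Sequence[float], actual: Sequence[float], bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between a reference sample and a live sample.

    Bin edges are reference quantiles; PSI = sum((a_i - e_i) * ln(a_i / e_i)).
    eps replaces zero proportions so the log stays finite.
    """
    ordered = sorted(expected)
    edges = [ordered[int(len(ordered) * k / bins)] for k in range(1, bins)]

    def proportions(sample: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for x in sample:
            counts[bisect.bisect_right(edges, x)] += 1
        return [max(c / len(sample), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

baseline = [float(i) for i in range(1000)]
print(round(psi(baseline, baseline), 4))  # 0.0
shifted = [x + 500 for x in baseline]
print(psi(baseline, shifted) > 0.25)  # True: large distribution shift
```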
Wrap Up

Advanced Topics

Offline Metrics: LogLoss (primary), AUC (ranking quality), and Calibration Error.
Online Metrics: CTR (North Star) and eCPM (Revenue).
Scalability: Ranking throughput scales approximately linearly with the number of ranking shards. Keeping all per-user state in the Feature Store leaves the Ranking service stateless, so instances can be added or replaced freely, which also keeps it highly available.
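Calibration Error is often reported as Expected Calibration Error (ECE): bucket the predictions, then take the impression-weighted average gap between mean pCTR and observed CTR per bucket. Because pCTR values are typically small, equal-width buckets over [0, 1] would put nearly everything in the first bucket, so this sketch assumes equal-count (quantile) buckets, which is one common choice among several.

```python
from typing import Sequence

def expected_calibration_error(pctr: Sequence[float], clicks: Sequence[int], buckets: int = 10) -> float:
    """Impression-weighted mean |avg pCTR - observed CTR| over equal-count buckets."""
    pairs = sorted(zip(pctr, clicks))
    n = len(pairs)
    ece = 0.0
    for b in range(buckets):
        chunk = pairs[b * n // buckets:(b + 1) * n // buckets]
        if not chunk:
            continue
        mean_p = sum(p for p, _ in chunk) / len(chunk)
        ctr = sum(y for _, y in chunk) / len(chunk)
        ece += len(chunk) / n * abs(mean_p - ctr)
    return ece

pctr = [0.1, 0.1, 0.9, 0.9]
clicks = [0, 0, 1, 1]
print(round(expected_calibration_error(pctr, clicks, buckets=2), 4))  # 0.1
```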