The Question
ML Design
Real-Time Bidding (RTB) System Design
Design a high-scale Demand Side Platform (DSP) capable of processing millions of bid requests per second from various ad exchanges. The system must predict the probability of user engagement (clicks/conversions) to calculate optimal bid prices in under 50 milliseconds, while managing advertiser budgets and optimizing for long-term Return on Ad Spend (ROAS).
XGBoost/LightGBM
Logistic Regression
Redis Feature Store
Bayesian Updating
MAB
Questions & Insights
Clarifying Questions
Business Goal: Is the primary metric Revenue (for the platform), ROAS (for advertisers), or CTR/CVR? Assumption: Maximize ROAS for advertisers subject to budget constraints.
Constraints & Scale: What is the traffic volume? Assumption: 10M QPS at peak, 100M+ active creatives (items), and a strict <50ms P99 end-to-end latency budget (including network and bidding logic).
Edge Cases: How do we handle new advertisers (cold start)? How do we handle "Delayed Feedback" (conversions that happen 7 days after a click)?
Assumptions:
We are operating in a First-Price Auction environment (common in modern RTB).
We have a distributed budget pacing requirement.
User history is available via a low-latency Key-Value store.
Thinking Process
The Bottleneck: The 50ms latency budget is the ultimate constraint. I cannot run a 100-layer Transformer on 100M items. I need a multi-stage funnel: Retrieval (Filtering) → Ranking (CTR/CVR prediction) → Bidding (Calibration/Auction theory).
The Core ML Problem: It’s not just ranking; it’s calibration. In RTB, the predicted probability p must match observed frequencies (e.g., among impressions scored p=0.1, about 10 in 100 should actually be clicked) because the bid price is directly derived from p.
The Scaling Strategy: Use a Two-Tower architecture for retrieval to narrow the 100M+ creatives down to a few hundred candidates in <5ms, followed by a Deep Cross Network (DCN) or MMoE for precise scoring.
The Feedback Loop: RTB suffers from "Selection Bias"—we only see outcomes for auctions we won. I must account for this in the training data.
Elite Bonus Points
Bid Shading: In first-price auctions, bidding your true value is suboptimal. I would implement a "Bid Shading" model (e.g., using Distributional Reinforcement Learning) to predict the minimum price needed to win.
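Delayed Feedback Modeling: Conversions often happen days later. I’ll treat clicks that have not converted yet as provisional negatives and correct the resulting label bias with importance weights (a "Negative Sampling with Correction" or "Importance Sampling" approach), so models can update before the final label is known.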
Budget Pacing via PID Controllers: To avoid exhausting budgets in the first hour of the day, I’ll integrate a PID controller that throttles the bid multiplier based on the remaining budget/time ratio.
Feature Neutralization: Handling "Position Bias" by treating the ad slot position as a feature during training but using a "default" or "null" position during serving.
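The PID pacing idea above can be sketched as follows. This is a minimal illustration, assuming a uniform spend target across the day; the class name `PacingPID`, the gain values, and the clamp range are hypothetical and would need tuning against real traffic.

```python
from dataclasses import dataclass


@dataclass
class PacingPID:
    """PID controller that nudges a bid multiplier toward an even spend pace."""
    kp: float = 0.8    # illustrative gains, not tuned values
    ki: float = 0.1
    kd: float = 0.05
    min_mult: float = 0.0
    max_mult: float = 2.0
    _integral: float = 0.0
    _prev_error: float = 0.0

    def update(self, spent: float, budget: float, elapsed_frac: float) -> float:
        """Return a bid multiplier from spend so far and the fraction of the day elapsed."""
        target_frac = elapsed_frac            # assumption: spend should track time linearly
        actual_frac = spent / budget
        error = target_frac - actual_frac     # > 0 means we are underspending
        self._integral += error
        derivative = error - self._prev_error
        self._prev_error = error
        raw = 1.0 + self.kp * error + self.ki * self._integral + self.kd * derivative
        return max(self.min_mult, min(self.max_mult, raw))


# 60% of the budget gone after 25% of the day: the multiplier drops below 1.
overspending_mult = PacingPID().update(spent=600.0, budget=1000.0, elapsed_frac=0.25)
# 10% of the budget gone after 50% of the day: the multiplier rises above 1.
underspending_mult = PacingPID().update(spent=100.0, budget=1000.0, elapsed_frac=0.5)
```

The multiplier would be applied to the computed bid; a production controller would also need integral windup protection and a traffic-shaped (non-uniform) target curve.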
Design Breakdown
Functional Reqs
Ad Matching: Match incoming bid requests (user/context) with relevant advertiser creatives.
Bidding Logic: Calculate the optimal bid price in real-time.
Budget Management: Ensure advertisers do not exceed their daily/lifetime caps.
Reporting: Provide near real-time transparency into spend and performance.
Non-Functional Reqs
Ultra-Low Latency: <50ms P99 for the entire bid response.
High Availability: 99.99% uptime; if the system is down, we lose millions in revenue.
Consistency: Budget updates must be eventually consistent across regions within seconds.
Scalability: Horizontal scaling to handle massive spikes (e.g., Black Friday).
ML Problem Framing
ML Objective: Maximize $\mathbb{E}[\text{Value}] = P(\text{click}) \times P(\text{conv} \mid \text{click}) \times \text{Value} \times \text{Bid\_Multiplier}$.
ML Category: Probabilistic Classification (pCTR, pCVR) and Regression (Bid Shading).
Input/Output/Label:
Input: User Profile, Device/Context, Creative Features, Historical Interaction.
Output: $\hat{y} \in [0, 1]$ (Probability).
Label: Click (1/0) or Conversion (1/0).
Data Prep & Features
Data Pipeline:
Joiner Service: Joins Win Notices and the click/conversion events that follow them (the "Label") with Bid Requests (the "Features").
Attribution: 30-day window for conversions.
Feature Engineering:
User: Search history, demographic embeddings, frequency-cap counters (how many times they saw Ad X today).
Context: URL/App bundle ID, IP-Geographic data, Time of Day, Ad Slot size.
Cross-Features: User-Category affinity, Creative-Bundle ID historical CTR (the "Cold Start" savior).
Feature Store: Use Redis for "Online" features (user real-time signals) and Bigtable/S3 for "Offline" features (historical aggregate CTR).
Model Architecture
Stage 1: Retrieval (Candidate Generation):
Two-Tower Neural Network: User tower and Creative tower. Compute dot-product similarity.
HNSW (Hierarchical Navigable Small World): For fast Approximate Nearest Neighbor (ANN) search.
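To make the retrieval step concrete, here is a small sketch of dot-product scoring between a user embedding and precomputed creative embeddings. It uses an exact scan as a stand-in for the ANN index; the embeddings, the ad IDs, and the function name `retrieve_top_k` are made up for illustration.

```python
import heapq


def dot(u, v):
    """Dot-product similarity between two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def retrieve_top_k(user_vec, creative_vecs, k):
    """Exact top-k by dot product; an ANN index such as HNSW would replace this scan."""
    scored = ((dot(user_vec, vec), cid) for cid, vec in creative_vecs.items())
    return [cid for _, cid in heapq.nlargest(k, scored)]


# Creative-tower outputs are computed offline and indexed; values here are invented.
creative_vecs = {
    "ad_sports": [0.9, 0.1, 0.0],
    "ad_travel": [0.1, 0.8, 0.2],
    "ad_finance": [0.0, 0.2, 0.9],
}
user_vec = [0.7, 0.3, 0.1]  # user-tower output at request time
top2 = retrieve_top_k(user_vec, creative_vecs, k=2)
```

The key property is that only the user tower runs per request; creative embeddings are static between model refreshes, which is what makes the sub-5ms budget plausible.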
Stage 2: Ranking (Precision Scoring):
Deep & Cross Network (DCN-v2): Captures explicit and implicit feature interactions efficiently.
Multi-gate Mixture-of-Experts (MMoE): If we want to optimize for both Click and Conversion simultaneously.
Stage 3: Calibration:
Isotonic Regression: Ensures the predicted probabilities align with actual empirical frequencies.
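For intuition on the calibration stage, the sketch below fits a monotone step function with the Pool Adjacent Violators algorithm, which is the standard way to solve isotonic regression. It is a simplified version that does not pool tied scores; the function names and sample data are illustrative.

```python
from bisect import bisect_left


def fit_isotonic(scores, labels):
    """Pool Adjacent Violators: returns (thresholds, values), both non-decreasing."""
    # Each block holds [sum_of_labels, count, max_score_in_block].
    blocks = []
    for score, label in sorted(zip(scores, labels)):
        blocks.append([float(label), 1, score])
        # Merge backwards while the previous block's mean exceeds the latest block's mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c, top = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
            blocks[-1][2] = top
    thresholds = [b[2] for b in blocks]
    values = [b[0] / b[1] for b in blocks]
    return thresholds, values


def calibrate(score, thresholds, values):
    """Step lookup: use the first block whose upper score bound covers `score`."""
    idx = bisect_left(thresholds, score)
    return values[min(idx, len(values) - 1)]


raw_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
clicks = [0, 0, 1, 0, 1, 1]
thresholds, values = fit_isotonic(raw_scores, clicks)
calibrated = calibrate(0.35, thresholds, values)  # falls in the pooled (0.3, 0.4) block
```

Because the mapping is monotone, it leaves the ranking order of candidates unchanged (up to ties) and only corrects the probability scale that feeds the bid.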
Training & Serving
Training: Incremental learning (online learning) using FTRL (Follow-The-Regularized-Leader) to adapt to shifting trends within minutes.
Serving: C++ or Go-based inference engine utilizing TensorRT or ONNX Runtime for hardware acceleration.
Addressing Challenges:
Exploration: Use Thompson Sampling or Upper Confidence Bound (UCB) to give new ads a chance to gather data.
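A minimal sketch of the Thompson Sampling option, assuming a Beta-Bernoulli model per ad with a uniform Beta(1, 1) prior. The class name, ad IDs, and simulated click rates are hypothetical.

```python
import random


class ThompsonSampler:
    """Beta-Bernoulli Thompson Sampling over a set of ads."""

    def __init__(self, ad_ids, seed=0):
        self.rng = random.Random(seed)
        # Beta(1, 1) prior: every ad starts with an unknown CTR.
        self.alpha = {ad: 1.0 for ad in ad_ids}
        self.beta = {ad: 1.0 for ad in ad_ids}

    def select(self):
        """Draw one CTR sample per ad from its posterior and pick the largest draw."""
        draws = {ad: self.rng.betavariate(self.alpha[ad], self.beta[ad]) for ad in self.alpha}
        return max(draws, key=draws.get)

    def update(self, ad, clicked):
        """Bayesian update: a click increments alpha, a non-click increments beta."""
        if clicked:
            self.alpha[ad] += 1.0
        else:
            self.beta[ad] += 1.0

    def posterior_mean(self, ad):
        return self.alpha[ad] / (self.alpha[ad] + self.beta[ad])


# Simulated traffic with made-up click rates, only to exercise the loop.
sampler = ThompsonSampler(["new_ad", "old_ad"])
simulated_ctr = {"new_ad": 0.10, "old_ad": 0.03}
env = random.Random(1)
for _ in range(2000):
    ad = sampler.select()
    sampler.update(ad, env.random() < simulated_ctr[ad])
```

New ads have wide posteriors, so they occasionally produce the highest draw and get served; as evidence accumulates the posterior narrows and exploration fades without a fixed exploration rate.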
System Architecture
Pipeline Deep Dive
Data Pipeline
Ingestion: We use Kafka to ingest high-volume bid requests and win notices. The challenge is the volume (terabytes/hour).
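Joining: A Flink-based stateful stream processor joins the "bid" event with the "win/click" event. We keep the bid state in Flink's RocksDB state backend for up to 30 minutes; events that arrive later than that, such as conversions inside the 30-day attribution window, cannot be matched from this state and need a separate, longer-horizon join.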
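The join logic can be illustrated with an in-memory stand-in for keyed state with a time-to-live. This is not Flink code; the class `BidOutcomeJoiner` and its method names are hypothetical, and a dictionary plays the role of the RocksDB-backed state.

```python
class BidOutcomeJoiner:
    """In-memory stand-in for keyed stream state with a time-to-live."""

    def __init__(self, ttl_seconds=30 * 60):
        self.ttl = ttl_seconds
        self.pending = {}  # bid_id -> (bid timestamp, logged features)

    def on_bid(self, bid_id, ts, features):
        """Remember the features logged at bid time."""
        self.pending[bid_id] = (ts, features)

    def on_outcome(self, bid_id, ts, event_type):
        """Return a joined record, or None if the bid is unknown or expired."""
        entry = self.pending.get(bid_id)
        if entry is None:
            return None
        bid_ts, features = entry
        if ts - bid_ts > self.ttl:
            del self.pending[bid_id]  # state aged out; late events are dropped
            return None
        return {"bid_id": bid_id, "features": features, "event": event_type}


joiner = BidOutcomeJoiner()
joiner.on_bid("b1", ts=0, features={"slot": "300x250"})
joined = joiner.on_outcome("b1", ts=120, event_type="click")   # within 30 minutes
late = joiner.on_outcome("b1", ts=3600, event_type="click")    # after the TTL
```

The TTL bounds state size at the cost of dropping late outcomes, which is the trade-off behind the 30-minute retention choice.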
Feature Pipeline
Consistency: To prevent training-serving skew, we log the features exactly as they were seen at inference time (Request Logging) instead of re-generating them from databases later.
Freshness: Real-time counters (e.g., "how many clicks for this creative in the last 5 minutes") are updated via Flink and pushed to Redis.
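A sliding-window counter like the "clicks in the last 5 minutes" feature might look like the sketch below. It assumes timestamps arrive in order and keeps raw events in memory; a real deployment would more likely use bucketed counts in the stream processor. The class name is illustrative.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events whose timestamp falls inside the trailing window."""

    def __init__(self, window_seconds=300):
        self.window = window_seconds
        self.events = deque()  # timestamps, assumed to arrive in increasing order

    def add(self, ts):
        self.events.append(ts)

    def count(self, now):
        """Evict events older than the window, then return how many remain."""
        while self.events and self.events[0] <= now - self.window:
            self.events.popleft()
        return len(self.events)


clicks_5m = SlidingWindowCounter(window_seconds=300)
for ts in (0, 100, 290):
    clicks_5m.add(ts)
count_at_300 = clicks_5m.count(now=300)  # the event at t=0 has aged out
count_at_400 = clicks_5m.count(now=400)  # only the event at t=290 remains
```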
Training Pipeline
Warm-start: When launching a new model, we initialize weights from the previous version to reduce convergence time.
Time-based Splits: We never use random splits for AdTech. We train on days N to N+6 and validate on day N+7 to simulate the real-world temporal shift.
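The split described above is simple to express in code. A minimal sketch, assuming each training row carries a `day` field (a hypothetical schema):

```python
from datetime import date, timedelta


def temporal_split(rows, start_day, train_days=7):
    """Train on [start_day, start_day + train_days); validate on the day right after."""
    valid_day = start_day + timedelta(days=train_days)
    train = [r for r in rows if start_day <= r["day"] < valid_day]
    valid = [r for r in rows if r["day"] == valid_day]
    return train, valid


# Ten days of placeholder rows, one per day.
day_n = date(2024, 1, 1)
rows = [{"day": day_n + timedelta(days=i), "label": i % 2} for i in range(10)]
train_rows, valid_rows = temporal_split(rows, start_day=day_n)
```

Every training row is strictly older than every validation row, so the offline score reflects how the model handles data from the future relative to training.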
Serving Pipeline
Hierarchical Retrieval:
Hard Filters: Targeting (Geo, OS, Age).
ANN: Vector similarity to find 500 candidates.
Ranking: The ranking model scores the 500 candidates.
Bid Calculation: $\text{Bid} = \text{pCTR} \times \text{pCVR} \times \text{TargetCPA} \times \text{ShadingFactor}$.
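One way to pick the shading factor in the bid formula is to maximize expected surplus, (value - bid) × P(win | bid), over candidate factors. The sketch below assumes a logistic win-rate curve with made-up parameters; in practice that curve would be fitted to auction logs, and this grid search is only one simple option rather than the method the design prescribes. Values are per impression, whereas exchanges typically quote prices in CPM.

```python
import math


def win_prob(bid, median_price, scale):
    """Hypothetical logistic win-rate curve; a real system would fit this to auction logs."""
    return 1.0 / (1.0 + math.exp(-(bid - median_price) / scale))


def shaded_bid(p_ctr, p_cvr, target_cpa, median_price, scale, steps=100):
    """Grid-search the shading factor in (0, 1] that maximizes expected surplus."""
    value = p_ctr * p_cvr * target_cpa  # expected value of one impression
    best_factor, best_surplus = 0.0, 0.0
    for i in range(1, steps + 1):
        factor = i / steps
        bid = value * factor
        surplus = (value - bid) * win_prob(bid, median_price, scale)
        if surplus > best_surplus:
            best_factor, best_surplus = factor, surplus
    return value * best_factor, best_factor


# Illustrative numbers: 2% pCTR, 5% pCVR, $20 target CPA gives about $0.02 per impression.
bid, factor = shaded_bid(0.02, 0.05, 20.0, median_price=0.01, scale=0.003)
```

Bidding the full value (factor 1.0) yields zero surplus in a first-price auction, so the search always settles on a factor below 1.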
Evaluation Pipeline
Online Experimentation: Use "Interleaved testing" for faster ranking comparisons.
Offline metric: We focus on AUC-ROC for ranking quality and Log-loss/Calibration error for bidding accuracy.
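The reason both metric families are needed shows up in a toy example: rescaling every prediction leaves AUC unchanged but degrades calibration and log-loss. The helper names and data below are illustrative.

```python
import math


def log_loss(labels, preds, eps=1e-12):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, preds):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def auc_roc(labels, preds):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [p for y, p in zip(labels, preds) if y == 1]
    neg = [p for y, p in zip(labels, preds) if y == 0]
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0 for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))


def calibration_ratio(labels, preds):
    """Sum of predictions over sum of labels; 1.0 means calibrated on average."""
    return sum(preds) / sum(labels)


labels = [0, 0, 1, 1]
preds = [0.1, 0.4, 0.35, 0.8]
halved = [p / 2 for p in preds]  # same ordering, systematically under-predicted

auc_original = auc_roc(labels, preds)
auc_halved = auc_roc(labels, halved)
ratio_original = calibration_ratio(labels, preds)
ratio_halved = calibration_ratio(labels, halved)
```

A model with the halved predictions would rank ads identically but bid roughly half as much as it should, which AUC alone would never reveal.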
Monitoring Pipeline
Health: Monitoring "Bid Rate"—if it drops to zero, the ranker or pacing logic is likely broken.
Drift: We monitor the distribution of predicted CTRs. If the average pCTR shifts significantly without a change in actual CTR, we trigger a model rollback.
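One common way to quantify a shift in the pCTR distribution is the Population Stability Index (PSI), used here as an assumed choice since the design does not name a specific drift statistic. The bin edges and sample values below are invented, and any alert threshold would have to be chosen empirically.

```python
import math


def psi(baseline, current, bin_edges, eps=1e-6):
    """Population Stability Index between two samples over fixed bins."""

    def fractions(values):
        counts = [0] * (len(bin_edges) + 1)
        for v in values:
            idx = sum(1 for edge in bin_edges if v >= edge)  # bin index for v
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    expected, actual = fractions(baseline), fractions(current)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


edges = [0.02, 0.04]                      # hypothetical pCTR bin boundaries
baseline_pctr = [0.01, 0.02, 0.03, 0.04]  # predictions from a reference window
shifted_pctr = [0.03, 0.04, 0.05, 0.06]   # predictions from the current window

psi_same = psi(baseline_pctr, baseline_pctr, edges)
psi_shifted = psi(baseline_pctr, shifted_pctr, edges)
```

PSI is zero when the two windows fall into the bins in the same proportions and grows as they diverge; pairing it with the observed CTR tells apart model drift from a genuine change in traffic.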
Wrap Up
Advanced Topics
North Star Metric: Advertiser ROAS (Return on Ad Spend) and DSP Take Rate.
Failure Modes:
Feedback Loop: We only get outcome data for auctions we won. Mitigation: Epsilon-greedy exploration (5% traffic).
Training-Serving Skew: Features like "User CTR" might be calculated differently in Python (Offline) vs. C++ (Online). Mitigation: Shared feature-transformation library.
Scalability Audit: To handle 10x traffic, we move ANN search to specialized hardware (FPGA/ASIC) and use a highly partitioned Redis cluster for feature lookups.