The Question
ML DesignReal-Time Bidding (RTB) System Design
Design a high-scale Real-Time Bidding (RTB) system for a Demand-Side Platform (DSP). The system must process over 1 million bid requests per second from global ad exchanges with a strict P99 latency of 50ms. Detail the architecture for predicting Click-Through Rate (CTR) and Conversion Rate (CVR), incorporating real-time budget pacing, handling delayed conversion feedback, and ensuring model calibration for accurate financial bidding. Explain your choices for data ingestion, low-latency feature retrieval, and the trade-offs between model complexity and inference speed in a production environment.
Factorization Machines
Field-aware Factorization Machines
Logistic Regression
Kafka
Flink
Redis
Spark
Platt Scaling
Isotonic Regression
PID Controller
Protobuf
Feast
Questions & Insights
Clarifying Questions
Business Goal: Is the primary objective to maximize Advertiser ROI (Conversions), maximize Spend (Budget Utilization), or maximize CTR?
Assumption: The goal is to maximize total conversions while adhering to advertiser-specified budgets and CPA (Cost Per Action) targets.
Constraints & Scale: What is the scale of the system in terms of QPS and latency?
Assumption: The system must handle 1M+ QPS with a strict P99 latency budget of 50ms for the entire bid response (meaning < 10ms for ML inference).
Item Corpus & Users: How many active ads (line items) and users are we dealing with?
Assumption: 500k active ad creatives and 500M monthly active users (MAU).
Data Freshness: How quickly do we need to react to a user’s latest click or an advertiser’s budget exhaustion?
Assumption: Budget updates must be near real-time (< 1s); user profile updates can be near real-time (< 30s).
Budget Pacing: Do we need to pace the budget evenly across the day?
Assumption: Yes, smooth pacing is required to avoid "exhausting budget in the first hour" (Gold Rush effect).
Thinking Process
Identify the Bottleneck: In RTB, the bottleneck is the Latency vs. Accuracy trade-off. We have ~10ms to score potentially thousands of candidate ads.
Multi-Stage Filtering: I cannot run a deep neural network on 500k ads. I need a multi-stage funnel:
Stage 1 (Hard Filtering): Geo, targeting, frequency capping (rules).
Stage 2 (Lightweight Scoring): Fast model (LR or FM) to narrow down to top N.
Stage 3 (Final Bidding): Calibrated P(click) and P(conv) calculation.
The "Bid" Formula: The bid isn't just a probability; it's a financial calculation: Bid = P(Click) \times P(Conv|Click) \times Value \times PacingFactor.
Infrastructure Choice: Given the scale, I need a Feature Store with low-latency lookups (Redis/Aerospike) and a streaming budget management system (Flink/Kafka).
Elite Bonus Points
Calibration for Bidding: ML models often output uncalibrated probabilities. In bidding, if a model says 0.1 probability but the true rate is 0.05, the advertiser will overpay. I would implement Platt Scaling or Isotonic Regression to ensure E[y] = E[\hat{y}].
Handling Delayed Feedback: Conversions often occur hours or days after a click. Training on recent data without correcting for this leads to a "pessimistic" model. I'd use Importance Sampling or FSI (Factual Soft-Interval) to weight recent samples.
Negative Downsampling with Calibration: Since CTR is < 1%, we downsample negatives to save compute. I would apply the inverse sampling correction to the model bias term to ensure the predicted probability remains in the original space.
Feedback Loop & Pid Controller: Use a PID controller for budget pacing to adjust the "bid multiplier" dynamically based on the current spend rate vs. target spend rate.
Design Breakdown
Requirements
Product Goal: Maximize conversions for advertisers within budget.
Success Metrics:
Online: eCPA (Effective Cost Per Action), Spend Rate, Win Rate, ROAS.
Offline: AUC-ROC (for CTR/CVR), Log Loss, Calibration Error (ECE).
Guardrail: P99 Latency < 50ms, Error Rate < 0.1%.
System Constraints: 1M QPS, 500M users, global distribution (multi-region), sub-10ms inference.
Data Availability: Real-time Bid Requests (OpenRTB protocol), Win Notices, Click logs, Conversion logs (postback).
ML Problem Framing
ML Task Type: Binary Classification for CTR (P(click)) and CVR (P(conv)).
Prediction Target: P(Click | User, Ad, Context) and P(Conversion | Click).
Inputs:
User: Historical categories clicked, frequency of views, device, OS.
Ad (Item): Category, Advertiser ID, Historical CTR, Creative size.
Context: Publisher ID (Site/App), Time of Day, IP-derived Location.
Outputs: A calibrated probability score [0, 1].
ML Challenges: High cardinality features (User ID, Publisher ID), extreme class imbalance, delayed labels, and non-stationary data (trends change hourly).
Design Summary & MVP
Concise Summary: A two-stage ranking system using a Feature Store for real-time lookups. First, a boolean filter narrows ads by targeting; second, a Logistic Regression or Factorization Machine predicts P(click) and P(conv) to calculate the final bid.
Model Architecture:
Baseline: Logistic Regression (LR) with Hashing Trick. Extremely fast, handles high-cardinality sparse IDs well.
Target Model: Field-aware Factorization Machines (FFM) or a small Wide & Deep model. FFMs capture feature interactions (e.g., User Category \times Publisher) without the latency of deep layers.
Simplicity Audit: I am avoiding Transformers or heavy Graph Neural Networks. In RTB, every millisecond of latency is lost revenue. LR/FFM on quantized features is the industry standard for MVP.
System Architecture
Pipeline Deep Dive
Data Pipeline
Data Source:
Bid Requests: Inbound JSON via HTTP from exchanges.
Feedback: Win notices (UDP or Pixel), Click/Conv redirects.
Data Ingestion: Kafka for high-throughput buffering. Use Protobuf for schema efficiency.
Data Storage: S3/Iceberg for the Data Lake (cost-effective storage). Parquet format for columnar reads.
Data Processing: Flink for real-time aggregation (e.g., "how many times has this user seen this ad in the last hour?").
Feature Pipeline
User Features: Pre-computed user embeddings (32-dim) stored in Redis. Real-time counters for frequency capping.
Item Features: Ad category, historical CTR (smoothed with Bayesian Prior), Advertiser ID.
Context Features:
site_id, hour_of_day, geo_hash.Feature Engineering:
Feature Hashing: To handle massive ID spaces (User IDs) without a huge dictionary.
Log-transforms: For counters to handle long-tail distributions.
Training/Serving Skew: Use a unified Feature Store (e.g., Feast) to ensure the same transformation logic is used in Spark (training) and the Go/Java scoring service (serving).
Model Architecture
Problem Formulation: Multi-task learning. Task 1: P(Click). Task 2: P(Conv|Click).
Model Choice: Factorization Machines (FM).
Why? It models the interaction between variables (User Category, Site) even if they haven't appeared together in training. It’s significantly faster than Deep Learning for inference (< 2ms).
Inference Optimization: Use Fixed-point arithmetic (Quantization) for model weights to fit in L3 cache.
Training Pipeline
Dataset Construction:
Positive Labels: Win + Click + Conv.
Negative Labels: Wins with no click.
Sampling: Ad impressions are 100:1 negative to positive. Downsample negatives by 10x and adjust the model intercept: w_{new} = w_{old} + \log(sampling\_rate).
Retraining: Daily batch training for the main weights; hourly incremental updates for the "bias" terms (to catch sudden spikes in traffic).
Serving Pipeline
Pattern: Request-Response with local caching.
Local Model Cache: The Bidding Engine loads the latest FFM weights every 5 minutes from the Registry into local memory to avoid network calls during inference.
Fallback: If the ML service times out (> 10ms), use a simple "Historical Average CTR" heuristic.
Evaluation Pipeline
Offline: AUC is the primary metric for ranking. For bidding, Calibration Plot and Brier Score are critical.
Online: A/B Testing using "Shadow Bidding" or "Bucketized Traffic". We measure the lift in CPA and total spend.
Monitoring Pipeline
System: Track
inference_latency_ms and feature_fetch_latency.Model: Monitor Prediction Drift. If the average P(click) drops by 20% suddenly, it usually indicates a data pipeline failure or a broken crawler on the exchange side.
Business: Win rate monitoring. A sudden drop in win rate usually means bids are too low (model is under-predicting).
Wrap Up
Final Evaluation
Feedback Loop: Win/Loss notifications are fed back to the pacing controller (PID) to adjust the
bid_multiplier.Edge Cases:
Cold Start: Use Thompson Sampling. Allocate 5% of traffic to "Explore" new ads by boosting their predicted CTR.
Budget Exhaustion: Use a high-priority Bloom Filter to quickly reject ads whose budgets are empty.
Trade-offs: Complexity (Deep Learning) vs. Latency (FFM). At 1M QPS, the infra cost of Deep Learning outweighs the 1% AUC gain. Stick to FFM.