DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.
DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.
The Question
ML Design

Real-Time Bidding (RTB) System Design

Design a high-scale Real-Time Bidding (RTB) system for a Demand-Side Platform (DSP). The system must process over 1 million bid requests per second from global ad exchanges with a strict P99 latency of 50ms. Detail the architecture for predicting Click-Through Rate (CTR) and Conversion Rate (CVR), incorporating real-time budget pacing, handling delayed conversion feedback, and ensuring model calibration for accurate financial bidding. Explain your choices for data ingestion, low-latency feature retrieval, and the trade-offs between model complexity and inference speed in a production environment.
Factorization Machines
Field-aware Factorization Machines
Logistic Regression
Kafka
Flink
Redis
Spark
Platt Scaling
Isotonic Regression
PID Controller
Protobuf
Feast
Questions & Insights

Clarifying Questions

Business Goal: Is the primary objective to maximize Advertiser ROI (Conversions), maximize Spend (Budget Utilization), or maximize CTR?
Assumption: The goal is to maximize total conversions while adhering to advertiser-specified budgets and CPA (Cost Per Action) targets.
Constraints & Scale: What is the scale of the system in terms of QPS and latency?
Assumption: The system must handle 1M+ QPS with a strict P99 latency budget of 50ms for the entire bid response (meaning < 10ms for ML inference).
Item Corpus & Users: How many active ads (line items) and users are we dealing with?
Assumption: 500k active ad creatives and 500M monthly active users (MAU).
Data Freshness: How quickly do we need to react to a user’s latest click or an advertiser’s budget exhaustion?
Assumption: Budget updates must be near real-time (< 1s); user profile updates can be near real-time (< 30s).
Budget Pacing: Do we need to pace the budget evenly across the day?
Assumption: Yes, smooth pacing is required to avoid "exhausting budget in the first hour" (Gold Rush effect).

Thinking Process

Identify the Bottleneck: In RTB, the bottleneck is the Latency vs. Accuracy trade-off. We have ~10ms to score potentially thousands of candidate ads.
Multi-Stage Filtering: I cannot run a deep neural network on 500k ads. I need a multi-stage funnel:
Stage 1 (Hard Filtering): Geo, targeting, frequency capping (rules).
Stage 2 (Lightweight Scoring): Fast model (LR or FM) to narrow down to top N.
Stage 3 (Final Bidding): Calibrated P(click) and P(conv) calculation.
The "Bid" Formula: The bid isn't just a probability; it's a financial calculation: Bid = P(Click) \times P(Conv|Click) \times Value \times PacingFactor.
Infrastructure Choice: Given the scale, I need a Feature Store with low-latency lookups (Redis/Aerospike) and a streaming budget management system (Flink/Kafka).

Elite Bonus Points

Calibration for Bidding: ML models often output uncalibrated probabilities. In bidding, if a model says 0.1 probability but the true rate is 0.05, the advertiser will overpay. I would implement Platt Scaling or Isotonic Regression to ensure E[y] = E[\hat{y}].
Handling Delayed Feedback: Conversions often occur hours or days after a click. Training on recent data without correcting for this leads to a "pessimistic" model. I'd use Importance Sampling or FSI (Factual Soft-Interval) to weight recent samples.
Negative Downsampling with Calibration: Since CTR is < 1%, we downsample negatives to save compute. I would apply the inverse sampling correction to the model bias term to ensure the predicted probability remains in the original space.
Feedback Loop & Pid Controller: Use a PID controller for budget pacing to adjust the "bid multiplier" dynamically based on the current spend rate vs. target spend rate.
Design Breakdown

Requirements

Product Goal: Maximize conversions for advertisers within budget.
Success Metrics:
Online: eCPA (Effective Cost Per Action), Spend Rate, Win Rate, ROAS.
Offline: AUC-ROC (for CTR/CVR), Log Loss, Calibration Error (ECE).
Guardrail: P99 Latency < 50ms, Error Rate < 0.1%.
System Constraints: 1M QPS, 500M users, global distribution (multi-region), sub-10ms inference.
Data Availability: Real-time Bid Requests (OpenRTB protocol), Win Notices, Click logs, Conversion logs (postback).

ML Problem Framing

ML Task Type: Binary Classification for CTR (P(click)) and CVR (P(conv)).
Prediction Target: P(Click | User, Ad, Context) and P(Conversion | Click).
Inputs:
User: Historical categories clicked, frequency of views, device, OS.
Ad (Item): Category, Advertiser ID, Historical CTR, Creative size.
Context: Publisher ID (Site/App), Time of Day, IP-derived Location.
Outputs: A calibrated probability score [0, 1].
ML Challenges: High cardinality features (User ID, Publisher ID), extreme class imbalance, delayed labels, and non-stationary data (trends change hourly).

Design Summary & MVP

Concise Summary: A two-stage ranking system using a Feature Store for real-time lookups. First, a boolean filter narrows ads by targeting; second, a Logistic Regression or Factorization Machine predicts P(click) and P(conv) to calculate the final bid.
Model Architecture:
Baseline: Logistic Regression (LR) with Hashing Trick. Extremely fast, handles high-cardinality sparse IDs well.
Target Model: Field-aware Factorization Machines (FFM) or a small Wide & Deep model. FFMs capture feature interactions (e.g., User Category \times Publisher) without the latency of deep layers.
Simplicity Audit: I am avoiding Transformers or heavy Graph Neural Networks. In RTB, every millisecond of latency is lost revenue. LR/FFM on quantized features is the industry standard for MVP.

System Architecture

Pipeline Deep Dive

Data Pipeline

Data Source:
Bid Requests: Inbound JSON via HTTP from exchanges.
Feedback: Win notices (UDP or Pixel), Click/Conv redirects.
Data Ingestion: Kafka for high-throughput buffering. Use Protobuf for schema efficiency.
Data Storage: S3/Iceberg for the Data Lake (cost-effective storage). Parquet format for columnar reads.
Data Processing: Flink for real-time aggregation (e.g., "how many times has this user seen this ad in the last hour?").

Feature Pipeline

User Features: Pre-computed user embeddings (32-dim) stored in Redis. Real-time counters for frequency capping.
Item Features: Ad category, historical CTR (smoothed with Bayesian Prior), Advertiser ID.
Context Features: site_id, hour_of_day, geo_hash.
Feature Engineering:
Feature Hashing: To handle massive ID spaces (User IDs) without a huge dictionary.
Log-transforms: For counters to handle long-tail distributions.
Training/Serving Skew: Use a unified Feature Store (e.g., Feast) to ensure the same transformation logic is used in Spark (training) and the Go/Java scoring service (serving).

Model Architecture

Problem Formulation: Multi-task learning. Task 1: P(Click). Task 2: P(Conv|Click).
Model Choice: Factorization Machines (FM).
Why? It models the interaction between variables (User Category, Site) even if they haven't appeared together in training. It’s significantly faster than Deep Learning for inference (< 2ms).
Inference Optimization: Use Fixed-point arithmetic (Quantization) for model weights to fit in L3 cache.

Training Pipeline

Dataset Construction:
Positive Labels: Win + Click + Conv.
Negative Labels: Wins with no click.
Sampling: Ad impressions are 100:1 negative to positive. Downsample negatives by 10x and adjust the model intercept: w_{new} = w_{old} + \log(sampling\_rate).
Retraining: Daily batch training for the main weights; hourly incremental updates for the "bias" terms (to catch sudden spikes in traffic).

Serving Pipeline

Pattern: Request-Response with local caching.
Local Model Cache: The Bidding Engine loads the latest FFM weights every 5 minutes from the Registry into local memory to avoid network calls during inference.
Fallback: If the ML service times out (> 10ms), use a simple "Historical Average CTR" heuristic.

Evaluation Pipeline

Offline: AUC is the primary metric for ranking. For bidding, Calibration Plot and Brier Score are critical.
Online: A/B Testing using "Shadow Bidding" or "Bucketized Traffic". We measure the lift in CPA and total spend.

Monitoring Pipeline

System: Track inference_latency_ms and feature_fetch_latency.
Model: Monitor Prediction Drift. If the average P(click) drops by 20% suddenly, it usually indicates a data pipeline failure or a broken crawler on the exchange side.
Business: Win rate monitoring. A sudden drop in win rate usually means bids are too low (model is under-predicting).
Wrap Up

Final Evaluation

Feedback Loop: Win/Loss notifications are fed back to the pacing controller (PID) to adjust the bid_multiplier.
Edge Cases:
Cold Start: Use Thompson Sampling. Allocate 5% of traffic to "Explore" new ads by boosting their predicted CTR.
Budget Exhaustion: Use a high-priority Bloom Filter to quickly reject ads whose budgets are empty.
Trade-offs: Complexity (Deep Learning) vs. Latency (FFM). At 1M QPS, the infra cost of Deep Learning outweighs the 1% AUC gain. Stick to FFM.