The Question
ML Design

Real-time Financial Forecasting System

Design a large-scale machine learning system to provide real-time, high-frequency price movement signals for a global trading platform. The system should handle millions of active users, ingest diverse data sources like market ticks and news sentiment, and maintain extreme low latency while addressing the unique challenges of financial time-series data such as non-stationarity and backtesting integrity.
LSTM
Time-Series Transformer
XGBoost
ARIMA
Reinforcement Learning
Questions & Insights

Clarifying Questions

Business Goal: Is the goal to provide real-time signals for retail traders (high QPS) or to drive an automated high-frequency execution engine (ultra-low latency)? Assumption: A real-time signal generation platform for a high-scale retail brokerage.
Constraints & Scale:
Users: 10M DAU.
Assets: 100,000 symbols (Stocks, ETFs, Crypto).
Latency: Signal generation (Inference) < 20ms P99.
Throughput: Support 500k events/sec (market ticks).
Edge Cases: How to handle "Flash Crashes," halted symbols, and IPOs (cold start)? How do we deal with non-stationarity (market regimes changing)?
Assumptions: I assume we are predicting the Log-Return over a 5-minute horizon (regression) rather than the raw price, as returns are more stationary. I also assume we have access to Level-2 Order Book data and a news sentiment stream.

Thinking Process

Identify the Core Challenge: Financial data has a very low signal-to-noise ratio and is non-stationary. The system must prioritize feature freshness and robust backtesting over complex model depth.
Retrieval vs. Ranking: Unlike RecSys, this is a Time-Series Forecasting problem. The "retrieval" phase is essentially selecting which tickers a user is watching, and "ranking" is replaced by signal confidence.
Scaling Strategy: Use a Lambda architecture or a Kappa architecture for features. We need a high-performance Stream Processing engine (Flink) to calculate technical indicators (RSI, Moving Averages) in real-time.
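Validation: Standard shuffled K-fold cross-validation is invalid here because it leaks future information into the training folds. We must use walk-forward (sliding-window) time-series validation, where every validation period lies strictly after its training period.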

Elite Bonus Points

Handling Non-Stationarity: Implement Online Learning or frequent "Champion-Challenger" model updates to adapt to market regime shifts (e.g., Bull to Bear).
Fractional Differencing: Difference the price series with a non-integer order d (typically 0 < d < 1) to make features stationary while preserving more "memory" of the series than simple integer differencing does.
Adversarial Training: Use a GAN or targeted noise injection to make the model robust against market manipulation or sudden volatility spikes.
Feature Neutralization: Regress out the linear exposure of features (or predictions) to known risk factors (Market Cap, Sector) and keep the residual, so the model finds "Alpha" rather than just betting on the Beta of the market.
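Two of the ideas above reduce to a few lines of arithmetic. The sketch below is a minimal pure-Python illustration, not a production implementation. The functions frac_diff_weights and frac_diff apply a fixed-width fractional difference of order d, and neutralize removes a feature's linear exposure to a single risk factor with ordinary least squares. All function names, the window length, and the single-factor simplification are assumptions made for illustration.

```python
def frac_diff_weights(d: float, size: int) -> list[float]:
    """First `size` weights of (1 - B)^d, where B is the lag operator."""
    weights = [1.0]
    for k in range(1, size):
        # Recurrence: w_k = -w_{k-1} * (d - k + 1) / k
        weights.append(-weights[-1] * (d - k + 1) / k)
    return weights


def frac_diff(series: list[float], d: float, window: int) -> list[float]:
    """Fixed-width fractional difference; the first window-1 points are dropped."""
    w = frac_diff_weights(d, window)
    out = []
    for t in range(window - 1, len(series)):
        # w[0] multiplies the newest observation, w[k] the one k steps back.
        out.append(sum(w[k] * series[t - k] for k in range(window)))
    return out


def neutralize(feature: list[float], factor: list[float]) -> list[float]:
    """Residual of `feature` after an OLS regression on one risk factor."""
    n = len(feature)
    mean_f = sum(feature) / n
    mean_x = sum(factor) / n
    var_x = sum((x - mean_x) ** 2 for x in factor)
    if var_x == 0:
        # A constant factor carries no exposure; only demean the feature.
        return [f - mean_f for f in feature]
    beta = sum((x - mean_x) * (f - mean_f)
               for x, f in zip(factor, feature)) / var_x
    return [(f - mean_f) - beta * (x - mean_x)
            for f, x in zip(feature, factor)]
```

With d = 1 the weights collapse to [1, -1, 0, ...], which is ordinary first differencing. Values of d between 0 and 1 decay slowly and keep a long tail of past observations. In practice the input to frac_diff would usually be log prices, and neutralization would be run against several factors at once with a matrix solver.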
Design Breakdown

Functional Reqs

Real-time Prediction: Users receive a price movement signal (Up/Down/Neutral) and a confidence score for any symbol in their watchlist.
Alerting: Trigger notifications when the predicted volatility or price movement exceeds a threshold.
Explainability: Provide a brief "Why" (e.g., "High Order Book Imbalance" or "Positive News Sentiment").

Non-Functional Reqs

Latency: End-to-end (Market Event → Prediction) must be < 50ms.
Availability: 99.99% (Financial markets require high uptime during trading hours).
Freshness: Features must be at most 100ms stale, i.e., they must incorporate market data up to 100ms before the prediction is made.
Consistency: Feature computation must be identical offline (training) and online (serving) to avoid training-serving skew, and training features must be point-in-time correct to avoid "Look-ahead bias."

ML Problem Framing

ML Objective: Predict the k-step forward log-return: r_{t+k} = \ln(P_{t+k} / P_t).
ML Category: Regression (for return) or Multi-class Classification (Up, Down, Flat).
Input: Historical price ticks, Order Book depth, News sentiment, Macro-economic indicators.
Output: \hat{y} \in \mathbb{R} (Predicted return) and \sigma (Confidence/Volatility).
Label: The realized return 5 minutes after the prediction, adjusted for transaction costs (spread).
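The objective and label above can be made concrete with a small sketch. It assumes prices are sampled on a regular grid so that k steps correspond to the 5-minute horizon, and it treats the transaction cost as a dead band in log-return units (for example, the relative bid-ask spread). That cost treatment and the function names are illustrative assumptions, not a prescribed method.

```python
import math


def forward_log_return(prices: list[float], t: int, k: int) -> float:
    """k-step forward log-return r_{t+k} = ln(P_{t+k} / P_t)."""
    return math.log(prices[t + k] / prices[t])


def direction_label(log_return: float, cost: float) -> str:
    """Three-class label; moves smaller than the trading cost count as Flat."""
    if log_return > cost:
        return "Up"
    if log_return < -cost:
        return "Down"
    return "Flat"


# Usage: a move from 100 to 102 with a 0.1% cost band is labeled "Up".
r = forward_log_return([100.0, 101.0, 102.0], t=0, k=2)
label = direction_label(r, cost=0.001)
```

The same return value serves the regression framing directly, while the dead band keeps the classifier from learning moves that are too small to trade profitably.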

Data Prep & Features

Data Pipeline:
Market Data: UDP/Multicast feeds or WebSockets (Protobuf/Binary).
Alternative Data: Scraped news, Social media (Twitter/X), SEC filings.
Feature Engineering:
Technical: RSI, MACD, Bollinger Bands, VWAP (Volume Weighted Average Price).
Microstructure: Bid-Ask Spread, Order Book Imbalance (asymmetry between resting buy and sell volume).
Sequence Features: 1-minute, 5-minute, and 1-hour price windows.
Embeddings: Symbol embeddings (learned via Word2Vec style skip-grams based on co-movement).
Feature Store: Use an online-offline synchronized store (e.g., Tecton or Feast) to ensure the feature moving_avg_5m is calculated the same way in training and production.
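The microstructure features listed above are simple ratios over an order book snapshot. The following is a minimal sketch, assuming a Level-2 snapshot is available as lists of sizes ordered from the best price level outward. The function names and the default depth of 5 levels are hypothetical choices for illustration.

```python
def order_book_imbalance(bid_sizes: list[float], ask_sizes: list[float],
                         depth: int = 5) -> float:
    """(bid volume - ask volume) / (bid volume + ask volume) over the top levels.

    Ranges from -1 (only sell interest) to +1 (only buy interest).
    """
    bid = sum(bid_sizes[:depth])
    ask = sum(ask_sizes[:depth])
    total = bid + ask
    if total == 0:
        return 0.0  # empty book (e.g., halted symbol): report no imbalance
    return (bid - ask) / total


def relative_spread(best_bid: float, best_ask: float) -> float:
    """Bid-ask spread as a fraction of the mid price."""
    mid = (best_bid + best_ask) / 2
    return (best_ask - best_bid) / mid
```

Whatever definition is chosen, the point of the feature store is that this exact function (or its compiled equivalent) runs in both the streaming job and the offline backfill.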

Model Architecture

Baseline: Gradient Boosted Decision Trees (XGBoost/LightGBM) are excellent for tabular financial data due to their ability to handle non-linearities and missing values.
Advanced Model: Temporal Fusion Transformer (TFT).
Why?: It handles multi-horizon forecasting, uses Gated Residual Networks to skip noisy inputs, and includes attention mechanisms to focus on specific historical events (e.g., previous day's close).
Loss Function: Huber Loss (robust to outlier price spikes) or Sharpe-Ratio Maximization (directly optimizing for risk-adjusted return).
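To show why Huber Loss is robust to price spikes, here is a minimal sketch of the standard definition: quadratic for small errors and linear beyond a threshold delta. The delta value of 1.0 is an arbitrary default; in practice it would be tuned to the scale of the return labels.

```python
def huber_loss(y_true: float, y_pred: float, delta: float = 1.0) -> float:
    """Quadratic for |error| <= delta, linear beyond it."""
    err = abs(y_true - y_pred)
    if err <= delta:
        return 0.5 * err * err
    return delta * (err - 0.5 * delta)


def mean_huber_loss(y_true: list[float], y_pred: list[float],
                    delta: float = 1.0) -> float:
    """Average Huber loss over a batch."""
    losses = [huber_loss(t, p, delta) for t, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)
```

An error of 3.0 costs 2.5 under Huber (with delta = 1.0) versus 4.5 under half squared error, so a single flash-crash tick pulls the gradient far less.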

Training & Serving

Optimization: Use a rolling window. Train on months 1-6, validate on month 7, test on month 8. Shift window by 1 month and repeat.
Serving: Export model to ONNX or TensorRT for low-latency C++ inference.
Position Bias/Feedback Loop: In trading, our own predictions might influence our trades, which influences the market. We must model the Market Impact to avoid "over-fitting to our own influence."
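The rolling-window scheme described above (train on months 1-6, validate on month 7, test on month 8, then shift) can be sketched as a small generator. The function name and parameters are hypothetical; months are numbered from 1 to match the text.

```python
from typing import Iterator


def rolling_splits(n_months: int, train: int = 6, val: int = 1,
                   test: int = 1, step: int = 1
                   ) -> Iterator[tuple[list[int], list[int], list[int]]]:
    """Yield (train, validation, test) month numbers for each window."""
    start = 1
    # Stop once the test block would run past the last available month.
    while start + train + val + test - 1 <= n_months:
        val_start = start + train
        test_start = val_start + val
        yield (list(range(start, val_start)),
               list(range(val_start, test_start)),
               list(range(test_start, test_start + test)))
        start += step


# Usage: with 9 months of data there are two windows.
for train_m, val_m, test_m in rolling_splits(9):
    print(train_m, val_m, test_m)
```

Every validation and test month lies strictly after its training months, which is the property shuffled K-fold cannot guarantee.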

System Architecture

Pipeline Deep Dive

Data Pipeline

Ingestion: We use Kafka as the backbone. Market ticks (Trade/Quote) are high-velocity. We use a "Schema Registry" to ensure the Protobuf messages are versioned.
Storage: Raw data is stored in Parquet/Iceberg on S3, partitioned by Date and Symbol for efficient time-series retrieval.

Feature Pipeline

Streaming Features: Flink maintains a 24-hour state window to calculate technical indicators. For news, we use a small transformer encoder (e.g., DistilBERT) to convert text into sentiment scores within the stream.
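Training/Serving Skew and Leakage: We use a "point-in-time" join in the Feature Store. This ensures that when we build a training row labeled at 10:00 AM yesterday, we attach only feature values whose timestamps are at or before 10:00 AM, so no future information leaks into training.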
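The point-in-time join boils down to an "as-of" lookup: for each label timestamp, take the most recent feature value recorded at or before it. Below is a minimal sketch using binary search over a sorted timestamp list. The function name and the in-memory list layout are assumptions for illustration; a real feature store performs the same lookup over partitioned tables.

```python
from bisect import bisect_right
from typing import Optional


def point_in_time_lookup(feature_times: list[int],
                         feature_values: list[float],
                         label_time: int) -> Optional[float]:
    """Latest feature value with timestamp <= label_time, or None.

    `feature_times` must be sorted ascending and aligned with `feature_values`.
    """
    i = bisect_right(feature_times, label_time)
    return feature_values[i - 1] if i > 0 else None


# Usage: features were written at t=100, 200, 300; a label at t=250
# may only see the value written at t=200.
value = point_in_time_lookup([100, 200, 300], [1.0, 2.0, 3.0], 250)
```

Returning None when no earlier value exists makes the cold-start case (for example, a newly listed symbol) explicit instead of silently borrowing a future value.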

Training Pipeline

Distributed Training: Given the volume of global market data, we use Horovod or PyTorch Distributed Data Parallel (DDP).
Objective Strategy: Instead of just predicting price, we train multi-task models: Task A (5m return), Task B (next 1h volatility). This regularizes the model and improves generalizability.

Serving Pipeline

Retrieval: When a user opens their app, the "Retrieval" step identifies symbols in their portfolio and watchlists.
Inference: The Prediction Service pulls pre-computed technical features from Redis and runs the TFT model.
Calibration: We use Platt Scaling or Isotonic Regression to ensure that a "70% confidence" prediction actually happens 70% of the time.
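As an illustration of the calibration step, here is a minimal sketch of isotonic regression fitted with the Pool Adjacent Violators algorithm. It assumes each historical prediction has a raw confidence score and a binary outcome (1 if the predicted direction was realized, 0 otherwise). The function names are hypothetical, and tied scores are not given special treatment.

```python
from bisect import bisect_right


def isotonic_fit(scores: list[float], outcomes: list[int]
                 ) -> tuple[list[float], list[float]]:
    """Return (sorted scores, non-decreasing calibrated probabilities)."""
    pairs = sorted(zip(scores, outcomes))
    blocks: list[list[float]] = []  # each block holds [sum of outcomes, count]
    for _, y in pairs:
        blocks.append([float(y), 1.0])
        # Merge backwards while the previous block's mean exceeds this one's.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    calibrated: list[float] = []
    for s, c in blocks:
        calibrated.extend([s / c] * int(c))
    return [x for x, _ in pairs], calibrated


def calibrate(score: float, xs: list[float], ys: list[float]) -> float:
    """Step-function lookup of a new score against the fitted curve."""
    i = bisect_right(xs, score)
    return ys[max(i - 1, 0)]
```

For scores [0.1, 0.2, 0.3, 0.4] with outcomes [0, 1, 0, 1], the two middle points violate monotonicity and are pooled to 0.5, giving [0.0, 0.5, 0.5, 1.0]. The calibration set should come from a period after the model's training window, for the same leakage reasons as validation.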

Evaluation Pipeline

Backtesting: This is the "Offline Evaluation." We simulate trades based on model signals. We track Information Ratio and Maximum Drawdown, not just RMSE.
Interleaved Testing: Instead of a 50/50 A/B split (which is hard in time-series), we use interleaved signals to see which model's ranking of "top movers" is more accurate.
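The two backtest metrics named above are short to compute once the simulation has produced an equity curve and per-period returns. The sketch below is illustrative only: the Information Ratio is left un-annualized, and the function names are assumptions.

```python
import statistics


def max_drawdown(equity: list[float]) -> float:
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def information_ratio(strategy_returns: list[float],
                      benchmark_returns: list[float]) -> float:
    """Mean active return divided by its sample standard deviation."""
    active = [s - b for s, b in zip(strategy_returns, benchmark_returns)]
    return statistics.mean(active) / statistics.stdev(active)


# Usage: a curve that rises to 120 and then falls to 90 has a 25% drawdown.
dd = max_drawdown([100.0, 120.0, 90.0, 130.0])
```

Unlike RMSE, both metrics depend on the order and sign of the errors, which is why a model with a slightly worse RMSE can still produce the better backtest.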

Monitoring Pipeline

Feature Drift: We monitor the distribution of inputs (e.g., if market volatility doubles, our features shift). We use the Kolmogorov-Smirnov test to detect this.
Model Staleness: If the correlation between \hat{y} and y drops below a threshold, the system triggers an automated retraining job on the latest data.
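For the feature drift check, the two-sample Kolmogorov-Smirnov statistic is the largest gap between the empirical CDFs of a reference window and a live window. The sketch below is a minimal pure-Python version. The drift threshold uses the common large-sample approximation with a coefficient of about 1.36 for a 5% significance level; the function names and the choice of significance level are assumptions.

```python
import math
from bisect import bisect_right


def ks_statistic(reference: list[float], live: list[float]) -> float:
    """Maximum absolute difference between the two empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(live)
    d = 0.0
    # The gap can only change at observed sample points, so check each one.
    for x in ref + cur:
        f_ref = bisect_right(ref, x) / len(ref)
        f_cur = bisect_right(cur, x) / len(cur)
        d = max(d, abs(f_ref - f_cur))
    return d


def drift_detected(reference: list[float], live: list[float],
                   coefficient: float = 1.36) -> bool:
    """Flag drift when the statistic exceeds the approximate critical value."""
    n, m = len(reference), len(live)
    critical = coefficient * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, live) > critical
```

With hundreds of features tested on every window, some alarms will fire by chance, so the threshold is usually tightened or the alarms are aggregated before triggering a retrain.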
Wrap Up

Advanced Topics

Offline Metrics: Information Coefficient (IC, commonly computed as Spearman's rank correlation between predicted and realized returns), Root Mean Squared Error (RMSE), and Precision at top K (for movement direction).
Online Metrics (North Star): Realized Alpha (excess return over benchmark) and User Engagement (do users trade more based on these signals?).
Failure Modes & Risk Mitigation:
Flash Crashes: Hard-coded circuit breakers suppress predictions for a symbol when its latest return deviates from the recent rolling mean by more than 10 standard deviations.
Survivorship Bias: A dataset built from currently listed symbols contains only the ones that survived. We must include delisted stocks in the training set to avoid this form of selection bias.
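Scalability: The system scales horizontally. Flink partitions by SymbolID, and the Inference Service is a stateless k8s deployment. A 10x traffic spike is handled by auto-scaling the inference pods. Kafka partitions are best over-provisioned in advance, because adding partitions to a keyed topic changes the key-to-partition mapping and breaks per-symbol ordering.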
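The flash-crash circuit breaker listed under Failure Modes can be sketched as a z-score check on the latest return against a rolling window. This is a minimal illustration: the function name, the window contents, and the choice not to trip when the window has zero variance are all assumptions.

```python
import statistics


def should_halt(recent_returns: list[float], latest_return: float,
                threshold: float = 10.0) -> bool:
    """True when the latest return is more than `threshold` standard
    deviations away from the mean of the recent window."""
    mean = statistics.mean(recent_returns)
    std = statistics.stdev(recent_returns)
    if std == 0:
        # No variation in the window, so the z-score is undefined.
        # This sketch does not trip; a real system might handle it separately.
        return False
    return abs(latest_return - mean) / std > threshold


# Usage: a -5% tick after a quiet window trips the breaker.
window = [0.001, -0.001, 0.002, -0.002, 0.0]
halt = should_halt(window, -0.05)
```

Because the check sits in front of the model rather than inside it, the breaker keeps working even when the model itself is the component that has gone wrong.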