Unified ML Feature Store & Data Platform
Design a high-scale data platform capable of orchestrating both batch and streaming pipelines for a real-time recommendation system. The system must handle over 1M events per second, provide a unified interface for feature engineering to eliminate training-serving skew, and support low-latency (<10ms) feature retrieval for online inference. Detail the architecture for point-in-time correct training set generation, the integration of a feature store, and how you ensure data quality and reliability across the offline and online paths.
Spark, Flink, Kafka, Iceberg, Redis, Debezium, Great Expectations, Feature Store, CDC, Point-in-time Joins
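
To make the "point-in-time correct" requirement concrete, here is a minimal sketch of a point-in-time join in plain Python. It is an illustration under stated assumptions, not a prescribed design: the function name `point_in_time_join`, the tuple layouts, and the sample data are all hypothetical, and a real platform would run the equivalent join at scale in an engine such as Spark over the offline store. For each labeled event, the sketch attaches the latest feature value whose timestamp is at or before the event time, so no future information leaks into the training set.

```python
from bisect import bisect_right
from collections import defaultdict


def point_in_time_join(labels, feature_rows):
    """Attach to each label the latest feature value known at event time.

    labels:       iterable of (entity_id, event_ts, label)        # hypothetical layout
    feature_rows: iterable of (entity_id, feature_ts, value)      # hypothetical layout
    Returns a list of (entity_id, event_ts, feature_value, label), where
    feature_value is None if no feature existed at or before event_ts.
    """
    # Group feature rows by entity.
    by_entity = defaultdict(list)
    for entity_id, ts, value in feature_rows:
        by_entity[entity_id].append((ts, value))

    # Build a sorted timeline per entity so lookups can use binary search.
    timelines = {}
    for entity_id, rows in by_entity.items():
        rows.sort(key=lambda r: r[0])
        timelines[entity_id] = ([ts for ts, _ in rows], [v for _, v in rows])

    joined = []
    for entity_id, event_ts, label in labels:
        ts_list, values = timelines.get(entity_id, ([], []))
        # Index of the last feature row with feature_ts <= event_ts.
        idx = bisect_right(ts_list, event_ts) - 1
        feature = values[idx] if idx >= 0 else None
        joined.append((entity_id, event_ts, feature, label))
    return joined


# Illustrative data only (made up for this sketch).
FEATURES = [("u1", 10, 0.1), ("u1", 20, 0.2), ("u2", 15, 0.9)]
LABELS = [("u1", 15, 1), ("u1", 20, 0), ("u1", 5, 1), ("u2", 30, 0), ("u3", 12, 1)]

training_set = point_in_time_join(LABELS, FEATURES)
```

In this sample, the label for `u1` at time 15 receives the feature written at time 10 rather than the later value from time 20, and the label at time 5 receives no feature at all, since none existed yet.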