Scalable Unified Data Platform (Batch & Streaming)
Design a high-throughput data platform capable of processing 100 TB of data per day through both batch and streaming jobs. The system must support 'Exactly-Once' processing semantics, unified metadata management for consistency across jobs, and a 'Lakehouse' storage model that provides ACID transactions on object storage. Address how the platform will manage the 'small files' problem inherent in streaming ingestion, and how it will scale resources cost-efficiently during peak loads.
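At 100 TB per day (about 1.2 GB/s on average), a streaming job that commits every few seconds across many partitions produces files far smaller than a typical target size, which bloats table metadata and slows queries. Apache Iceberg handles this with its `rewrite_data_files` maintenance procedure, which supports a bin-packing strategy. The sketch below illustrates the planning half of that idea with plain first-fit-decreasing bin packing; it is not Iceberg's implementation. The names `DataFile` and `plan_compaction`, the 512 MiB target, and the 75% "small file" threshold are hypothetical choices for illustration.

```python
from dataclasses import dataclass

MIB = 1024 * 1024


@dataclass(frozen=True)
class DataFile:
    # Hypothetical stand-in for a data file entry in table metadata.
    path: str
    size_bytes: int


def plan_compaction(files, target_bytes=512 * MIB, small_ratio=0.75, min_group_files=2):
    """Group small files into rewrite tasks whose combined size stays <= target_bytes.

    Files at or above small_ratio * target_bytes are treated as well-sized and
    left alone. Small files are packed first-fit decreasing into bins.
    """
    threshold = small_ratio * target_bytes
    small = sorted(
        (f for f in files if f.size_bytes < threshold),
        key=lambda f: f.size_bytes,
        reverse=True,
    )
    bins = []  # each bin is [total_bytes, [DataFile, ...]]
    for f in small:
        for b in bins:
            if b[0] + f.size_bytes <= target_bytes:
                b[0] += f.size_bytes
                b[1].append(f)
                break
        else:
            # No existing bin has room: open a new one.
            bins.append([f.size_bytes, [f]])
    # A single-file group would just rewrite the same file, so skip it.
    return [b[1] for b in bins if len(b[1]) >= min_group_files]


# Example: 200 streaming micro-batch files of 8 MiB plus one already-large file.
files = [DataFile(f"s3://bucket/t/part-{i}.parquet", 8 * MIB) for i in range(200)]
files.append(DataFile("s3://bucket/t/big.parquet", 600 * MIB))
groups = plan_compaction(files)
print(len(groups), [len(g) for g in groups])  # 4 [64, 64, 64, 8]
```

Each resulting group would become one rewrite task, so compaction parallelizes naturally and can be scheduled (for example from Airflow) during off-peak hours to keep cluster costs down.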
Apache Spark, Apache Kafka, Apache Iceberg, S3, Kubernetes, Apache Airflow, Trino, Avro, Prometheus, OpenLineage
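For the exactly-once requirement, a common approach is to accept at-least-once delivery from the source (for example, replayed Kafka offsets after a crash) and make the sink idempotent by committing the data and the consumed offset range in one atomic step. With a table format like Iceberg, that atomic step would be the snapshot commit. The sketch below models the pattern in memory only; `Table`, `commit`, and the partition name `orders-0` are hypothetical and do not correspond to any Kafka or Iceberg API.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical in-memory table: rows plus the last committed offset per source partition.

    Updating rows and offsets under one lock models a single atomic metadata commit.
    """
    rows: list = field(default_factory=list)
    committed: dict = field(default_factory=dict)  # partition -> exclusive end offset
    lock: threading.Lock = field(default_factory=threading.Lock, repr=False, compare=False)

    def commit(self, partition, start_offset, end_offset, records):
        """Apply a batch covering [start_offset, end_offset); return False if it is a replay."""
        with self.lock:
            last = self.committed.get(partition, 0)
            if end_offset <= last:
                # Batch was already committed before a failure: skip the duplicate.
                return False
            if start_offset != last:
                raise ValueError(
                    f"offset gap or overlap on {partition}: expected start {last}, got {start_offset}"
                )
            self.rows.extend(records)
            self.committed[partition] = end_offset
            return True


t = Table()
t.commit("orders-0", 0, 3, [("k1", 1), ("k2", 2), ("k3", 3)])
# Simulated crash after the commit but before the consumer checkpointed:
# the same batch is redelivered and must not be applied twice.
t.commit("orders-0", 0, 3, [("k1", 1), ("k2", 2), ("k3", 3)])
t.commit("orders-0", 3, 5, [("k4", 4), ("k5", 5)])
print(len(t.rows), t.committed)  # 5 {'orders-0': 5}
```

Because the offsets live in the same commit as the data, recovery reads the table's last committed offset rather than trusting the consumer's own checkpoint.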