The Question
Design

Ad Click Aggregation Pipeline

Design a distributed, real-time system to ingest and aggregate advertisement clicks at a scale of billions of daily events. The system must support exactly-once processing for accurate billing, handle out-of-order events using time-windowed aggregations, and provide low-latency query access for advertiser performance dashboards.
Kafka
Flink
ClickHouse
Redis
Protobuf
RocksDB
Questions & Insights

Clarifying Questions

Scale and Throughput: What is the expected volume of clicks per day? (Assumption: 1 billion clicks per day, peaking at 100,000 QPS).
Latency SLA: How quickly should a click be reflected in the aggregated results? (Assumption: Near real-time, within 1 minute).
Accuracy Requirements: Is "exactly-once" processing required for billing purposes, or is "at-least-once" acceptable for general analytics? (Assumption: Exactly-once is required to prevent over-billing).
Aggregation Dimensions: What are the primary dimensions for aggregation? (Assumption: Ad ID, Advertiser ID, Campaign ID, and Timestamp windows of 1-minute, 5-minutes, and 1-hour).
Data Retention: How long should raw vs. aggregated data be stored? (Assumption: Raw data for 7 days for auditing; aggregated data for 2 years).

Thinking Process

The Core Bottleneck: Handling high-write throughput of clicks while ensuring every click is counted exactly once despite network failures or service restarts.
Ingestion Strategy: How do we decouple the high-volume click stream from the heavy aggregation logic? (Use a distributed message bus).
Stateful Processing: How do we maintain counters over time windows without losing state? (Use a stateful stream processing engine with checkpointing).
Storage Choice: How do we store time-series aggregates for fast retrieval by advertisers? (Use an OLAP-optimized columnar database).

Bonus Points

Idempotency via Deterministic Keys: Implementing a deduplication strategy using a combination of impression_id and event_type stored in a sliding window Bloom filter or KV store to handle late-arriving duplicate clicks.
Watermarking for Late Data: Using Flink watermarks to handle out-of-order events, ensuring aggregation windows only close when we are confident all data has arrived.
Lambda Architecture for Backfills: Maintaining a separate batch path (S3 + Spark) to re-process historical data in case of logic bugs or requirement changes in the real-time stream.
Cost-Optimized Tiering: Moving aggregated data older than 30 days from ClickHouse hot storage to cold S3-backed tables to reduce operational costs.
Design Breakdown

Functional Requirements

Capture ad click events from globally distributed users.
Aggregate clicks by Ad ID and Campaign ID over 1-minute, 5-minute, and 1-hour windows.
Provide an API for advertisers to query click statistics for their campaigns.
Ensure deduplication of clicks (e.g., accidental double-clicks).

Non-Functional Requirements

Availability: 99.99% for ingestion (clicks must not be lost).
Scalability: Must handle 10x spikes during major events (e.g., Black Friday).
Accuracy: Exactly-once processing semantics for financial integrity.
Low Latency: End-to-end processing delay under 30 seconds.

Estimation

Traffic: 1B clicks / 86400s \approx 11,500 clicks/sec average. Peak \approx 100,000 clicks/sec.
Storage (Raw): 1 click \approx 500 bytes. 1B clicks/day \approx 500 GB/day. 7 days \approx 3.5 TB.
Storage (Aggregated): Assuming 1M active Ads. 1-minute aggregates \approx 1M rows/minute. With compression, \approx 10-20 GB/day.
Bandwidth: 100k QPS * 500 bytes \approx 50 MB/s (easily handled by standard clusters).

Blueprint

Concise Summary: A high-throughput ingestion pipeline using Kafka, processed by Flink for stateful windowed aggregation, and stored in ClickHouse for real-time analytical queries.
Major Components:
Click Gateway: Stateless REST service that validates clicks and produces events to Kafka.
Kafka: Distributed message bus providing durability and decoupling.
Flink Cluster: Stateful stream processor performing windowed aggregations and deduplication.
ClickHouse: Columnar OLAP database for high-speed aggregate queries.
Query API: Serves aggregated data to frontend dashboards.
Simplicity Audit: This design avoids complex custom microservices for counting by leveraging Flink’s native windowing and ClickHouse’s native aggregation, minimizing "reinventing the wheel."
Architecture Decision Rationale:
Why this architecture is the best for this problem?: The combination of Kafka and Flink is the industry standard for "exactly-once" stream processing at scale.
Functional Requirement Satisfaction: Flink handles the time-windowed aggregation, while ClickHouse handles the multi-dimensional querying.
Non-functional Requirement Satisfaction: Kafka provides the buffer for scalability, and Flink’s checkpointing ensures the required accuracy/availability.

High Level Architecture

Sub-system Deep Dive

Service

Topology & Scaling:
Stateless gateway nodes deployed across multiple Availability Zones (AZs).
Scaling signal: CPU utilization and Request Latency (P99).
Load Balancing: L7 (NGINX/Envoy) to handle SSL termination and path-based routing.
API Schema Design:
POST /v1/clicks:
Payload: {ad_id, user_id, impression_id, timestamp, signature}.
Idempotency: impression_id acts as the unique identifier.
Rate Limit: 100 requests/sec per IP to prevent basic DoS.
Resilience & Reliability:
Retries with exponential backoff on the client-side for Gateway calls.
Circuit breakers on the Gateway to Kafka connection to prevent cascading failure if Kafka is slow.
Observability:
RED metrics (Rate, Errors, Duration) for all API endpoints.
Distributed tracing using OpenTelemetry to track a click from Gateway to Flink.

Storage

Access Pattern:
Write-heavy (bulk inserts from Flink).
Read-heavy for time-series queries (filtering by ad_id and timerange).
Database Table Design:
aggregated_clicks:
ad_id (UInt32), campaign_id (UInt32), window_start (DateTime), click_count (AggregateFunction).
Primary Key: (campaign_id, ad_id, window_start).
Technical Selection:
ClickHouse: Columnar storage is 100x faster for "SUM/COUNT" queries compared to Row-based DBs like Postgres at this scale.
Distribution Logic:
Sharding by campaign_id to ensure all data for one advertiser sits on the same shard for faster local joins/aggregations.

Cache

Purpose & Justification: Deduplication at the edge. To prevent immediate "double-clicks" from hitting the heavy stream processor.
Key-Value Schema:
Key: dedup:{impression_id}.
TTL: 24 hours (most duplicates happen within seconds).
Technical Selection: Redis (Cluster mode) for sub-millisecond lookups.
Failure Handling: If Redis is down, the system fails open (accepts clicks) and relies on Flink’s internal state for secondary deduplication.

Messaging

Purpose & Decoupling: Buffers traffic spikes and allows the Flink cluster to be updated without losing data.
Event / Topic Schema: Protobuf schema for minimized payload size. ClickEvent { string ad_id; int64 ts; string imp_id; }.
Throughput & Partitioning:
Partition key: ad_id to ensure ordering for specific ads (though windowing is more important than strict ordering here).
Technical Selection: Kafka. High throughput and long-term retention support.

Data Processing

Processing Model: Flink Streaming with Tumbling Windows (1m, 5m).
Processing DAG: Source (Kafka) -> Map (Extract keys) -> KeyBy (AdID) -> Window (1m) -> Reduce (Counter) -> Sink (ClickHouse).
Correctness Guarantees: Flink "Exactly-Once" using two-phase commit sinks to ClickHouse and RocksDB state backend for checkpoints.
Technical Selection: Apache Flink for its superior handling of state and time.
Wrap Up

Advanced Topics

Trade-offs: We choose Consistency (Accuracy) and Partition Tolerance over Availability (CAP theorem) in the Flink layer because over-counting clicks leads to financial disputes.
Bottleneck Analysis: ClickHouse write saturation. Mitigation: Use Asynchronous Inserts or Flink-side buffering to write in large batches (e.g., 100k rows at a time).
Hot Spot Shards: Popular ads (e.g., Super Bowl ads) can cause "Hot Keys" in Kafka/Flink. Optimization: Use "Salted Keys" or two-stage aggregation (Local-Global aggregation) to spread the load.
Security: All click URLs should be signed (HMAC) to prevent "Bot-driven" click inflation. The Gateway verifies signatures before pushing to Kafka.