DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.
DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.
The Question
Design

Scalable Ads Click Aggregation System

Design a high-throughput system to aggregate ad clicks for real-time advertiser dashboards. The system must handle a peak of 1 million clicks per second, provide sub-minute latency for reports, and ensure exactly-once processing for accurate billing. Address challenges such as late-arriving data, hot-key distributions for viral ads, and high-availability storage for time-series aggregates.
Kafka
Apache Flink
Cassandra
Redis
NoSQL
API Gateway
Time-series Storage
Kappa Architecture
Questions & Insights

Clarifying Questions

What is the expected scale (QPS)? Assumption: The system should handle an average of 100,000 clicks per second, with peaks of up to 1,000,000 clicks per second.
What is the required data freshness? Assumption: Advertisers should see aggregated results within 10-30 seconds (near real-time).
What are the windowing requirements? Assumption: Support for 1-minute, 5-minute, and 1-hour fixed (tumbling) windows.
How critical is accuracy? Assumption: Extremely critical for billing purposes. We must implement exactly-once processing semantics to avoid over-counting or under-counting clicks.
What is the retention period for raw and aggregated data? Assumption: Raw data for 7 days (for auditing); aggregated data for 2 years.

Thinking Process

Core Bottleneck: Ingesting a massive volume of click events while ensuring they are not lost or double-counted during the aggregation phase.
Progressive Logic:
How do we ingest clicks without blocking the user? (Asynchronous ingestion via a Message Queue).
How do we handle massive scale and deduplication? (Distributed Stream Processing with stateful windowing).
How do we store time-series aggregates for efficient querying? (NoSQL or OLAP storage optimized for time-range queries).
How do we ensure exactly-once semantics? (Idempotent keys and transactional offsets).

Bonus Points

Watermarking for Late Events: Implementing a slack period (e.g., 5 seconds) to handle clicks that arrive slightly out of order due to network jitter, ensuring window completeness.
Two-Stage Aggregation: Reducing "hot-key" issues (e.g., a viral Super Bowl ad) by performing local in-memory pre-aggregation at the ingestion layer or within the stream processor before sending to the global aggregator.
Lambda/Kappa Trade-off: Justifying a Kappa architecture for simplicity in MVP, while acknowledging that a batch layer (Lambda) could be added later for deep historical auditing and correction.
Exactly-Once with Kafka-Flink-Cassandra: Detail the use of Kafka transaction offsets and Flink's two-phase commit sink to achieve end-to-end consistency.
Design Breakdown

Functional Requirements

Core Use Cases:
Capture ad click events from mobile/web clients.
Aggregate clicks per Ad ID and Campaign ID over specific time windows.
Provide an API for internal dashboards to query aggregated click counts.
Scope Control:
In-Scope: Click ingestion, real-time aggregation, and storage of aggregates.
Out-of-Scope: Ad serving logic, fraud detection (though we will provide hooks for it), and billing payment processing.

Non-Functional Requirements

Scale: Support 1M peak QPS.
Latency: End-to-end latency < 30 seconds for aggregations.
Availability & Reliability: 99.99% availability; data must not be lost once acknowledged by the ingestion API.
Consistency: Exactly-once processing for billing accuracy.
Fault Tolerance: Automatic recovery from node failures in the stream processing cluster without data duplication.
Security & Privacy: TLS for all data in transit; PII scrubbing for click metadata.

Estimation

Traffic Estimation:
Average QPS: 100k clicks/sec.
Peak QPS: 1M clicks/sec.
Storage Estimation:
Aggregated Record: ad_id (16 bytes) + timestamp (8 bytes) + count (8 bytes) + metadata (~68 bytes) = ~100 bytes.
If we have 10k active ads and 1-minute windows: 10k * 1,440 mins/day = 14.4M records/day.
14.4M * 100 bytes = 1.44 GB/day for aggregated data.
Raw data: 1M peak QPS * 100 bytes = 100 MB/sec peak = 8.6 TB/day (requires retention management).
Bandwidth Estimation:
Inbound: 1M clicks/sec * 500 bytes (raw request) = 500 MB/sec.
Outbound (API Query): Negligible compared to ingestion.

Blueprint

Concise Summary: A Kappa-architecture pipeline where clicks are ingested via an API Gateway into Kafka, processed in real-time using Apache Flink for windowed aggregation, and stored in Cassandra for time-series querying.
Major Components:
Ingestion Service: Stateless microservice to validate and produce click events to Kafka.
Kafka: Distributed message bus acting as a buffer and source of truth for the stream processor.
Apache Flink: Stateful stream processing engine for windowing and deduplication.
Redis: Distributed cache for temporary click deduplication (idempotency check) before ingestion.
Cassandra: Wide-column store optimized for high-write volume and time-series aggregation queries.
Simplicity Audit: This design avoids complex batch-processing pipelines (MapReduce) by using a single stream processing path (Flink) which is sufficient for 10-30 second latency requirements.
Architecture Decision Rationale:
Why this architecture?: Kafka/Flink/Cassandra is the industry standard for high-throughput, exactly-once time-series aggregation.
Functional Requirement Satisfaction: Meets near real-time aggregation and query requirements via windowed processing and optimized storage.
Non-functional Requirement Satisfaction: Flink provides horizontal scaling and exactly-once state management; Kafka ensures no data loss during spikes.

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery & Traffic Routing: Global DNS (Route53) redirects traffic to the nearest regional Load Balancer to minimize click latency.
Security & Perimeter:
API Gateway: Handles TLS termination and basic request validation.
Rate Limiting: Applied at the API Gateway level to prevent DDoS attacks from botnets.

Service

Topology & Scaling:
Ingestion Service: Stateless, deployed in multiple Availability Zones (AZs). Scales based on CPU and Incoming Request count.
Query Service: Read-only service scaling based on QPS for dashboards.
API Schema Design:
Endpoint: POST /v1/clicks
Payload: { "ad_id": "string", "timestamp": "long", "user_id": "string", "request_id": "uuid" }
Idempotency: request_id is used to prevent duplicate clicks within a short window.
Resilience & Reliability:
Retries: Client-side retries with exponential backoff if the Ingestion Service is unavailable.
Circuit Breakers: Implemented between the Ingestion Service and Kafka to prevent cascading failures.

Storage

Access Pattern: High-frequency writes from Flink (aggregated results); low-frequency, high-volume reads (range scans) from the Query Service.
Database Table Design:
Table: aggregated_clicks
Partition Key: ad_id, window_type (e.g., 1m, 1h)
Clustering Key: window_start_time (Descending)
Columns: click_count, unique_user_count (optional).
Technical Selection: Cassandra.
Rationale: Excellent for time-series data where writes are dominant. Horizontal scaling allows for seamless growth as ad volume increases.
Distribution Logic: Partitioning by ad_id ensures all data for a specific ad is colocated for fast dashboard queries, but we use a synthetic shard key if one ad_id becomes too "hot".

Cache

Purpose & Justification: Used for Deduplication. To ensure we don't count the same request_id twice within a 1-minute window (e.g., double clicks).
Key-Value Schema:
Key: click_dedup:{request_id}
Value: 1
TTL: 60 seconds.
Technical Selection: Redis.
Rationale: In-memory speed is required for the high-frequency deduplication check before writing to Kafka.

Messaging

Purpose & Decoupling: Kafka decouples the high-velocity ingestion from the complex stream processing. It provides a persistent buffer during Flink maintenance or outages.
Event / Topic Schema:
Topic: raw_clicks
Partition Key: ad_id (Ensures ordering per ad for the aggregator).
Throughput & Partitioning: 100 partitions per topic to allow for 100+ concurrent Flink consumers.
Technical Selection: Kafka.
Rationale: High throughput, low latency, and strong durability.

Data Processing

Processing Model: Streaming (Apache Flink).
Processing DAG:
Source: Read from Kafka raw_clicks.
Map: Parse JSON and extract timestamp.
Watermark: Assign watermarks based on timestamp with a 5-second delay.
Window: keyBy(ad_id).window(TumblingEventTimeWindows.of(Time.minutes(1))).
Aggregate: Sum clicks per ad.
Sink: Upsert results to Cassandra.
Correctness Guarantees: Flink's Checkpoints and Savepoints enable exactly-once processing by persisting state (partial aggregates) to durable storage (S3).
Technical Selection: Apache Flink.
Rationale: Native support for event-time processing and complex windowing.

Infrastructure (Optional)

Observability:
Metrics: Prometheus tracking "Consumer Lag" in Kafka (crucial indicator of system health).
Logging: Structured logs for click events to allow for post-hoc auditing.
Wrap Up

Advanced Topics

Trade-offs (PACELC): We choose Availability over Consistency (AP) for the ingestion layer (clicks must always be accepted), but we enforce Strong Consistency within the processing layer (Flink state) to ensure billing accuracy.
Reliability: Flink's exactly-once sink to Cassandra is implemented via "Idempotent Writes" (using the window start time and Ad ID as a unique primary key). If Flink restarts and re-processes a window, it simply overwrites the same record in Cassandra with the same value.
Hot Spot Optimization: For viral ads, we can implement sub-keying. Instead of grouping by ad_id, we group by ad_id + random_shard_id, aggregate, and then run a second-stage aggregation to sum the shards.
Security: Request signatures (HMAC) from the ad-client to the Ingestion Service to prevent spoofed clicks.