The Question
Design
Scalable Web Crawler Design
Design a distributed system capable of crawling and indexing billions of web pages. The system must efficiently handle URL discovery, ensure 'politeness' towards target servers, manage duplicate content, and provide a resilient storage solution for petabytes of raw data.
Kafka
Redis Bloom Filter
S3
PostgreSQL
Async I/O
February 20, 2026
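As a starting point for the politeness and duplicate-handling requirements in the prompt, here is a minimal single-process sketch in Python. It assumes a fixed per-host delay and an in-memory Bloom filter; in a distributed design the Bloom filter would more likely live in Redis and the per-host queues in something like Kafka. The names BloomFilter, PoliteFrontier, delay_seconds, and next_url are hypothetical and chosen only for illustration. They are not part of any library.

```python
import hashlib
import heapq
from collections import deque
from urllib.parse import urlsplit


class BloomFilter:
    """Tiny in-memory Bloom filter (an assumed stand-in for a Redis-backed one)."""

    def __init__(self, size_bits=1 << 20, num_hashes=5):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive k bit positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.size for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # May return a false positive, never a false negative.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


class PoliteFrontier:
    """URL frontier that skips seen URLs and spaces out requests per host."""

    def __init__(self, delay_seconds=1.0):
        self.delay = delay_seconds   # assumed fixed crawl delay per host
        self.seen = BloomFilter()
        self.queues = {}             # host -> deque of pending URLs
        self.next_allowed = {}       # host -> earliest time of next fetch
        self.ready = []              # min-heap of (time, host)

    def add(self, url, now=0.0):
        """Enqueue url unless it was (probably) seen before."""
        if url in self.seen:
            return False
        self.seen.add(url)
        host = urlsplit(url).netloc
        if host not in self.queues:
            self.queues[host] = deque()
            when = max(now, self.next_allowed.get(host, 0.0))
            heapq.heappush(self.ready, (when, host))
        self.queues[host].append(url)
        return True

    def next_url(self, now):
        """Return a URL whose host may be fetched at time now, else None."""
        if not self.ready or self.ready[0][0] > now:
            return None
        _, host = heapq.heappop(self.ready)
        queue = self.queues[host]
        url = queue.popleft()
        self.next_allowed[host] = now + self.delay
        if queue:
            heapq.heappush(self.ready, (now + self.delay, host))
        else:
            del self.queues[host]
        return url
```

Passing the clock in as now keeps the sketch deterministic and easy to test. A real crawler would read the time from a monotonic clock and would also need to honor robots.txt, which this sketch does not attempt. Because a Bloom filter can report false positives, a small fraction of unseen URLs may be skipped; that is the usual trade for the memory savings at this scale.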