The Question
Design
Scalable Web Crawler Design
Design a distributed system capable of crawling and indexing a significant portion of the public web. The system must respect politeness constraints such as robots.txt rules and per-host rate limits, efficiently deduplicate billions of URLs, and provide a fault-tolerant mechanism for storing massive amounts of raw web content and its associated metadata.
Kafka
Redis Bloom Filter
Cassandra
S3
Distributed Workers
February 19, 2026
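
To make two of the stated requirements concrete, here is a minimal single-process sketch of a URL frontier that deduplicates URLs with a Bloom filter and enforces a per-host minimum delay. It is an illustration under assumptions, not the design itself: the names BloomFilter, Frontier, delay_seconds, and pop_ready are hypothetical, the filter lives in memory rather than in a Redis Bloom Filter, and the heap stands in for a distributed queue such as Kafka. It also assumes each URL is fetched at its scheduled time.

```python
import hashlib
import heapq
from urllib.parse import urlsplit


class BloomFilter:
    """Fixed-size Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 7) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive all bit positions from two halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos >> 3] & (1 << (pos & 7)) for pos in self._positions(item))


class Frontier:
    """URL frontier with probabilistic dedup and a per-host minimum delay (assumed policy)."""

    def __init__(self, delay_seconds: float = 1.0) -> None:
        self.delay = delay_seconds
        self.seen = BloomFilter()
        self.next_allowed = {}  # host -> earliest time the next fetch may be scheduled
        self.heap = []          # entries are (ready_time, sequence, url)
        self.seq = 0            # tie-breaker that keeps insertion order stable

    def add(self, url: str, now: float) -> bool:
        """Queue url unless it was (probably) seen before; return True if queued."""
        if url in self.seen:
            return False
        self.seen.add(url)
        host = urlsplit(url).netloc.lower()
        ready = max(now, self.next_allowed.get(host, now))
        self.next_allowed[host] = ready + self.delay
        heapq.heappush(self.heap, (ready, self.seq, url))
        self.seq += 1
        return True

    def pop_ready(self, now: float):
        """Return the next URL whose scheduled time has arrived, or None."""
        if self.heap and self.heap[0][0] <= now:
            return heapq.heappop(self.heap)[2]
        return None


if __name__ == "__main__":
    frontier = Frontier(delay_seconds=2.0)
    for u in ["https://example.com/a", "https://example.com/a",
              "https://example.com/b", "https://example.org/x"]:
        print(u, "queued" if frontier.add(u, now=0.0) else "duplicate")
```

A Bloom filter trades a small false-positive rate (an unseen URL occasionally skipped) for a memory footprint far below that of an exact set, which is why it suits deduplication at the scale of billions of URLs. In a distributed deployment the filter and the per-host schedule would need to be shared or partitioned by host so that two workers never hit the same host concurrently.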