The Question
Scalable Web Crawler Design
Design a distributed system that crawls and indexes billions of web pages. The system must manage URL discovery efficiently, stay polite to host servers (for example, by limiting the request rate per host), deduplicate content, and store massive amounts of unstructured data, all while scaling horizontally.
Kafka
Redis Bloom Filter
S3
Cassandra
Distributed Workers
February 16, 2026
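
To make the requirements in the question concrete, here is a minimal single-process sketch of a URL frontier that covers three of them: URL discovery, per-host politeness, and deduplication. It is only an illustration under stated assumptions, not a reference design. The names `Frontier`, `normalize`, `next_url`, and `is_new_content` are hypothetical. The in-memory sets stand in for a shared Bloom filter (which, unlike a set, can return false positives), and the in-memory heap stands in for a distributed queue that workers would consume from.

```python
import hashlib
import heapq
from collections import deque
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Canonicalize a URL so trivially different spellings dedupe together."""
    parts = urlsplit(url)
    # Lowercase scheme and host, drop the fragment, default an empty path to "/".
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, "")
    )


class Frontier:
    """In-memory URL frontier with dedup and a fixed per-host crawl delay."""

    def __init__(self, delay_seconds: float = 1.0):
        self.delay = delay_seconds
        self.seen_urls = set()      # assumption: stand-in for a shared Bloom filter
        self.seen_content = set()   # SHA-256 digests of page bodies already stored
        self.queues = {}            # host -> deque of pending URLs (FIFO)
        self.next_allowed = {}      # host -> earliest time of the next fetch
        self.ready = []             # min-heap of (earliest fetch time, host)

    def add(self, url: str, now: float) -> bool:
        """Enqueue a discovered URL; return False if it was already seen."""
        url = normalize(url)
        if url in self.seen_urls:
            return False
        self.seen_urls.add(url)
        host = urlsplit(url).netloc
        if host not in self.queues:
            # Host had nothing pending: schedule it, honoring any earlier fetch.
            self.queues[host] = deque()
            when = max(now, self.next_allowed.get(host, now))
            heapq.heappush(self.ready, (when, host))
        self.queues[host].append(url)
        return True

    def next_url(self, now: float):
        """Return a URL whose host may be fetched at `now`, or None."""
        if not self.ready or self.ready[0][0] > now:
            return None
        _, host = heapq.heappop(self.ready)
        url = self.queues[host].popleft()
        self.next_allowed[host] = now + self.delay
        if self.queues[host]:
            # More URLs pending for this host: reschedule after the delay.
            heapq.heappush(self.ready, (now + self.delay, host))
        else:
            del self.queues[host]
        return url

    def is_new_content(self, body: bytes) -> bool:
        """Exact-duplicate check on page bodies via a SHA-256 fingerprint."""
        digest = hashlib.sha256(body).hexdigest()
        if digest in self.seen_content:
            return False
        self.seen_content.add(digest)
        return True
```

A worker loop would call `next_url(time.monotonic())`, fetch the page, check `is_new_content` before writing the body to storage, and feed extracted links back through `add`. Passing the clock in as an argument keeps the sketch deterministic and easy to test. The exact-hash check only catches byte-identical pages; near-duplicate detection would need a different fingerprint.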