Scalable Web Crawler Design
Design a distributed system capable of crawling billions of web pages. The system must efficiently discover new URLs, deduplicate them at scale, store massive volumes of raw content, and strictly respect web politeness rules (robots.txt directives and per-host rate limits).
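At billions of URLs, exact-set deduplication is memory-prohibitive, which is why Bloom filters are a common choice here. Below is a minimal, self-contained sketch of the idea (an in-memory filter using double hashing); a production crawler would more likely use a shared, distributed filter such as RedisBloom, and the sizes and hash counts here are illustrative assumptions, not tuned values.

```python
import hashlib

class BloomFilter:
    """Probabilistic set for URL deduplication: never reports a seen URL
    as unseen (no false negatives), but may occasionally report an unseen
    URL as seen (false positives, tunable via size and hash count)."""

    def __init__(self, size_bits=1 << 20, num_hashes=5):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, url):
        # Double hashing: derive k bit positions from two 64-bit halves
        # of a single SHA-256 digest of the URL.
        digest = hashlib.sha256(url.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return [(h1 + i * h2) % self.size for i in range(self.num_hashes)]

    def add(self, url):
        for pos in self._positions(url):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, url):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(url))
```

A crawler worker would consult the filter before enqueueing: if `might_contain(url)` is false, the URL is definitely new, so add it and enqueue; if true, it is almost certainly a duplicate and can be dropped (accepting a small rate of missed pages as the cost of bounded memory).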
Tags: Kafka, Redis Bloom Filter, S3, Distributed Workers, DNS Cache
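The politeness requirement usually comes down to enforcing a minimum delay between requests to the same host across all distributed workers. Here is a hedged single-process sketch of that gate; the class name, the fixed one-second delay, and the use of wall-clock timestamps are illustrative assumptions — a real deployment would shard hosts across workers or keep this state in a shared store, and derive per-host delays from robots.txt `Crawl-delay` where present.

```python
import time
from collections import defaultdict
from urllib.parse import urlparse

class PolitenessGate:
    """Tracks, per host, the earliest time the next fetch is allowed."""

    def __init__(self, min_delay_s=1.0):
        self.min_delay_s = min_delay_s
        # host -> earliest timestamp at which the next fetch may start
        self.next_allowed = defaultdict(float)

    def wait_time(self, url, now=None):
        """Seconds to wait before this URL's host may be fetched (0 if ready)."""
        host = urlparse(url).netloc
        now = time.monotonic() if now is None else now
        return max(0.0, self.next_allowed[host] - now)

    def record_fetch(self, url, now=None):
        """Call when a fetch starts; pushes the host's next slot forward."""
        host = urlparse(url).netloc
        now = time.monotonic() if now is None else now
        self.next_allowed[host] = now + self.min_delay_s
```

Keying on the host rather than the full URL is the essential design choice: rate limits protect origin servers, so two URLs on the same host share one budget while URLs on different hosts can be fetched concurrently.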
February 21, 2026