The Question
Design a Distributed Web Crawler
Design a distributed web crawler for large-scale content indexing. The system should coordinate URL discovery and fetching across many worker nodes, enforce politeness constraints, handle duplicate detection, and scale to index billions of pages reliably.
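The heart of the coordination problem — a URL frontier that deduplicates discovered URLs and enforces per-host politeness delays — can be sketched in miniature. This is a hedged single-process illustration, not the distributed design itself; the class name `PolitenessFrontier`, the delay value, and the URL-normalization rules are assumptions chosen for the sketch:

```python
import heapq
import time
from urllib.parse import urlsplit

class PolitenessFrontier:
    """Single-node sketch of a crawl frontier: dedupes URLs and
    enforces a minimum delay between fetches to the same host."""

    def __init__(self, per_host_delay=1.0):
        self.per_host_delay = per_host_delay
        self.seen = set()    # normalized URLs already enqueued (dedup)
        self.heap = []       # (ready_time, seq, url) min-heap
        self.next_ok = {}    # host -> earliest allowed next-fetch time
        self.seq = 0         # tie-breaker keeps FIFO order per ready_time

    @staticmethod
    def normalize(url):
        # Simplistic canonicalization: lowercase scheme/host, drop fragment.
        parts = urlsplit(url)
        return f"{parts.scheme.lower()}://{parts.netloc.lower()}{parts.path or '/'}"

    def add(self, url, now=None):
        """Enqueue a discovered URL; returns False if it is a duplicate."""
        now = time.monotonic() if now is None else now
        norm = self.normalize(url)
        if norm in self.seen:
            return False
        self.seen.add(norm)
        host = urlsplit(norm).netloc
        # Schedule no earlier than the host's politeness window allows.
        ready = max(now, self.next_ok.get(host, now))
        self.next_ok[host] = ready + self.per_host_delay
        heapq.heappush(self.heap, (ready, self.seq, norm))
        self.seq += 1
        return True

    def pop_ready(self, now=None):
        """Return the next URL whose politeness delay has elapsed, else None."""
        now = time.monotonic() if now is None else now
        if self.heap and self.heap[0][0] <= now:
            return heapq.heappop(self.heap)[2]
        return None
```

In the distributed version this state would live in shared infrastructure — for instance, the dedup set as a Bloom filter in Redis and the frontier partitioned by host across Kafka topics — but the invariants (crawl each URL once, never hit a host faster than the politeness delay) are the same.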
PostgreSQL
S3
HDFS
Redis
Kafka
January 30, 2026