The Question

Nearby Friends System

Design a real-time 'Nearby Friends' feature for a social media application with 10M DAU. The system must allow users to see their friends' locations on a map within a configurable radius (e.g., 5km) and update their own location every 30 seconds. Key constraints include high write throughput for location updates, low latency for proximity queries, strict mutual-friendship privacy, and battery efficiency for mobile clients. Address how to scale the spatial indexing and handle the intersection of location data with social graph data.

Redis

PostgreSQL

Kafka

Geohash

WebSockets

API Gateway

Questions & Insights

Clarifying Questions

What is the scale of the system (DAU and concurrent users)?

Assumption: 10 million Daily Active Users (DAU) with 1 million peak concurrent users.

How frequently should location updates be sent?

Assumption: Every 30 seconds while the app is active, or whenever the user moves significantly (>100 meters).

What is the definition of "nearby" and what is the maximum radius?

Assumption: Users within a 5km to 10km radius.

What are the privacy requirements?

Assumption: Mutual friendship is required. Users can opt-out (Ghost Mode). History is only visible to the owner.

Is real-time push required, or is pull-based refresh acceptable?

Assumption: Pull-based for the list (on app open/refresh), but WebSockets for "active" users looking at the map.

Thinking Process

Core Bottleneck: High-frequency write-heavy location updates (33k+ QPS) combined with spatial "proximity" queries.

Key Strategy:

Use Geohashing or Redis Geo-commands (ZSet based) to handle the spatial index.

Decouple the location write path from the read path using an asynchronous buffer for history.

Solve the "Nearby Friend" problem by fetching a user's friend list first, then retrieving their locations from the spatial index to ensure privacy and efficiency.

How do we store and update the live location of millions of users efficiently?

How do we quickly find which of those users are within a specific circular or rectangular area?

How do we intersect "nearby users" with the "friend list" without a massive performance penalty?

How do we scale the system globally while maintaining low latency?

Bonus Points

Geohash Precision & Shifting: Discussing the use of Geohash (base32) and the "border problem" (querying 8 neighboring cells to avoid missing users on the edge of a grid).

Privacy Cloaking: Implementing "fuzzy" location sharing or grid-based snapping for users who want privacy but still want to see "general" nearby friends.

Adaptive Update Frequency: Throttling location updates based on battery level, movement speed (e.g., driving vs. walking), and signal strength.

Pub/Sub for Live Tracking: Using Redis Pub/Sub to push updates to WebSockets only for friends who are currently "active" and looking at each other on the map.

Design Breakdown

Functional Requirements

Core Use Cases:

Users can update their current GPS coordinates.

Users can view a list of friends within a specified radius.

Users can enable/disable location sharing (Ghost Mode).

Scope Control:

In-scope: Real-time location sharing, friendship-based filtering, and basic proximity search.

Out-of-scope: Location history playback (long-term), navigation/routing, and "people you may know" recommendations.

Non-Functional Requirements

Scale: Support 10M DAU and peak 33k write QPS.

Latency: Location updates should be processed in <200ms; "nearby" queries should return in <500ms.

Availability & Reliability: High availability (99.9%); losing a single location update is acceptable (ephemeral data), but friendship data must be persistent.

Consistency: Eventual consistency is sufficient for location (a few seconds lag is okay). Strong consistency for privacy settings.

Security & Privacy: Strict mutual-friendship checks and TLS encryption for coordinates.

Estimation

Traffic Estimation:

1M concurrent users updating every 30s = ~33,333 Write QPS.

If users refresh the "nearby list" every 1 minute = ~16,666 Read QPS.

Storage Estimation:

Redis (Live Location): UserID (8 bytes) + Lat/Lon (16 bytes) + Timestamp (8 bytes) = 32 bytes per user. 10M users = ~320MB (easily fits in one Redis node, but sharded for high QPS).

PostgreSQL (Friendships): 10M users 200 friends 16 bytes (pair) = 32GB storage.

Bandwidth Estimation:

Incoming: 33k QPS * 100 bytes/request = 3.3 MB/s.

Outgoing: 16k QPS (up to 50 friends 100 bytes) = 80 MB/s.

Blueprint

Concise Summary: A dual-path architecture where location updates are stored in a Redis Geo-index for fast spatial retrieval, while friendship data is managed in a relational database.

Major Components:

Location Service: Handles incoming GPS pings and updates the Redis Geo-index.

Nearby Service: Aggregates friendship data and performs spatial queries.

Friendship Service: Manages the social graph and privacy settings.

Redis (Spatial): Acts as the primary index for real-time locations using GEOADD and GEORADIUS.

Kafka: Buffers location updates for asynchronous persistence of history.

Simplicity Audit: This is the simplest design because it leverages Redis's built-in spatial primitives, avoiding complex custom Quadtree implementations or slow SQL spatial joins for the MVP.

Architecture Decision Rationale:

Why this?: Redis is in-memory and optimized for high-write/low-latency spatial operations.

Functional Satisfaction: Meets the core requirement of finding friends within a radius.

Non-functional Satisfaction: Scalable via Redis sharding and horizontally scalable stateless microservices.

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery & Traffic Routing: DNS-based routing to the nearest regional data center to minimize latency for GPS pings.

Security & Perimeter:

API Gateway: Handles JWT authentication.

Rate Limiting: Limits location updates to 1 per 5 seconds per user to prevent DDoS and battery drain.

Service

Topology & Scaling:

Stateless services deployed in multiple Availability Zones.

Scaled based on CPU and Request Count.

API Schema Design:

POST /v1/locations:

Request: {lat: float, lon: float, accuracy: float}

Response: 202 Accepted

GET /v1/nearby?radius=5000:

Response: {friends: [{id: string, lat: float, lon: float, distance: float, last_seen: timestamp}]}

Resilience & Reliability:

Retry Policy: Clients use exponential backoff for failed location updates.

Circuit Breaker: Nearby service trips if Friendship service is down, returning a "Service Unavailable" or cached friend list.

Storage

Access Pattern:

Location: 90% writes, 10% reads.

Friendship: 99% reads, 1% writes.

Database Table Design (PostgreSQL):

UserProfiles: user_id (PK), username, ghost_mode_enabled.

Friendships: user_id_1, user_id_2, status (Composite PK).

Technical Selection:

PostgreSQL: For friendships (Relational, ACID for privacy settings).

S3/ClickHouse: For long-term location history (Optional, via History Worker).

Distribution Logic:

Shard Friendship table by user_id to ensure a user's friends are likely on the same shard.

Cache

Purpose & Justification: Redis is the primary engine for spatial data, not just a cache. It reduces read amplification for nearby searches.

Key-Value Schema:

Key: user_locations (Type: GeoSet/ZSet).

Value: lat, lon, user_id.

Key: friend_list:{user_id} (Type: Set).

Technical Selection: Redis. Rationale: Native GEOSEARCH support and sub-millisecond response times.

Failure Handling: If Redis fails, use a standby replica. Since data is ephemeral (location changes in seconds), high-frequency snapshots aren't critical.

Messaging

Purpose & Decoupling: Kafka decouples the critical "update location" path from the "persist history" path.

Event / Topic Schema: location-updates: {user_id, lat, lon, timestamp}.

Technical Selection: Kafka. Rationale: High throughput and durability for data logging.

Wrap Up

Advanced Topics

Trade-offs: We chose Availability over Consistency (AP in CAP). If a location update is slightly delayed, the system still functions.

Reliability: Use a "Last Known Location" cache if the real-time index is momentarily unreachable.

Bottleneck Analysis:

Hot Spots: Popular users with 5000 friends.

Optimization: For very active users, the Nearby Service can cache the "Friend List" to avoid hitting PostgreSQL on every location query.

Privacy Optimization: Use Geohash prefixes to narrow down search spaces. Instead of searching the whole world, only search the user's current Geohash cell and its 8 neighbors.

Distinguishing Insights: For a true "Staff Level" design, discuss Cell-based Pub/Sub. Instead of the client polling, the client subscribes to a Geohash-based topic. When a friend enters that Geohash, the server pushes an update. This significantly reduces mobile battery consumption compared to polling.