The Question
ML Design

Large-Scale Privacy Redaction for Geospatial Imagery

Design a machine learning system to automatically detect and blur sensitive PII, specifically faces and license plates, in a global corpus of billions of high-resolution 360-degree street-level images. The system must prioritize near-perfect recall for privacy compliance while maintaining high precision to preserve the utility of the maps. Your design should detail a distributed batch-processing architecture, handle geometric distortions in spherical imagery, and explain how to manage global variations in license plate formats and hard negatives like statues or billboards. Address the trade-offs between processing throughput, inference cost, and model accuracy at petabyte scale.
YOLOv8
Apache Beam
Feature Pyramid Network
TensorRT
Quantization
Active Learning
Inpainting
Gaussian Blur
CNN
Questions & Insights

Clarifying Questions

Business Goal: Is the primary goal legal compliance (privacy) or user trust? (Target: recall as close to 100% as possible for privacy, while minimizing over-blurring of landmarks/signs).
Constraints & Scale: What is the scale of imagery? (Assumption: 10+ PB of raw imagery, billions of faces/plates annually). What is the throughput requirement? (Assumption: Process imagery within 24-48 hours of ingestion).
Edge Cases: How do we handle "false" faces (statues, billboards) or plate-like text that is not a license plate (store signs)? (Assumption: These should not be blurred to preserve map utility).
Image Format: Are these raw 360-degree equirectangular images or pre-cut tiles? (Assumption: High-resolution equirectangular images that require tiling for GPU memory efficiency).
Assumptions:
Corpus: Billions of images globally.
Latency: Throughput-optimized batch processing (not real-time).
Accuracy: Recall is the "North Star" (must not miss a face).

Thinking Process

Identify the Core Bottleneck: The sheer volume of pixels. Processing full-resolution 360-degree images in a single pass is computationally prohibitive, so I need a "tiling and multi-scale" approach.
Detection vs. Segmentation: The usual retrieval-vs-ranking split does not apply here; the relevant choice is between detection and segmentation. Detection (bounding boxes) is faster and sufficient for blurring.
Scale Strategy: This is a classic "embarrassingly parallel" problem. I will use a distributed batch inference pipeline (Apache Beam/Dataflow) rather than a request-response API.
Quality Control: A deployed detector is prone to drift (e.g., when new license plate designs appear). I need a "Human-in-the-loop" (HITL) for high-uncertainty cases and a robust regression suite.
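The tiling step from the first point can be sketched as overlapping crops whose detections are shifted back into full-image coordinates. This is a minimal sketch, assuming axis-aligned crops with a fixed overlap so that a face or plate cut by one tile boundary appears whole in a neighbouring tile; the 640-pixel tile matches the example resolution in the Feature Pipeline, while the 128-pixel overlap and the names `tile_origins`, `make_tiles`, and `to_global` are illustrative assumptions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h) in pixels


def tile_origins(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets along one axis so that tiles of size `tile` cover
    [0, length) and neighbouring tiles share at least `overlap` pixels."""
    if overlap >= tile:
        raise ValueError("overlap must be smaller than the tile size")
    if tile >= length:
        return [0]  # a single tile covers the whole axis
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the far edge
    return starts


def make_tiles(width: int, height: int, tile: int = 640,
               overlap: int = 128) -> List[Tuple[int, int]]:
    """(x, y) top-left origins of every tile, row by row."""
    return [(x, y)
            for y in tile_origins(height, tile, overlap)
            for x in tile_origins(width, tile, overlap)]


def to_global(box: Box, origin: Tuple[int, int]) -> Box:
    """Shift a box predicted inside a tile back into full-image coordinates."""
    x, y, w, h = box
    ox, oy = origin
    return (x + ox, y + oy, w, h)


# Example: a 1000x640 image needs two overlapping 640x640 tiles.
print(make_tiles(1000, 640))  # [(0, 0), (360, 0)]
```

Because neighbouring tiles overlap, the same object can be detected twice; after mapping boxes back with `to_global`, duplicates would normally be merged with non-maximum suppression.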

Elite Bonus Points

Geometric Distortion Awareness: Standard CNNs struggle with equirectangular distortion (objects near poles look stretched). I would implement "spherical tiling" or coordinate-aware convolutions.
Temporal/Spatial Consistency: If a face appears in three consecutive frames as the car moves, but the model only detects it in two, we can use "Motion Interpolation" or "Spatio-temporal tracking" to fill the gap and blur the missed frame.
Edge-Case Active Learning: Use a "Hard Negative Mining" strategy to train the model specifically on statues, mannequins, and billboards, reducing the false positives that degrade the look and usefulness of Street View.
Privacy-Safe Evaluation: Creating a "Golden Dataset" where faces are already blurred/synthetic so that human annotators never see the raw PII (Personally Identifiable Information) during the evaluation phase.
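The gap-filling idea in the Temporal/Spatial Consistency point can be sketched as linear interpolation between a track's neighbouring detections. This is a minimal sketch under two loud assumptions: an upstream tracker (not shown) has already associated detections of the same face or plate across frames, and motion between adjacent frames is roughly linear in image coordinates, which is a simplification for a moving 360-degree camera. `lerp_box`, `fill_gaps`, and `max_gap` are hypothetical names.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h) in pixels


def lerp_box(a: Box, b: Box, t: float) -> Box:
    """Linearly interpolate two boxes: t=0 returns a, t=1 returns b."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return (ax + (bx - ax) * t, ay + (by - ay) * t,
            aw + (bw - aw) * t, ah + (bh - ah) * t)


def fill_gaps(track: Dict[int, Box], max_gap: int = 1) -> Dict[int, Box]:
    """Given one object's track {frame_index: box}, add interpolated boxes for
    frames missing between two detections, but only for gaps of at most
    max_gap frames (longer gaps are left to other safeguards)."""
    frames = sorted(track)
    filled = dict(track)
    for f0, f1 in zip(frames, frames[1:]):
        missing = f1 - f0 - 1
        if 0 < missing <= max_gap:
            for f in range(f0 + 1, f1):
                filled[f] = lerp_box(track[f0], track[f1], (f - f0) / (f1 - f0))
    return filled


# A face detected in frames 10 and 12 but missed in frame 11.
track = {10: (100.0, 50.0, 20.0, 20.0), 12: (120.0, 50.0, 20.0, 20.0)}
print(fill_gaps(track)[11])  # (110.0, 50.0, 20.0, 20.0)
```

The interpolated box for the missed frame is then blurred like any other detection.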
Design Breakdown

Requirements

Product Goal: Automatically redact PII (faces and license plates) from Google Street View to comply with global privacy laws (GDPR, CCPA).
Success Metrics:
Online/Production Metrics: Recall (percentage of real faces/plates that get blurred), Precision (percentage of blurred regions that are real PII rather than signs or landmarks).
Offline Metrics: mAP (mean Average Precision) at IoU 0.5, F1-score.
Guardrail Metrics: Inference cost per million images, Processing latency (Time-to-Map).
System Constraints: Massive storage (Petabytes), GPU-heavy workloads, variable image quality (weather/lighting).
Data Availability: Raw images from Street View cars, historical manually labeled data, synthetic data for rare plate formats.

ML Problem Framing

ML Task Type: Object Detection (2D Bounding Box detection).
Prediction Target: $P(\text{class} \mid \text{bounding box}, \text{image})$.
Inputs:
Image Features: RGB pixel data, GPS/Heading (for context), Camera Intrinsics.
Outputs: List of Bounding Boxes [x, y, w, h] with associated class (Face, Plate) and confidence score.
ML Challenges: High-resolution processing, extreme scale-variance (distant vs. close faces), and heavy class imbalance (most pixels are not PII).

Design Summary & MVP

Concise Summary: A distributed batch-processing pipeline that tiles 360-degree images, runs a high-performance Object Detection model (YOLO-based for speed/efficiency), and applies a Gaussian blur post-processing step on detected coordinates.
Model Architecture & Selection:
Baseline Model: Simple Haar Cascades or HOG-based detectors (Low accuracy).
Target Model: YOLOv8 or EfficientDet. These offer a strong trade-off between inference throughput (images per second) and mAP on small objects like distant license plates.
Choice Rationale: Single-stage detectors are significantly cheaper to run at Google scale than two-stage detectors (like Faster R-CNN) while reaching comparable recall.
Simplicity Audit: No need for real-time inference or complex Transformers. An optimized CNN-based detector on a distributed batch runner satisfies all requirements.
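The Gaussian blur post-processing step from the summary can be sketched in pure Python as a separable two-pass filter restricted to each detected box. This is a minimal sketch, assuming a single-channel image stored as nested lists (each RGB channel would be treated the same way) and integer boxes in the detector's `[x, y, w, h]` format; `blur_box`, `gaussian_kernel`, and the default `sigma` are illustrative, not tuned values.

```python
import math
from typing import List, Tuple

Image = List[List[float]]        # single channel, indexed image[row][col]
Box = Tuple[int, int, int, int]  # (x, y, w, h) in integer pixels


def gaussian_kernel(sigma: float) -> List[float]:
    """Normalized 1-D Gaussian weights covering +/- 3 sigma."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(d * d) / (2 * sigma * sigma))
               for d in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_box(img: Image, box: Box, sigma: float = 4.0) -> Image:
    """Return a copy of img with a separable Gaussian blur applied inside box.

    Only pixels inside the box are read or written; samples that would fall
    outside the box are clamped to its border."""
    kernel = gaussian_kernel(sigma)
    r = len(kernel) // 2
    x, y, w, h = box
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + w, len(img[0])), min(y + h, len(img))
    out = [row[:] for row in img]
    if x0 >= x1 or y0 >= y1:
        return out  # box lies entirely outside the image

    def clamp(v: int, lo: int, hi: int) -> int:
        return min(max(v, lo), hi - 1)

    # Horizontal pass: blur each row segment inside the box.
    tmp = [[sum(k * img[j][clamp(i + d, x0, x1)]
                for d, k in zip(range(-r, r + 1), kernel))
            for i in range(x0, x1)]
           for j in range(y0, y1)]
    # Vertical pass over the horizontally blurred values.
    for j in range(y0, y1):
        for i in range(x0, x1):
            out[j][i] = sum(k * tmp[clamp(j + d, y0, y1) - y0][i - x0]
                            for d, k in zip(range(-r, r + 1), kernel))
    return out


img = [[0.0] * 8 for _ in range(8)]
img[3][3] = 255.0  # a bright pixel inside the box
img[0][7] = 100.0  # a pixel outside the box stays untouched
blurred = blur_box(img, (0, 0, 6, 6), sigma=1.0)
```

A production pipeline would more likely call an optimized routine such as OpenCV's `cv2.GaussianBlur` on each box crop; the separable two-pass structure shown here is the same idea.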

System Architecture

Pipeline Deep Dive

Data Pipeline

Data Source: Massive ingestion from Street View cars. Metadata includes GPS, timestamp, and camera orientation.
Data Ingestion: Use Apache Kafka to buffer ingestion events. Images are stored in a distributed blob store (GCS/S3) partitioned by geographic S2 cells for spatial locality.
Data Processing: Equirectangular images are projected into multiple rectilinear tiles (pinhole camera views) to remove distortion, which significantly improves detection accuracy for standard CNNs.
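The equirectangular-to-rectilinear projection can be sketched with the usual inverse mapping: for every pixel of the output pinhole tile, build the viewing ray, rotate it by the tile's yaw and pitch, convert it to longitude/latitude, and read the matching panorama pixel. This is a minimal sketch, assuming the axis and angle conventions stated in the docstring and nearest-neighbour sampling (bilinear sampling would be a natural upgrade); `tile_to_equirect` and `project_tile` are hypothetical names.

```python
import math
from typing import List, Tuple


def tile_to_equirect(px: float, py: float, tile_size: int, fov_deg: float,
                     yaw_deg: float, pitch_deg: float,
                     width: int, height: int) -> Tuple[float, float]:
    """Map a (continuous) pixel position in a pinhole tile to (u, v) pixel
    coordinates in a width x height equirectangular panorama.

    Conventions (assumptions): yaw rotates about the vertical axis, positive
    pitch tilts the view up, longitude 0 is the panorama's horizontal centre,
    and the top row of the panorama is latitude +90 degrees."""
    f = (tile_size / 2) / math.tan(math.radians(fov_deg) / 2)  # focal length, px
    # Ray through the pixel in camera coordinates (x right, y up, z forward).
    x, y, z = px - tile_size / 2, tile_size / 2 - py, f
    # Tilt by pitch around the x-axis.
    p = math.radians(pitch_deg)
    y, z = y * math.cos(p) + z * math.sin(p), -y * math.sin(p) + z * math.cos(p)
    # Turn by yaw around the vertical axis.
    t = math.radians(yaw_deg)
    x, z = x * math.cos(t) + z * math.sin(t), -x * math.sin(t) + z * math.cos(t)
    lon = math.atan2(x, z)                 # in (-pi, pi]
    lat = math.atan2(y, math.hypot(x, z))  # in [-pi/2, pi/2]
    u = (lon / (2 * math.pi) + 0.5) * width % width
    v = (0.5 - lat / math.pi) * height
    return u, v


def project_tile(equi: List[List[float]], tile_size: int, fov_deg: float,
                 yaw_deg: float, pitch_deg: float) -> List[List[float]]:
    """Render one rectilinear tile by nearest-neighbour sampling."""
    height, width = len(equi), len(equi[0])
    tile = []
    for r in range(tile_size):
        row = []
        for c in range(tile_size):
            u, v = tile_to_equirect(c + 0.5, r + 0.5, tile_size, fov_deg,
                                    yaw_deg, pitch_deg, width, height)
            # Guard both indices against float edge cases at the borders.
            row.append(equi[min(int(v), height - 1)][int(u) % width])
        tile.append(row)
    return tile


# Centre of a 640x640, 90-degree tile looking along yaw 90 in a 4000x2000 panorama.
print(tile_to_equirect(320, 320, 640, 90, 90, 0, 4000, 2000))  # approximately (3000.0, 1000.0)
```

As an illustrative layout (an assumption, not a recommendation from this design), six 90-degree tiles at yaw 0, 60, ..., 300 degrees overlap their neighbours by 30 degrees, and extra tiles at positive and negative pitch cover the stretched regions near the poles. The same `tile_to_equirect` mapping can carry a detected box's corners back into the panorama for blurring.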

Feature Pipeline

Feature Engineering: Standardize all tiles to a fixed resolution (e.g., 640x640). Apply color space normalization.
Offline Feature Pipeline: Batch jobs compute image brightness/contrast metadata to adjust detection thresholds (e.g., more sensitive in low-light/night images).
Training/Serving Skew: Use a unified preprocessing library (C++/TensorFlow Transform) to ensure tiles are generated identically during training and batch inference.
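The brightness-dependent threshold from the Offline Feature Pipeline can be sketched as a small per-image lookup. This is a minimal sketch, assuming mean luma (ITU-R BT.601 weights) as the brightness metadata and a linear ramp between a hypothetical night threshold and day threshold; every numeric default is a placeholder that would be tuned on validation data, and `mean_luma` and `detection_threshold` are hypothetical names.

```python
from statistics import mean
from typing import Iterable, Tuple


def mean_luma(rgb_pixels: Iterable[Tuple[float, float, float]]) -> float:
    """Average brightness using ITU-R BT.601 luma weights (0..255 scale)."""
    return mean(0.299 * r + 0.587 * g + 0.114 * b for r, g, b in rgb_pixels)


def detection_threshold(luma: float, base: float = 0.30, floor: float = 0.15,
                        dark: float = 40.0, bright: float = 100.0) -> float:
    """Confidence threshold for an image given its mean luma.

    Bright images use `base`; very dark images use the more sensitive (lower)
    `floor`; in between, the threshold is interpolated linearly."""
    if luma <= dark:
        return floor
    if luma >= bright:
        return base
    frac = (luma - dark) / (bright - dark)
    return floor + frac * (base - floor)


# A dark night frame gets the more sensitive floor threshold.
night = mean_luma([(20, 25, 30)] * 4)
print(detection_threshold(night))
```

Because the luma is computed once per image in a batch job, the serving step only needs to read the stored value and apply the threshold.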

Model Architecture

Problem Formulation: Supervised Object Detection.
Candidate Model Families:
YOLOv8: Best for speed/latency.
Faster R-CNN: Better for very small objects but several times slower (the ~5x figure is a rough estimate to benchmark on our own hardware).
Architecture Design: YOLOv8 with a CSPDarknet-style backbone and an FPN (Feature Pyramid Network)-based neck. The FPN is critical because license plates can range from 20 pixels to 500 pixels in width.
Optimization: Use TensorRT for GPU inference acceleration and INT8 quantization to reduce compute cost (an expected ~3-4x versus FP32, to be confirmed by benchmarking), verifying on the evaluation set that recall does not drop significantly.
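The arithmetic behind INT8 quantization can be sketched in a few lines. This is a minimal sketch of symmetric per-tensor quantization, one common scheme, and not TensorRT's API: TensorRT performs quantization internally and, in its calibration-based INT8 mode, derives activation ranges from a representative calibration dataset. `quantize_int8` and `dequantize` are hypothetical helpers.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor INT8 quantization: q = round(v / scale), with the
    scale chosen so that the largest magnitude maps to 127."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map INT8 codes back to approximate float values."""
    return [qi * scale for qi in q]


weights = [0.5, -1.0, 0.25, 0.003]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Round-to-nearest keeps each value's error within half a quantization step.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

The per-value error is bounded by half a quantization step, but errors accumulate through the network, so recall should be re-measured after quantization rather than assumed.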

Training Pipeline

Dataset Construction: Focus on "Hard Negatives." We include images of statues, printed faces on buses, and "false" plates (signs) to teach the model what not to blur.
Data Splitting: Split by Location/City, not just randomly. This prevents the model from "memorizing" specific static objects seen in both train and test sets.
Retraining Strategy: Triggered monthly or when new countries are added (since license plate designs vary globally).
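Splitting by location can be sketched by hashing each city (or S2 cell) ID into a bucket, so every image from that place lands in exactly one split. This is a minimal sketch, assuming each image record carries a group ID; the 80/10/10 proportions, the dict keys `id` and `city`, and the names `split_for_group` and `split_images` are illustrative assumptions.

```python
import hashlib
from typing import Dict, Iterable, List


def split_for_group(group_id: str, val_frac: float = 0.1,
                    test_frac: float = 0.1) -> str:
    """Deterministically assign a whole geographic group (city or S2 cell ID)
    to train/val/test by hashing its ID, so every image from that group lands
    in the same split."""
    digest = hashlib.sha256(group_id.encode("utf-8")).digest()
    # 48 bits fit exactly in a float, so the bucket is uniform in [0, 1).
    bucket = int.from_bytes(digest[:6], "big") / 2 ** 48
    if bucket < test_frac:
        return "test"
    if bucket < test_frac + val_frac:
        return "val"
    return "train"


def split_images(images: Iterable[dict]) -> Dict[str, List[str]]:
    """images: dicts with hypothetical keys "id" and "city"."""
    splits: Dict[str, List[str]] = {"train": [], "val": [], "test": []}
    for img in images:
        splits[split_for_group(img["city"])].append(img["id"])
    return splits
```

Hashing rather than shuffling keeps each city's assignment stable across the monthly retrains, so a city never migrates from test into train as new data arrives.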

Serving Pipeline

Serving Pattern: Batch Inference using Apache Beam/Dataflow. This allows for massive horizontal scaling across thousands of GPU workers.
Throughput Optimization: Batching. Accumulate tiles into batches that fill GPU memory; in an offline pipeline, throughput matters more than per-request latency.
Reliability: If a tile fails, retry. If it fails 3 times, send the whole 360-degree image to a "safeguard" queue for manual review to ensure privacy compliance.
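The batching and retry policy can be sketched independently of any particular runner. This is a minimal sketch, assuming `detect_batch` is a hypothetical stand-in for the TensorRT model call that raises on failure; the batch size of 32 is illustrative, the limit of three attempts comes from the reliability policy, and retries are done per batch of tiles rather than per single tile for simplicity.

```python
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
MAX_ATTEMPTS = 3  # retry limit taken from the reliability policy


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group items into lists of batch_size (the last may be shorter)."""
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def process_image(image_id: str, tiles: Sequence[T],
                  detect_batch: Callable[[List[T]], list],
                  batch_size: int = 32) -> Tuple[str, object]:
    """Run detection over an image's tiles in GPU-sized batches.

    Each failing batch is retried up to MAX_ATTEMPTS times. If it still fails,
    partial detections are discarded and the whole image is routed to the
    safeguard queue, so a partially redacted panorama is never published."""
    detections: list = []
    for batch in batched(tiles, batch_size):
        for attempt in range(1, MAX_ATTEMPTS + 1):
            try:
                detections.extend(detect_batch(batch))
                break
            except Exception:  # sketch: treat any error as retryable
                if attempt == MAX_ATTEMPTS:
                    return ("safeguard", image_id)
    return ("ok", detections)


# Usage: a detector that fails once, then recovers.
_calls = {"n": 0}


def flaky_detector(batch):
    _calls["n"] += 1
    if _calls["n"] == 1:
        raise RuntimeError("transient GPU error")
    return [("face", tile) for tile in batch]


status, found = process_image("pano-001", list(range(5)), flaky_detector, batch_size=2)
print(status, len(found))  # ok 5
```

In an Apache Beam pipeline, the grouping step would typically be handled by `beam.BatchElements`, with the retry-and-route logic living inside the inference `DoFn`.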

Evaluation Pipeline

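Offline Evaluation: Use mAP@0.5 and a custom Privacy Recall metric: the percentage of ground-truth faces/plates covered by at least one detection whose confidence exceeds the blur threshold (its complement is the miss rate, the number that matters for compliance).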
Online Evaluation: Conduct a "Privacy Audit" on a random 1% sample of published images, where a human auditor checks for unblurred PII.
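The Privacy Recall metric can be sketched as the fraction of ground-truth faces and plates that a detection at or above the blur threshold actually covers, using the same IoU 0.5 criterion as the offline mAP. This is a minimal sketch, assuming boxes in `[x, y, w, h]` format; unlike mAP, it does not enforce one-to-one matching, since for privacy it only matters that each instance is blurred. `iou` and `privacy_recall` are hypothetical names.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def privacy_recall(ground_truth: Sequence[Box],
                   detections: Sequence[Tuple[Box, float]],
                   blur_threshold: float, iou_min: float = 0.5) -> float:
    """Fraction of ground-truth faces/plates covered by at least one detection
    scoring at or above the blur threshold (i.e., one that would actually be
    blurred). The miss rate is 1 - privacy_recall."""
    if not ground_truth:
        return 1.0
    blurred = [box for box, score in detections if score >= blur_threshold]
    covered = sum(1 for gt in ground_truth
                  if any(iou(gt, b) >= iou_min for b in blurred))
    return covered / len(ground_truth)


gt = [(0, 0, 10, 10), (50, 50, 10, 10)]
dets = [((1, 1, 10, 10), 0.9), ((50, 50, 10, 10), 0.2)]
print(privacy_recall(gt, dets, blur_threshold=0.5))  # 0.5
print(privacy_recall(gt, dets, blur_threshold=0.1))  # 1.0
```

The example shows why the metric is threshold-dependent: the second face is found by the model but only counts as protected once the blur threshold drops below its score.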

Monitoring Pipeline

Data Monitoring: Track "Detection Density." If a specific geographic region suddenly shows 0 detections, the pipeline might be broken or the camera obscured.
Model Monitoring: Monitor the distribution of confidence scores. A shift to the left can indicate "Model Decay" (e.g., a new license plate design was introduced).
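Both monitoring signals can be sketched as simple comparisons against a historical baseline. This is a minimal sketch, assuming per-region counts are aggregated upstream (for example per S2 cell); the region keys, `min_images`, `drop_ratio`, and the names `density_alerts` and `confidence_shift` are illustrative assumptions, and comparing means is the simplest possible drift check.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def density_alerts(current: Dict[str, Tuple[int, int]],
                   baseline: Dict[str, float],
                   min_images: int = 100, drop_ratio: float = 0.2) -> List[str]:
    """current: region -> (detections, images) in the latest window.
    baseline: region -> historical detections per image.
    Returns regions whose detection density fell below drop_ratio times their
    baseline (a sudden drop to zero qualifies whenever the baseline is
    positive), skipping regions with too few images to judge."""
    alerts = []
    for region, (dets, imgs) in current.items():
        if imgs < min_images or region not in baseline:
            continue
        if dets / imgs < drop_ratio * baseline[region]:
            alerts.append(region)
    return sorted(alerts)


def confidence_shift(reference: Sequence[float], current: Sequence[float]) -> float:
    """Change in mean confidence; a negative value is a shift to the left."""
    return mean(current) - mean(reference)


current = {"s2_a": (0, 500), "s2_b": (480, 500), "s2_c": (0, 10)}
baseline = {"s2_a": 1.2, "s2_b": 1.0, "s2_c": 1.0}
print(density_alerts(current, baseline))  # ['s2_a']
```

Region `s2_c` also shows zero detections but is skipped because ten images are too few to distinguish a broken pipeline from an empty road.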
Wrap Up

Final Evaluation

Observability: Use dashboards to track the "Redaction Rate" per country.
Feedback Loop: Hard examples identified by auditors are fed back into the training set (Active Learning).
Edge Cases:
Cold Start: For new countries, use a "Generative Data" approach to synthesize that country's license plates onto existing images for initial training.
Over-blurring: A stricter, high-precision confidence threshold is applied inside "Known Landmark" regions to avoid blurring, for example, the Statue of Liberty's face.
Trade-offs: Recall vs. Precision. In privacy, we always bias toward Recall. If the model is 51% sure it's a face, we blur it.
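Biasing toward recall can be made concrete by choosing the blur threshold on validation data so that a target recall is met. This is a minimal sketch, assuming each ground-truth face or plate in a validation set has been matched (e.g., at IoU 0.5) to its best-scoring detection; `threshold_for_recall`, `should_blur`, and the 0.9 landmark threshold are hypothetical, the latter standing in for the stricter Known Landmarks threshold.

```python
import math
from typing import Optional, Sequence


def threshold_for_recall(gt_best_scores: Sequence[Optional[float]],
                         target_recall: float) -> Optional[float]:
    """gt_best_scores: for each ground-truth face/plate, the highest confidence
    among detections matching it, or None if no detection matched at all.

    Returns the highest threshold t such that at least target_recall of the
    ground-truth instances have a matching detection scoring >= t, or None if
    no threshold can reach the target."""
    if not gt_best_scores:
        raise ValueError("need at least one ground-truth instance")
    needed = max(1, math.ceil(target_recall * len(gt_best_scores)))
    detected = sorted((s for s in gt_best_scores if s is not None), reverse=True)
    if needed > len(detected):
        return None  # too many instances were never detected at all
    return detected[needed - 1]


def should_blur(score: float, threshold: float,
                near_known_landmark: bool = False,
                landmark_threshold: float = 0.9) -> bool:
    """Blur at the recall-oriented threshold, except inside known-landmark
    regions, where a stricter threshold protects statues and monuments."""
    return score >= (landmark_threshold if near_known_landmark else threshold)


val_scores = [0.9, 0.8, 0.7, 0.6, 0.1]
print(threshold_for_recall(val_scores, 0.8))  # 0.6
```

A `None` result means the target cannot be reached by any threshold because some instances were never detected; those misses have to be fixed through data and retraining (the active-learning feedback loop), not through the threshold.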