Scalable Enterprise Document Classification System
Design a high-scale document classification system capable of processing 10 million diverse documents (PDFs, images, emails) per day for an automated business workflow. The system must categorize documents into 50+ classes with a P99 latency under 500ms. Your design should address the full ML lifecycle, including OCR integration, handling of long-form text, class imbalance in the training data, and strategies for ensuring online model reliability and for monitoring concept drift.
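The two stated numbers (10 million documents per day, P99 under 500ms) already fix the scale of the serving tier. The back-of-envelope sketch below converts the daily volume into requests per second and uses Little's law (in-flight requests = arrival rate × time in system) to estimate a replica count. The peak factor (3×), the 0.3 s per-document service time, the concurrency of 4 requests per replica, and the 30% headroom are illustrative assumptions, not requirements; `required_throughput` and `replicas_needed` are hypothetical helpers.

```python
import math

DOCS_PER_DAY = 10_000_000        # stated requirement
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 s


def required_throughput(docs_per_day: int, peak_factor: float) -> tuple[float, float]:
    """Return (average docs/s, assumed peak docs/s)."""
    avg = docs_per_day / SECONDS_PER_DAY
    return avg, avg * peak_factor


def replicas_needed(peak_rps: float, per_doc_latency_s: float,
                    concurrency_per_replica: int, headroom: float) -> int:
    """Estimate replicas with Little's law, keeping a fraction of capacity as headroom."""
    in_flight = peak_rps * per_doc_latency_s               # requests in the system at peak
    usable = concurrency_per_replica * (1.0 - headroom)     # concurrency we plan to actually use
    return math.ceil(in_flight / usable)


# Assumed values: 3x peak over the daily average, 0.3 s per document
# (inside the 500 ms P99 budget), 4 concurrent requests per replica, 30% headroom.
avg_rps, peak_rps = required_throughput(DOCS_PER_DAY, peak_factor=3.0)
replicas = replicas_needed(peak_rps, per_doc_latency_s=0.3,
                           concurrency_per_replica=4, headroom=0.3)
print(f"average: {avg_rps:.1f} docs/s, peak: {peak_rps:.1f} docs/s, replicas: {replicas}")
```

Under these assumptions the average load is only about 116 documents per second; the replica count is driven by the peak factor and the per-document latency, so OCR time for scanned images has to fit inside the same per-document budget.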
DistilBERT, Transformers, XGBoost, OCR, Tesseract, FastAPI, Kafka, Spark, Quantization, SHAP
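One common way to handle long-form text with a model such as DistilBERT, whose position embeddings cap input at 512 tokens, is to split the token sequence into overlapping windows, classify each window, and pool the per-window scores. The sketch below is a minimal, dependency-free illustration under that assumption; `sliding_window_chunks` and `aggregate_logits` are hypothetical helpers, `max_len=510` reserves two slots for `[CLS]` and `[SEP]`, and `stride` here means the number of overlapping tokens between consecutive windows. Hugging Face fast tokenizers can produce similar windows via `return_overflowing_tokens=True` together with `stride`.

```python
from typing import List


def sliding_window_chunks(token_ids: List[int], max_len: int = 510,
                          stride: int = 128) -> List[List[int]]:
    """Split a token sequence into windows of at most max_len tokens,
    where consecutive windows share `stride` tokens of overlap."""
    if max_len <= 0 or not 0 <= stride < max_len:
        raise ValueError("require max_len > 0 and 0 <= stride < max_len")
    if len(token_ids) <= max_len:
        return [token_ids]
    step = max_len - stride
    chunks = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):  # this window reached the end of the document
            break
    return chunks


def aggregate_logits(chunk_logits: List[List[float]]) -> List[float]:
    """Mean-pool per-window class scores into one document-level score vector."""
    n = len(chunk_logits)
    return [sum(col) / n for col in zip(*chunk_logits)]


# Usage: a 1,000-token document becomes three overlapping windows.
chunks = sliding_window_chunks(list(range(1000)), max_len=510, stride=128)
doc_scores = aggregate_logits([[1.0, 3.0], [3.0, 1.0]])
```

Mean pooling is only one design choice; max pooling (flag the document if any window is confident) or a small head over the stacked window embeddings are alternatives, and the right one depends on whether class evidence tends to be local (a single header) or spread across the document.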