Scalable Retrieval-Augmented Generation (RAG) System
Design a production-ready RAG system capable of indexing millions of enterprise documents and answering user queries with high relevance and low latency. The system must handle document ingestion asynchronously, support semantic search across billions of text chunks, and ensure strict document-level access control. Discuss your strategies for chunking, hybrid retrieval, re-ranking, and handling LLM service limits while maintaining a P99 latency under 2.5 seconds.
Vector Database, PostgreSQL, Redis, S3, SQS, LLM API, HNSW, Hybrid Search, Cross-Encoder, JWT
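
As one possible starting point for the hybrid-retrieval and access-control parts of the prompt, the sketch below fuses a dense (vector) ranking and a sparse (keyword) ranking with reciprocal rank fusion (RRF), then drops chunks whose parent document the caller may not read. RRF is only one of several fusion options and is not prescribed by the prompt. The function names, the constant k = 60, and the in-memory `chunk_to_doc` mapping are illustrative assumptions. In a real deployment the access filter would more likely be pushed into the vector database as a metadata filter, with the allowed document set derived from the caller's JWT claims.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def reciprocal_rank_fusion(rankings: Iterable[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of chunk IDs into one list using RRF.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k = 60 is a commonly used default (assumption).
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


def filter_by_acl(
    chunk_ids: List[str],
    chunk_to_doc: Dict[str, str],
    allowed_docs: Set[str],
) -> List[str]:
    """Keep only chunks whose parent document is in the caller's allowed set.

    Hypothetical post-filter shown for clarity; chunks with no known parent
    document are dropped (fail closed).
    """
    return [cid for cid in chunk_ids if chunk_to_doc.get(cid) in allowed_docs]


# Toy data (made up for illustration).
dense_hits = ["c1", "c2", "c3"]    # e.g. from an HNSW vector index
sparse_hits = ["c3", "c1", "c4"]   # e.g. from a keyword/BM25 index
chunk_to_doc = {"c1": "d1", "c2": "d2", "c3": "d3", "c4": "d1"}

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
visible = filter_by_acl(fused, chunk_to_doc, allowed_docs={"d1", "d3"})
print(fused)
print(visible)
```

The filtered candidates would then go to a cross-encoder re-ranker before prompt assembly. Filtering before re-ranking keeps the more expensive model from scoring chunks the user cannot see.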