Universal LLM Gateway & Multi-Tenant Proxy Service
Design a high-scale, external-facing LLM platform that provides a unified API for multiple foundation models (e.g., OpenAI, Anthropic, Llama). The system must support multi-tenant API key management, streaming responses via SSE/WebSockets, real-time token-based quota enforcement, and asynchronous usage billing. Address the challenges of provider rate-limit management, model-agnostic routing, and minimizing latency overhead for streaming traffic at a scale of 10M+ daily requests.
Redis, Kafka, PostgreSQL, ClickHouse, Flink, SSE, Envoy, vLLM, gRPC, Prometheus
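
One way to approach the real-time token-based quota requirement is a reserve-then-reconcile token bucket: the gateway reserves an estimated token count before forwarding a streaming request, then corrects the balance once the provider reports actual usage. The sketch below is a minimal, single-process illustration under stated assumptions, not a definitive design. The class `TenantQuota` and its methods `reserve` and `reconcile` are hypothetical names, and the in-memory state stands in for a shared store such as Redis, where the same logic would need to run atomically.

```python
import time


class TenantQuota:
    """Hypothetical per-tenant token bucket (in-memory stand-in for a shared store)."""

    def __init__(self, capacity, refill_per_sec, now=None):
        self.capacity = float(capacity)          # maximum tokens the bucket holds
        self.refill_per_sec = float(refill_per_sec)
        self.tokens = float(capacity)            # bucket starts full
        self.updated_at = time.monotonic() if now is None else now

    def _refill(self, now):
        # Restore tokens for the time elapsed since the last update, capped at capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated_at = now

    def reserve(self, estimated_tokens, now=None):
        """Hold estimated_tokens before the request is forwarded; False means reject."""
        now = time.monotonic() if now is None else now
        self._refill(now)
        if estimated_tokens > self.tokens:
            return False
        self.tokens -= estimated_tokens
        return True

    def reconcile(self, estimated_tokens, actual_tokens, now=None):
        """After the stream ends, refund over-estimates or charge under-estimates."""
        now = time.monotonic() if now is None else now
        self._refill(now)
        # The balance may go negative if actual usage exceeded the estimate;
        # later requests are then rejected until refill covers the debt.
        self.tokens = min(self.capacity, self.tokens + (estimated_tokens - actual_tokens))
```

For example, with `TenantQuota(capacity=1000, refill_per_sec=10, now=0.0)`, reserving 800 tokens succeeds, a second reservation of 300 is rejected, and reconciling the first request to an actual usage of 500 refunds 300 tokens so the retry can proceed. Passing `now` explicitly keeps the sketch deterministic for testing; a real gateway would rely on a shared clock or the store's own time source.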