The Question
Design a Shared LLM Inference Platform
Design a shared, external-facing platform for serving large language model inference. The system should support multi-tenancy with per-tenant rate limiting and cost attribution, ensure low-latency responses under high concurrency, and maintain high availability with graceful degradation.
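One way to approach the per-tenant rate-limiting requirement is a token bucket keyed by tenant ID. A minimal in-memory sketch is below; in the real system this check would typically run atomically in Redis (e.g. via a Lua script) so all gateway replicas share one view of each tenant's budget. The class names, default limits, and `cost` parameter are illustrative assumptions, not a prescribed design.

```python
import time
from dataclasses import dataclass, field

@dataclass
class TokenBucket:
    """Per-tenant token bucket: refills `rate` tokens/sec, bursts up to `capacity`."""
    capacity: float
    rate: float
    tokens: float = 0.0
    last_refill: float = field(default_factory=time.monotonic)

    def __post_init__(self) -> None:
        self.tokens = self.capacity  # start full so tenants get their burst allowance

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

class TenantRateLimiter:
    """Maps each tenant ID to its own bucket; limits here are placeholders."""
    def __init__(self, capacity: float = 10.0, rate: float = 5.0):
        self.capacity, self.rate = capacity, rate
        self.buckets: dict[str, TokenBucket] = {}

    def allow(self, tenant_id: str, cost: float = 1.0) -> bool:
        bucket = self.buckets.setdefault(tenant_id, TokenBucket(self.capacity, self.rate))
        return bucket.allow(cost)
```

Charging a request-dependent `cost` (e.g. proportional to requested tokens) rather than a flat 1 also gives a hook for the cost-attribution requirement: the same per-tenant counter that limits traffic can be exported for billing.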
Tags: API Gateway, Server-Sent Events (SSE), Redis Rate Limiting, Kafka, PostgreSQL
February 8, 2026