The Question
Design
ML Training & Evaluation Platform
Design a job-based platform for orchestrating machine learning training and model evaluation workloads. The system should support job submission, scheduling across heterogeneous compute resources, artifact versioning, and result tracking to accelerate the ML development lifecycle.
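One possible starting point for the submission, scheduling, and result-tracking parts of the prompt is sketched below. It is a minimal in-memory illustration, not a reference design: the names `Job`, `Scheduler`, `submit`, `claim`, and `complete` are hypothetical, the resource classes ("gpu", "cpu") are assumed, and the dictionaries stand in for whatever queue and database a real answer would choose (for example Redis and PostgreSQL from the tags below, though that mapping is also an assumption).

```python
import heapq
import itertools
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    SUCCEEDED = "succeeded"


@dataclass
class Job:
    job_id: str
    resource: str                      # assumed resource class, e.g. "gpu" or "cpu"
    priority: int = 0                  # higher priority is claimed first
    status: Status = Status.QUEUED
    artifact_version: Optional[int] = None


class Scheduler:
    """In-memory stand-in for a message queue plus a job table (hypothetical)."""

    def __init__(self):
        self._queues = {}              # resource class -> heap of pending jobs
        self._jobs = {}                # job_id -> Job, for result tracking
        self._versions = {}            # artifact name -> latest version number
        self._seq = itertools.count()  # tie-breaker: FIFO within a priority

    def submit(self, job):
        """Record the job and enqueue it under its resource class."""
        self._jobs[job.job_id] = job
        heap = self._queues.setdefault(job.resource, [])
        # heapq is a min-heap, so negate priority to pop the highest first.
        heapq.heappush(heap, (-job.priority, next(self._seq), job.job_id))

    def claim(self, resource):
        """Called by a worker: take the best pending job it can run, if any."""
        heap = self._queues.get(resource)
        if not heap:
            return None
        _, _, job_id = heapq.heappop(heap)
        job = self._jobs[job_id]
        job.status = Status.RUNNING
        return job

    def complete(self, job_id, artifact_name):
        """Mark the job done and assign the next version of its artifact."""
        job = self._jobs[job_id]
        version = self._versions.get(artifact_name, 0) + 1
        self._versions[artifact_name] = version
        job.artifact_version = version
        job.status = Status.SUCCEEDED
        return version
```

In this sketch a GPU worker would loop on `claim("gpu")`, run the job in a sandbox, and call `complete(...)` when it finishes. Keeping one queue per resource class is one simple way to handle heterogeneous compute; a fuller answer would also need failure handling, retries, and durable storage, none of which are shown here.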
Message Queue
Worker Pattern
PostgreSQL
Redis
Docker/Sandboxing
February 8, 2026