The Question

Secure Quantitative Backtesting Platform

Design an internal system for quantitative researchers to upload and execute trading strategy code against multi-terabyte historical market datasets. The system must support parallel execution, ensure strict resource isolation and security for untrusted code, and provide reproducible performance reports. Detail the data access strategy for high-throughput historical data, the sandboxing mechanism, and the job orchestration architecture to handle long-running compute tasks.

Docker

gVisor

Redis

PostgreSQL

Parquet

Kubernetes

Python

Questions & Insights

Clarifying Questions

What is the frequency and granularity of the historical market data? (Assumed: OHLCV 1-minute bars for MVP; tick-level data is out-of-scope for the initial version to minimize storage and compute complexity).

Which programming languages must be supported? (Assumed: Python is the primary requirement due to its dominance in quantitative research).

What is the typical duration of a backtest and the expected daily volume? (Assumed: 100-500 researchers, ~1,000 backtests/day, each taking 5-30 minutes).

How large is the historical dataset? (Assumed: 10 years of US Equities data, roughly 2-5 TB in compressed Parquet format).

What level of isolation is required for "secure code execution"? (Assumed: Prevention of network access, filesystem restricted to workspace, and CPU/Memory limits to prevent "noisy neighbor" issues).

Thinking Process

Asynchronous Task Orchestration: Backtesting is long-running; the system must decouple submission from execution using a robust task queue.

Secure Sandboxing: Use containerization (Docker) with restricted runtimes (e.g., gVisor) to execute researcher-provided code safely.

Data Locality & Throughput: Avoid database bottlenecks by storing historical data in columnar formats (Parquet) on Object Storage, using local caching on workers to speed up repeated runs.

Progressive Flow:

How do we ingest and store researcher code?

How do we schedule and distribute backtest jobs across a cluster?

How do we provide market data to the code without saturating the network?

How do we securely isolate execution and collect results?

Bonus Points

Point-in-Time (PIT) Correctness: Implementation of a "Truth Engine" that ensures researchers cannot "peek into the future" by strictly controlling data visibility based on the simulated timestamp.

Vectorized vs. Event-Driven Engines: Support for both high-speed vectorized backtesting (Pandas/NumPy) for simple strategies and event-driven backtesting for high-fidelity execution simulation.

Data Versioning: Using tools like DVC or LakeFS to ensure backtest results are reproducible even if historical data is corrected/updated.

Deterministic Execution: Forcing fixed random seeds and controlling system clocks within the sandbox to ensure identical code produces identical results across different runs.

Design Breakdown

Functional Requirements

Core Use Cases:

Researchers upload strategy code (Python script/notebook).

System executes the strategy against a specified historical time range.

System generates a performance report (Sharpe ratio, Max Drawdown, Equity Curve).

Researchers can monitor the status of pending/running backtests.

Scope Control:

In-Scope: Code execution, sandboxing, historical data access, report generation.

Out-of-Scope: Real-time paper trading, live brokerage integration, ultra-low latency tick-data processing.

Non-Functional Requirements

Scale: Support parallel execution of hundreds of backtests.

Latency: System overhead (start-up time) should be < 10 seconds; backtest execution depends on code complexity.

Availability & Reliability: 99.9% availability for the submission API; job state must be persisted to survive worker failures (Fault Tolerance).

Consistency: Strategy metadata and results must be strongly consistent.

Security & Privacy: Strict isolation of researcher code; no egress traffic allowed from the sandbox.

Estimation

Traffic Estimation:

1,000 backtests/day

\approx

0.01 QPS (very low).

Peak: 50 concurrent backtests.

Storage Estimation:

Metadata (Postgres): 1,000 jobs/day * 1KB

\approx

365MB/year.

Reports/Artifacts (S3): 1,000 jobs/day * 5MB

\approx

1.8TB/year.

Historical Data: 5TB (Parquet).

Bandwidth Estimation:

If each backtest reads 1GB of data: 1,000 * 1GB = 1TB/day.

Internal bandwidth: ~100 Mbps average, but bursty.

Blueprint

Concise Summary: A queue-based worker architecture where researchers submit Python code via an API, which is then executed inside restricted Docker containers that pull historical data from a central Object Store.

Major Components:

API Gateway: Handles authentication and strategy submission.

Backtest Service: Manages job metadata and state transitions in the database.

Task Queue (Redis): Orchestrates job distribution to workers.

Backtest Workers: Execute strategy code in sandboxed containers.

Object Storage (S3): Stores historical market data (Parquet) and generated reports.

Simplicity Audit: This design avoids complex distributed streaming engines (like Flink) in favor of simple batch-oriented workers, which is sufficient for OHLCV-based research.

Architecture Decision Rationale:

Why this architecture?: The worker-queue pattern is the industry standard for long-running batch tasks. Containerization provides the best balance between security and developer flexibility (allowing custom libraries).

Functional Satisfaction: Covers the full lifecycle from upload to report.

Non-functional Satisfaction: Horizontally scalable workers handle load; Redis provides reliable task persistence; gVisor/Docker ensures security.

High Level Architecture

Sub-system Deep Dive

Service

Topology & Scaling: Stateless microservices deployed on Kubernetes. The Backtest Management Service scales based on request volume.

API Schema Design:

POST /v1/backtests: Submit strategy (multipart/form-data for code + JSON for params). Returns job_id.

GET /v1/backtests/{id}: Returns status (PENDING, RUNNING, COMPLETED, FAILED) and performance metrics.

GET /v1/backtests/{id}/report: Returns a signed S3 URL for the PDF/HTML report.

Resilience & Reliability:

Workers use a "heartbeat" mechanism. If a worker dies, the job is re-queued by the Management Service.

Observability: Prometheus metrics for queue depth and worker CPU/RAM utilization.

Storage

Access Pattern:

Metadata: Low frequency, high durability (Postgres).

Market Data: High throughput, read-only, sequential access (Parquet).

Database Table Design (Postgres):

jobs: id, researcher_id, status, code_s3_path, start_time, end_time, config_json.

metrics: job_id, sharpe_ratio, total_return, max_drawdown.

Technical Selection:

PostgreSQL: For relational metadata and state management.

S3 (Object Storage): For historical Parquet files and final result artifacts. Parquet is chosen for its columnar format, allowing workers to fetch only the columns (e.g., Close Price) they need.

Reliability & Recovery: Daily backups of Postgres; S3 cross-region replication for disaster recovery.

Cache

Purpose & Justification: Local SSD cache on workers reduces data egress costs from S3 and speeds up backtests that use the same historical data range.

Key-Value Schema: File-based cache using the Parquet file name/hash as the key.

Technical Selection: Simple LRU cache on the worker's local NVMe disk.

Messaging

Purpose & Decoupling: Decouples the API from the heavy compute workers.

Event / Topic Schema: backtest_jobs queue contains {job_id, priority, worker_image_tag}.

Technical Selection: Redis (using Celery or BullMQ) for low-latency task distribution.

Data Processing

Processing Model: Each worker pulls a job, downloads the strategy code, and runs a Docker container.

Processing DAG:

Worker pulls job from Redis.

Downloads Parquet data (from Local Cache or S3).

Mounts data and code into a gVisor (secure) container.

Executes code; captures stdout/stderr and result files.

Uploads results to Artifact S3 and updates Metadata DB.

Scalability: Auto-scaling worker pool (K8s Horizontal Pod Autoscaler) based on the backtest_jobs queue length.

Technical Selection: Docker with gVisor runtime for kernel-level isolation of untrusted researcher code.

Wrap Up

Advanced Topics

Trade-offs: We chose a Pull-based data model (workers pull data from S3) instead of a Database-centric model (workers query a DB). This is faster for large-scale backtesting but harder to manage for small, random-access queries.

Reliability & Failure Handling:

Timeouts: Every backtest has a hard timeout (e.g., 60 mins) enforced by the worker to prevent infinite loops.

Resource Quotas: Cgroups limit each container to specific CPU/Memory (e.g., 4 vCPU, 16GB RAM).

Security & Privacy:

Network Isolation: The worker containers are started with --net=none to prevent data exfiltration.

Distinguishing Insights:

Pre-warming Workers: To reduce startup latency, maintain a pool of "warm" containers with common quantitative libraries (Pandas, TA-Lib) pre-loaded.

Data Sharding: Market data is partitioned by Year/Month/Symbol in S3 to optimize the "Predicate Pushdown" (only downloading necessary data).