The Question
Design: Mortgage Application & Document Management System
Design a high-security, workflow-driven application for mortgage agents to manage loan applications. The system must support document uploads (bank statements, tax returns), status tracking across a multi-stage lifecycle, and agent-specific dashboards. Key constraints include handling highly sensitive PII, ensuring strict auditability of status changes, and maintaining system responsiveness under high-volume document processing. Address how you would scale to 10,000 agents and ensure 99.9% availability while complying with financial data regulations.
PostgreSQL
Redis
S3
SQS
JWT
AES-256
TLS 1.3
HashiCorp Vault
Docker
Questions & Insights
Clarifying Questions
Who are the primary users and what is the expected scale?
Assumption: The app targets mortgage brokers and loan officers. MVP scale: 10,000 Daily Active Users (DAU), managing ~100,000 active mortgage applications.
What is the core "happy path" for the MVP?
Assumption: Lead capture, mortgage application workflow management, secure document upload/storage, and basic status tracking.
How sensitive is the data and what are the compliance requirements?
Assumption: Extremely sensitive (PII, financial records). Must adhere to SOC2/GLBA-like standards. Data must be encrypted at rest and in transit.
Do we need to integrate with external Credit Bureaus or Loan Origination Systems (LOS)?
Assumption: For the MVP, we will use a webhook-based mock/facade for external integrations to focus on the agent's internal workflow.
Is real-time collaboration required?
Assumption: Simple push notifications for status updates are sufficient for the MVP; no real-time co-editing of forms.
Thinking Process
The State Machine Bottleneck: A mortgage is a complex long-running process (Lead -> Prequal -> Underwriting -> Closing). The core challenge is maintaining a robust, auditable state machine.
Document Integrity: Handling large PDF/Image uploads reliably with virus scanning and secure access links.
Integration Resilience: How do we handle slow or failing 3rd party financial APIs without locking our UI?
Security First: Implementing a robust RBAC (Role-Based Access Control) to ensure agents only see their assigned leads.
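As a minimal sketch of the state machine described above, the lifecycle can be modeled as an explicit table of allowed transitions. The strictly linear Lead -> Prequal -> Underwriting -> Closing path and the names Stage, ALLOWED, and transition are assumptions made for illustration; a real pipeline would likely add rejection and withdrawal states.

```python
from enum import Enum


class Stage(Enum):
    LEAD = "lead"
    PREQUAL = "prequal"
    UNDERWRITING = "underwriting"
    CLOSING = "closing"


# Allowed transitions (assumed strictly linear for this sketch).
ALLOWED = {
    Stage.LEAD: {Stage.PREQUAL},
    Stage.PREQUAL: {Stage.UNDERWRITING},
    Stage.UNDERWRITING: {Stage.CLOSING},
    Stage.CLOSING: set(),
}


class InvalidTransition(Exception):
    """Raised when a requested stage change is not in the transition table."""


def transition(current: Stage, target: Stage) -> Stage:
    """Return the new stage, or raise if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise InvalidTransition(f"{current.value} -> {target.value} is not allowed")
    return target
```

Keeping the transitions in one table means every status change passes through a single checkpoint, which is also the natural place to write an audit record.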
Bonus Points
Field-Level Encryption: Using an envelope encryption strategy where PII (SSN, income) is encrypted with a unique data key per user/application, and the data keys are in turn encrypted by a master key held in a secure vault (e.g., HashiCorp Vault).
Event Sourcing for Audit Trails: Storing every status change as an immutable event so that the full history required for financial compliance can be replayed exactly.
Optimistic UI with Background Sync: Ensuring the agent app feels snappy on mobile even with spotty cellular data during site visits.
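The event-sourcing idea above can be sketched as an append-only log from which the current status is derived. This is an in-memory illustration only; the StatusChanged and AuditLog names and their fields are assumptions, and a production system would more likely use an insert-only database table.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)  # frozen: an event cannot be modified once created
class StatusChanged:
    application_id: str
    old_status: Optional[str]
    new_status: str
    changed_by: str
    occurred_at: datetime


class AuditLog:
    """Append-only log; events are only ever added, never updated or deleted."""

    def __init__(self) -> None:
        self._events: List[StatusChanged] = []

    def append(self, application_id: str, new_status: str, changed_by: str) -> StatusChanged:
        event = StatusChanged(
            application_id=application_id,
            old_status=self.current_status(application_id),
            new_status=new_status,
            changed_by=changed_by,
            occurred_at=datetime.now(timezone.utc),
        )
        self._events.append(event)
        return event

    def history(self, application_id: str) -> List[StatusChanged]:
        return [e for e in self._events if e.application_id == application_id]

    def current_status(self, application_id: str) -> Optional[str]:
        # The current state is simply the outcome of the latest event.
        events = self.history(application_id)
        return events[-1].new_status if events else None
```

For example, appending "Lead" and then "Prequal" for the same application yields a two-event history, and current_status returns "Prequal".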
Design Breakdown
Functional Requirements
Core Use Cases:
Agent creates and manages "Leads" and "Applications".
Securely upload and preview documents (Bank statements, Paystubs).
Track application status through a kanban-style pipeline.
Basic mortgage calculator for quick estimations.
Scope Control:
In-scope: Application workflow, Document management, Agent dashboard, Notifications.
Out-of-scope: Direct credit pulling (external API), Escrow management, Integrated chat, e-Signatures (use external redirects).
Non-Functional Requirements
Scale: Support 10k DAU and up to 10M documents.
Latency: P99 < 300ms for dashboard loads; < 1s for document metadata retrieval.
Availability & Reliability: 99.9% uptime. Data durability is paramount (no lost applications).
Consistency: Strong consistency for application status updates to prevent "double-submission" errors.
Security & Privacy: AES-256 encryption at rest, TLS 1.3 in transit, strict RBAC.
Estimation
Traffic Estimation:
10k DAU * 50 requests/day = 500k Daily Requests.
Average QPS: ~6 requests/sec. Peak QPS: ~50-100 requests/sec.
Storage Estimation:
100k applications * 20 docs/app * 5 MB/doc = 10 TB storage.
Database: 100k applications * 10KB/record = 1 GB (very small, RDBMS is perfect).
Bandwidth Estimation:
Uploads: 1,000 new apps/day * 100 MB of documents per app = 100 GB/day (~10 Mbps avg).
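The back-of-the-envelope numbers above can be rechecked with a few lines of arithmetic. The inputs are the figures stated in this section; decimal units (1 TB = 1,000,000 MB) are an assumption of this sketch.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Traffic: 10k DAU * 50 requests/day.
daily_requests = 10_000 * 50
avg_qps = daily_requests / SECONDS_PER_DAY  # about 5.8 requests/sec

# Storage: 100k applications * 20 docs/app * 5 MB/doc, converted MB -> TB.
doc_storage_tb = 100_000 * 20 * 5 / 1_000_000

# Database: 100k applications * 10 KB/record, converted KB -> GB.
db_size_gb = 100_000 * 10 / 1_000_000

# Bandwidth: 1,000 new apps/day * 100 MB/app, converted MB -> GB,
# then GB/day -> megabits per second.
upload_gb_per_day = 1_000 * 100 / 1_000
upload_mbps = upload_gb_per_day * 1_000 * 8 / SECONDS_PER_DAY  # about 9.3 Mbps

print(f"avg QPS: {avg_qps:.1f}")
print(f"document storage: {doc_storage_tb:.0f} TB")
print(f"database size: {db_size_gb:.0f} GB")
print(f"upload bandwidth: {upload_mbps:.1f} Mbps")
```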
Blueprint
Concise Summary: A modular microservices architecture utilizing a relational database for transactional integrity and an object store for document management.
Major Components:
Application Service: Manages the mortgage workflow state machine and lead metadata.
Document Service: Handles secure uploads to S3, virus scanning, and generating pre-signed URLs.
Notification Service: Sends asynchronous alerts (Email/SMS) via a message queue.
Simplicity Audit: This design avoids complex distributed locks or NoSQL clusters, favoring a reliable PostgreSQL instance and simple SQS queues to keep operational overhead low for the MVP.
Architecture Decision Rationale:
Relational DB: Chosen for ACID compliance; mortgage data is highly structured and requires complex joins (Leads, Co-borrowers, Assets).
S3 with Pre-signed URLs: Offloads heavy lifting of file transfers from the app servers, improving scalability and security.
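To illustrate why a pre-signed URL is safe to hand to a client, here is a conceptual sketch of an expiring, HMAC-signed link. It is not AWS Signature Version 4 and not how S3 builds its URLs; the SECRET value, the /documents/ path, and the function names are all hypothetical and exist only to show the idea of a time-limited signature that the server can verify without storing per-link state.

```python
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical signing key; a real key would come from a secret store.
SECRET = b"demo-signing-key"


def _signature(object_key: str, expires_at: int) -> str:
    message = f"{object_key}:{expires_at}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def sign_link(object_key: str, expires_at: int) -> str:
    """Build a link that is only valid for this object until expires_at."""
    sig = _signature(object_key, expires_at)
    return f"/documents/{object_key}?expires={expires_at}&sig={sig}"


def verify_link(object_key: str, expires_at: int, signature: str,
                now: Optional[int] = None) -> bool:
    """Accept the link only if it is unexpired and the signature matches."""
    now = int(time.time()) if now is None else now
    if now > expires_at:
        return False
    expected = _signature(object_key, expires_at)
    # Constant-time comparison avoids leaking signature bytes via timing.
    return hmac.compare_digest(expected, signature)
```

Because the signature covers both the object key and the expiry, a client cannot reuse a link for a different document or extend its lifetime.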
High Level Architecture
Sub-system Deep Dive
Edge (Optional)
Content Delivery & Traffic Routing: CloudFront for static assets (web dashboard).
Security & Perimeter:
API Gateway: Handles JWT validation, rate limiting (1000 req/min per IP), and TLS termination.
WAF: Protects against SQL injection and common exploits.
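The per-IP limit mentioned for the API Gateway could be approximated with a fixed-window counter, sketched below. In practice the gateway product would provide this; the class name and in-memory storage are assumptions, and a shared store would be needed across multiple gateway instances.

```python
from collections import defaultdict
from typing import Dict, Tuple


class FixedWindowRateLimiter:
    """Counts requests per (ip, window); old windows are never pruned here."""

    def __init__(self, limit: int = 1000, window_seconds: int = 60) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._counts: Dict[Tuple[str, int], int] = defaultdict(int)

    def allow(self, ip: str, now: float) -> bool:
        """Return True and count the request if the IP is under its limit."""
        window = int(now // self.window_seconds)
        key = (ip, window)
        if self._counts[key] >= self.limit:
            return False
        self._counts[key] += 1
        return True
```

A fixed window is the simplest variant; it permits short bursts around window boundaries, which a sliding window or token bucket would smooth out.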
Service
Topology & Scaling: Stateless containers deployed across multiple Availability Zones (AZs). Scaling triggered by CPU (>60%) or Request Count.
API Schema Design:
POST /v1/applications: Creates a new mortgage application. (REST, Idempotency-Key required).
GET /v1/applications/{id}: Returns full application state.
POST /v1/documents/upload-link: Returns a pre-signed S3 URL for secure upload.
Resilience & Reliability: Exponential backoff for all internal service calls. Circuit breakers on the Notification service to prevent blocking the Application service.
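The Idempotency-Key requirement on POST /v1/applications can be sketched as a lookup that returns the original result when a request is retried. The ApplicationStore class, its in-memory dictionaries, and the initial "Lead" status are assumptions for illustration; a real service would persist the key alongside the application in the same database transaction.

```python
import uuid
from typing import Dict, Tuple


class ApplicationStore:
    def __init__(self) -> None:
        self._by_idempotency_key: Dict[str, str] = {}
        self._applications: Dict[str, dict] = {}

    def create(self, idempotency_key: str, payload: dict) -> Tuple[str, bool]:
        """Return (application_id, created); a replayed key returns the original id."""
        existing = self._by_idempotency_key.get(idempotency_key)
        if existing is not None:
            return existing, False
        application_id = str(uuid.uuid4())
        # Copy the payload and attach the assumed initial status.
        self._applications[application_id] = dict(payload, status="Lead")
        self._by_idempotency_key[idempotency_key] = application_id
        return application_id, True

    def count(self) -> int:
        return len(self._applications)
```

With this in place, a client that times out and retries with the same key gets the same application id back instead of creating a duplicate.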
Cache
Purpose & Justification: Session management and caching application lookups to reduce DB load.
Key-Value Schema:
session:{token} -> {user_id, role} (TTL 24h).
app_cache:{app_id} -> {status}.
Technical Selection: Redis.
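A cache-aside lookup for the app_cache:{app_id} key might look like the sketch below. TTLCache is a tiny in-memory stand-in for Redis, and load_from_db, the 60-second TTL, and the invalidate helper are assumptions; since application status needs strong consistency, the cached entry should be dropped whenever the status is written.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in for a key-value store with per-key expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[str, float]] = {}

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily evict expired entries
            return None
        return value

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


def get_status(app_id: str, cache: TTLCache,
               load_from_db: Callable[[str], str],
               ttl_seconds: float = 60) -> str:
    """Cache-aside read: try the cache, fall back to the database, then fill."""
    key = f"app_cache:{app_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    status = load_from_db(app_id)
    cache.set(key, status, ttl_seconds)
    return status


def invalidate(app_id: str, cache: TTLCache) -> None:
    """Call after every status write so readers do not see a stale status."""
    cache.delete(f"app_cache:{app_id}")
```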
Messaging
Purpose & Decoupling: Asynchronous processing for non-blocking tasks like sending confirmation emails and triggering document virus scans.
Event / Topic Schema:
application.status.changed, document.uploaded.
Technical Selection: AWS SQS or RabbitMQ, chosen for "at-least-once" delivery and simplicity in a cloud-native environment.
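At-least-once delivery means a consumer can receive the same message more than once, so handlers for events such as document.uploaded should be idempotent. A minimal sketch follows, assuming each message carries a unique message_id field (a hypothetical name) and using an in-memory set where a real consumer would use a durable store.

```python
from typing import Callable, Dict, Set


class IdempotentConsumer:
    """Wraps a handler so that redelivered messages are processed only once."""

    def __init__(self, handler: Callable[[Dict], None]) -> None:
        self._handler = handler
        self._seen: Set[str] = set()

    def handle(self, message: Dict) -> bool:
        """Return True if the message was processed, False if it was a duplicate."""
        message_id = message["message_id"]
        if message_id in self._seen:
            return False
        self._handler(message)
        # Mark as seen only after the handler succeeds, so that a failed
        # attempt is retried when the queue redelivers the message.
        self._seen.add(message_id)
        return True
```

For example, if the queue redelivers a document.uploaded event, the virus scan is triggered once and the duplicate is acknowledged without further work.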
Infrastructure (Optional)
Observability: Prometheus for metrics (Latency, Error rates), ELK stack for structured logging.
Security: HashiCorp Vault for managing DB credentials and encryption keys.
Wrap Up
Advanced Topics
Trade-offs: We chose strong consistency (RDBMS) over the higher availability of an eventually consistent NoSQL store, because a mortgage application's state cannot be eventually consistent (e.g., showing a loan as "Approved" when it was "Rejected").
Reliability: Multi-AZ deployment ensures that the failure of one data center doesn't take down the service.
Security: Implementation of Pre-signed URLs ensures that document data never flows through our application server, reducing the attack surface for PII leaks.
Optimizations: For the "Leads" dashboard, we use a composite index on (agent_id, status, created_at) so that dashboard queries stay fast even as the database grows to millions of rows.
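The effect of the composite index can be checked with a query plan. The sketch below uses SQLite from the Python standard library as a stand-in for PostgreSQL, so the plan output format differs from PostgreSQL's EXPLAIN; the table layout and the index name idx_apps_agent_status_created are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE applications (
        id INTEGER PRIMARY KEY,
        agent_id INTEGER NOT NULL,
        status TEXT NOT NULL,
        created_at TEXT NOT NULL
    )
    """
)
# Column order matters: equality filters first, then the sort column.
conn.execute(
    "CREATE INDEX idx_apps_agent_status_created "
    "ON applications (agent_id, status, created_at)"
)

# Typical dashboard query: one agent, one pipeline column, newest first.
query = (
    "SELECT id, created_at FROM applications "
    "WHERE agent_id = ? AND status = ? "
    "ORDER BY created_at DESC LIMIT 20"
)
plan = conn.execute("EXPLAIN QUERY PLAN " + query, (42, "Underwriting")).fetchall()
# The last column of each plan row is the human-readable detail string.
plan_text = " ".join(row[3] for row in plan)
print(plan_text)
```

The printed plan should name the index, which indicates the filter is answered from the index rather than by scanning the whole table.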