The Question
Design

Scalable Double-Entry Ledger System

Design a high-integrity, double-entry financial ledger service for a global fintech platform. The system must guarantee strict ACID compliance, support at least 1,000 transactions per second, and ensure that account balances always reflect the sum of all immutable entries. Address challenges such as idempotency, high-concurrency contention on single accounts, and a multi-year audit trail for billions of records.
PostgreSQL
Redis
Kafka
ACID
Double-Entry Bookkeeping
Transactional Outbox Pattern
L7 Load Balancing
Kubernetes
Questions & Insights

Clarifying Questions

What is the required throughput (TPS)? We assume an average of 1,000 write TPS and 10,000 read TPS (mostly balance checks).
Does the system require multi-currency support? Yes, the ledger should support multiple currencies with distinct decimal precisions.
Is this a "system of record" or a "shadow ledger"? This is the primary System of Record (SoR) requiring strict ACID compliance.
What are the data retention requirements? Financial data typically requires 7+ years of immutable history for compliance.
How do we handle idempotency? Clients will provide an idempotency_key to prevent duplicate processing of the same intent.
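Because each currency has its own precision, one option is to normalize amounts to integer minor units at the API boundary. The sketch below is a minimal illustration. It assumes a small hypothetical `CURRENCY_EXPONENT` table (the exponents shown follow ISO 4217: USD and EUR use 2, JPY 0, KWD 3) and the hypothetical helper names `to_minor_units` and `from_minor_units`. Over-precise input is rejected rather than silently rounded.

```python
from decimal import Decimal

# Minor-unit exponents per ISO 4217 (illustrative subset, not exhaustive).
CURRENCY_EXPONENT = {"USD": 2, "EUR": 2, "JPY": 0, "KWD": 3}


def to_minor_units(amount: str, currency: str) -> int:
    """Convert a decimal string such as "12.34" to integer minor units.

    Raises ValueError if the amount has more fractional digits than the
    currency allows, instead of rounding it away.
    """
    exponent = CURRENCY_EXPONENT[currency]
    scaled = Decimal(amount).scaleb(exponent)  # shift the decimal point
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{amount} has too many decimal places for {currency}")
    return int(scaled)


def from_minor_units(units: int, currency: str) -> Decimal:
    """Convert integer minor units back to a Decimal for display."""
    return Decimal(units).scaleb(-CURRENCY_EXPONENT[currency])
```

Storing integers (or exact decimals) in the ledger avoids binary floating-point rounding entirely.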

Thinking Process

The Core Constraint: A ledger must never lose a cent and must always balance: in every transaction, total debits equal total credits (double-entry bookkeeping).
Key Questions for Design:
How do we ensure atomic updates? Use a relational database with ACID transactions to wrap "Balance Update" and "Entry Logging" into one unit of work.
How do we handle high concurrency on a single account? Combine optimistic locking (version checks) for low-contention accounts with a "hot-shard" mitigation strategy for high-contention accounts (e.g., system accounts).
How do we maintain immutability? Use an "Append-only" ledger entry table where records are never updated or deleted; corrections are handled via "reversal" entries.
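A minimal sketch of the two invariants above, assuming amounts are positive integers in minor units; `Direction`, `Entry`, `validate_balanced`, and `reversal_of` are hypothetical names. A transaction is accepted only if, per currency, debits equal credits, and a correction is a new set of entries with flipped directions rather than an edit of the originals.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    DEBIT = "DEBIT"
    CREDIT = "CREDIT"


@dataclass(frozen=True)  # frozen: entries are immutable once created
class Entry:
    account_id: str
    amount: int  # positive amount in minor units (e.g., cents)
    currency: str
    direction: Direction


def validate_balanced(entries: list[Entry]) -> None:
    """Raise ValueError unless the entries form a valid double-entry transaction."""
    if len(entries) < 2:
        raise ValueError("a transaction needs at least two entries")
    net: dict[str, int] = defaultdict(int)  # per currency: debits minus credits
    for e in entries:
        if e.amount <= 0:
            raise ValueError("entry amounts must be positive")
        net[e.currency] += e.amount if e.direction is Direction.DEBIT else -e.amount
    unbalanced = {cur: total for cur, total in net.items() if total != 0}
    if unbalanced:
        raise ValueError(f"debits != credits for {unbalanced}")


def reversal_of(entries: list[Entry]) -> list[Entry]:
    """Build correcting entries: same accounts and amounts, flipped direction.

    The originals are never modified; the reversal is appended as a new
    transaction.
    """
    flip = {Direction.DEBIT: Direction.CREDIT, Direction.CREDIT: Direction.DEBIT}
    return [Entry(e.account_id, e.amount, e.currency, flip[e.direction]) for e in entries]
```

Which direction increases an account's balance depends on the account type (asset vs. liability), so the sketch only checks that the two sides net to zero.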

Bonus Points

Deterministic Processing: Implementing a sequence-based engine (like LMAX Disruptor or a Raft-based state machine) to guarantee the order of transactions.
Zero-Downtime Migration: Using a "Double-write/Shadow-read" strategy when migrating schemas or database engines.
Post-Transaction Verification: An asynchronous "Integrity Checker" service that continuously recomputes each account's balance from its ledger entries, verifies that it matches the stored balance in the accounts table, and checks that every transaction's debits equal its credits.
Separation of Concerns: Decoupling the "Pending/Authorization" state from the "Settled/Cleared" state to handle real-world banking flows (e.g., credit card holds).
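The integrity checker's core comparison can be sketched in memory as follows. This assumes hypothetical row tuples `(transaction_id, account_id, amount, direction)` in minor units, a single currency, and a convention where credits increase a balance and debits decrease it. In production the same sums would more likely be computed with SQL aggregations, one partition at a time.

```python
from collections import defaultdict


def signed(amount: int, direction: str) -> int:
    # Assumed convention: credits increase the balance, debits decrease it.
    return amount if direction == "CREDIT" else -amount


def check_integrity(entries, stored_balances):
    """Return a list of discrepancies; an empty list means the ledger is consistent.

    entries: iterable of (transaction_id, account_id, amount, direction).
    stored_balances: dict of account_id -> balance held in the accounts table.
    """
    problems = []
    per_txn = defaultdict(int)
    per_account = defaultdict(int)
    for txn_id, account_id, amount, direction in entries:
        per_txn[txn_id] += signed(amount, direction)
        per_account[account_id] += signed(amount, direction)

    # Every transaction must net to zero (debits == credits).
    for txn_id, net in per_txn.items():
        if net != 0:
            problems.append(f"transaction {txn_id} is unbalanced by {net}")

    # Every stored balance must equal the sum of that account's entries.
    for account_id in sorted(set(per_account) | set(stored_balances)):
        expected = per_account.get(account_id, 0)
        actual = stored_balances.get(account_id, 0)
        if expected != actual:
            problems.append(f"account {account_id}: entries sum to {expected}, stored {actual}")
    return problems
```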
Design Breakdown

Functional Requirements

Core Use Cases:
Create Account: Initialize a ledger account with metadata and currency.
Record Transaction: Execute a double-entry move between two or more accounts.
Get Balance: Retrieve the current cleared and pending balance of an account.
Transaction History: List paginated immutable ledger entries for an account.
Scope Control:
In-Scope: Atomic double-entry recording, idempotency, balance management, and audit trails.
Out-of-Scope: Currency exchange rate calculation (Forex), User Auth (IAM), or actual Payment Gateway integrations (Stripe/Adyen).

Non-Functional Requirements

Scale: Support 100M+ accounts and 1,000+ write TPS.
Latency: Critical path (write) < 100ms (p99); Balance read < 20ms.
Availability & Reliability: 99.999% availability; zero data loss (RPO = 0).
Consistency: Strict serializability for balance updates.
Fault Tolerance: Multi-AZ deployment for the database to survive data center failures.
Security: Data encryption at rest (AES-256) and in-transit (TLS 1.3).

Estimation

Traffic Estimation:
Average Write: 1,000 TPS.
Peak Write (e.g., Black Friday): 5,000 TPS.
Read: 10,000 TPS (mostly balance lookups).
Storage Estimation:
1 Transaction = ~500 bytes (Metadata + 2 Ledger Entries).
1k TPS * 86,400 s/day * 365 days = ~31.5 billion transactions/year.
31.5B * 500 bytes = ~15.8 TB per year, or ~110 TB over the 7-year retention period (before indexes and replicas).
Bandwidth Estimation:
Inbound: 1k TPS * 1KB/req = 1 MB/s.
Outbound: 10k TPS * 2KB/res = 20 MB/s.
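The arithmetic above can be reproduced in a few lines. The constants are the document's own assumptions (1,000 average write TPS, ~500 bytes per transaction, 1 KB requests, 2 KB responses, 7-year retention); decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) are assumed.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365

avg_write_tps = 1_000
read_tps = 10_000
bytes_per_txn = 500       # metadata + 2 ledger entries
retention_years = 7       # compliance retention

txns_per_year = avg_write_tps * SECONDS_PER_DAY * DAYS_PER_YEAR
bytes_per_year = txns_per_year * bytes_per_txn
retained_bytes = bytes_per_year * retention_years

inbound_mb_s = avg_write_tps * 1_000 / 1e6   # 1 KB per request
outbound_mb_s = read_tps * 2_000 / 1e6       # 2 KB per response

print(f"transactions/year: {txns_per_year / 1e9:.1f} B")
print(f"storage/year:      {bytes_per_year / 1e12:.1f} TB")
print(f"storage/{retention_years} years:   {retained_bytes / 1e12:.0f} TB (before indexes/replicas)")
print(f"bandwidth in/out:  {inbound_mb_s:.0f} MB/s / {outbound_mb_s:.0f} MB/s")
```

Running it prints the yearly transaction count, yearly storage, the 7-year total, and the bandwidth figures; indexes, WAL, and replicas would add to the storage numbers.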

Blueprint

Concise Summary: A robust Double-Entry Ledger utilizing a Relational Database (PostgreSQL) for transactional integrity, fronted by a stateless API service, with Redis for idempotency and Kafka for downstream event propagation.
Major Components:
API Gateway: Handles rate limiting, authentication, and request routing.
Ledger Service: Orchestrates the business logic, validates balances, and executes DB transactions.
Redis: Stores idempotency keys and provides a fast-lookup cache for account balances.
PostgreSQL: The source of truth, storing accounts, transactions, and immutable ledger entries.
Kafka: Streams transaction events to downstream services (e.g., Analytics, Notifications, Fraud).
Simplicity Audit: This design uses standard RDBMS ACID properties to solve the hardest part of the problem (consistency) without introducing complex distributed consensus logic for the MVP.
Architecture Decision Rationale:
Why this architecture? Financial systems prioritize correctness over horizontal scalability. PostgreSQL with partitioning provides a proven balance of ACID compliance and performance at this scale (1k to 5k write TPS).
Functional Satisfaction: Double-entry is naturally modeled via SQL transactions.
Non-functional Satisfaction: High availability is achieved through DB replication and stateless service scaling.

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery & Traffic Routing: Global DNS with latency-based routing to the nearest regional API Gateway.
Security & Perimeter: API Gateway handles JWT validation and implements a strict WAF to prevent SQL injection. Rate limiting is applied per account_id to prevent DoS on specific accounts.

Service

Topology & Scaling: Stateless Ledger Service instances deployed in Kubernetes (EKS/GKE) across 3 Availability Zones. Scaled based on CPU and request latency.
API Schema Design:
POST /v1/transactions: Executes a move.
Request: { idempotency_key, entries: [{account_id, amount, direction: "DEBIT" | "CREDIT"}], metadata }
Response: 201 Created { transaction_id, status: "SETTLED" }
Idempotency: Key stored in Redis for 24h.
GET /v1/accounts/{id}/balance: Returns current balance.
Resilience & Reliability: Implementation of the "Transactional Outbox Pattern" to ensure Kafka messages are only sent if the DB transaction succeeds.
Observability: Prometheus metrics for TPS and Latency; ELK stack for structured audit logs.
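A minimal, runnable sketch of the Transactional Outbox Pattern, using an in-memory SQLite database as a stand-in for PostgreSQL. The table layout, the `ledger.transactions` topic name, and the `record_transaction` / `relay_outbox` functions are hypothetical. The key property is that the ledger rows and the outbox row commit or roll back together; a separate relay then forwards outbox rows to Kafka through a caller-supplied `publish` callback.

```python
import json
import sqlite3
import uuid

# SQLite stands in for PostgreSQL so the sketch runs anywhere.
SCHEMA = """
CREATE TABLE transactions (id TEXT PRIMARY KEY, idempotency_key TEXT UNIQUE NOT NULL);
CREATE TABLE ledger_entries (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    transaction_id TEXT NOT NULL REFERENCES transactions(id),
    account_id TEXT NOT NULL,
    amount INTEGER NOT NULL,
    direction TEXT NOT NULL CHECK (direction IN ('DEBIT', 'CREDIT'))
);
CREATE TABLE outbox (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    topic TEXT NOT NULL,
    payload TEXT NOT NULL,
    published INTEGER NOT NULL DEFAULT 0
);
"""


def record_transaction(conn, idempotency_key, entries):
    """Write the transaction, its entries, and the outbox event atomically."""
    txn_id = str(uuid.uuid4())
    with conn:  # one DB transaction: commits on success, rolls back on error
        conn.execute("INSERT INTO transactions VALUES (?, ?)", (txn_id, idempotency_key))
        conn.executemany(
            "INSERT INTO ledger_entries (transaction_id, account_id, amount, direction)"
            " VALUES (?, ?, ?, ?)",
            [(txn_id, acct, amt, d) for acct, amt, d in entries],
        )
        payload = json.dumps({"type": "TransactionSettled", "transaction_id": txn_id})
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("ledger.transactions", payload),
        )
    return txn_id


def relay_outbox(conn, publish):
    """Publish unsent events in order; mark each one only after publish succeeds."""
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, topic, payload in rows:
        publish(topic, payload)  # e.g., a Kafka producer send
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

Because the relay marks a row only after `publish` returns, a crash between the two steps causes a resend, so delivery is at-least-once and consumers should deduplicate on `transaction_id`. With several relay workers on PostgreSQL, `SELECT ... FOR UPDATE SKIP LOCKED` is a common way to keep them from picking up the same rows.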

Storage

Access Pattern: Heavy writes to ledger_entries (append-only); Point lookups for accounts.
Database Table Design:
accounts: id (UUID), balance (Decimal), currency (ISO), version (BigInt)
transactions: id (UUID), idempotency_key (String), created_at (Timestamp)
ledger_entries: id (BigSerial), account_id (UUID), transaction_id (UUID), amount (Decimal), direction (Enum: DEBIT/CREDIT), created_at (Timestamp, partition key)
Technical Selection: PostgreSQL. Rationale: Supports SERIALIZABLE isolation level and FOR UPDATE pessimistic locking required to prevent race conditions during balance checks.
Distribution Logic: Partitioning the ledger_entries table by created_at (monthly partitions) to maintain performance as the dataset grows to billions of rows.
Reliability & Recovery: Daily full backups to S3; Continuous WAL (Write-Ahead Log) archiving for Point-in-Time Recovery (PITR).
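Monthly partitions have to exist before rows for that month arrive, so a scheduled job typically creates them ahead of time. The sketch below generates PostgreSQL declarative-partitioning DDL, assuming the parent table was created with `PARTITION BY RANGE (created_at)` and that `ledger_entries` carries its own `created_at` column; the `ledger_entries_YYYY_MM` naming scheme and the helper functions are hypothetical. PostgreSQL requires a primary key on a partitioned table to include the partition key, so the key would become `(id, created_at)`. Extensions such as pg_partman can automate the same task.

```python
from datetime import date


def month_bounds(day: date) -> tuple[date, date]:
    """Return the half-open range [first day of month, first day of next month)."""
    start = day.replace(day=1)
    if start.month == 12:
        end = start.replace(year=start.year + 1, month=1)
    else:
        end = start.replace(month=start.month + 1)
    return start, end


def partition_ddl(day: date, parent: str = "ledger_entries") -> str:
    """Build the CREATE TABLE ... PARTITION OF statement for the month of `day`."""
    start, end = month_bounds(day)
    name = f"{parent}_{start:%Y_%m}"
    return (
        f"CREATE TABLE IF NOT EXISTS {name} PARTITION OF {parent} "
        f"FOR VALUES FROM ('{start.isoformat()}') TO ('{end.isoformat()}');"
    )


def upcoming_partitions(today: date, months_ahead: int = 3) -> list[str]:
    """DDL for the current month plus the next `months_ahead` months."""
    statements, cursor = [], today.replace(day=1)
    for _ in range(months_ahead + 1):
        statements.append(partition_ddl(cursor))
        cursor = month_bounds(cursor)[1]  # first day of the following month
    return statements
```

Running `upcoming_partitions(date.today())` from a daily job is safe to repeat because of `IF NOT EXISTS`.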

Cache

Purpose & Justification: Redis is used to: 1. Prevent duplicate transactions (Idempotency) and 2. Speed up balance reads.
Key-Value Schema:
Key: idem:{key}, Value: transaction_id, TTL: 24h.
Key: bal:{account_id}, Value: balance_amount, TTL: 5m (Write-through or Cache-aside).
Failure Handling: If Redis is down, the system falls back to the database for idempotency checks (using unique constraints) and balance lookups.
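The idempotency flow with its database fallback can be sketched as below. `TTLCache` is an in-memory stand-in for Redis (with Redis itself, the calls would be `GET` and `SET idem:{key} <transaction_id> EX 86400`), `CacheUnavailable` is a hypothetical error raised when the cache is down, `submit` is a hypothetical entry point, and SQLite stands in for PostgreSQL. The `UNIQUE` constraint on `idempotency_key` stays authoritative; the cache only short-circuits replays.

```python
import sqlite3
import time
import uuid


class CacheUnavailable(Exception):
    """Hypothetical error a cache client raises when the cache is unreachable."""


class TTLCache:
    """In-memory stand-in for Redis GET / SET ... EX (illustrative only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        item = self._data.get(key)
        if item is None or item[1] < time.monotonic():
            return None  # missing or expired
        return item[0]

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, time.monotonic() + ttl_seconds)


IDEM_TTL_SECONDS = 24 * 3600  # 24h, as in the key schema above


def submit(conn, cache, idempotency_key, write_entries):
    """Process a request at most once; returns (transaction_id, was_replay)."""
    cache_key = f"idem:{idempotency_key}"
    try:
        cached = cache.get(cache_key)
    except CacheUnavailable:
        cached = None  # cache down: rely on the database constraint instead
    if cached is not None:
        return cached, True

    txn_id = str(uuid.uuid4())
    try:
        with conn:  # transaction row and ledger entries commit together
            conn.execute(
                "INSERT INTO transactions (id, idempotency_key) VALUES (?, ?)",
                (txn_id, idempotency_key),
            )
            write_entries(conn, txn_id)
    except sqlite3.IntegrityError:
        # If the key already exists, return the transaction recorded the first time.
        row = conn.execute(
            "SELECT id FROM transactions WHERE idempotency_key = ?", (idempotency_key,)
        ).fetchone()
        if row is None:
            raise  # the violation came from something other than the key
        return row[0], True

    try:
        cache.set(cache_key, txn_id, IDEM_TTL_SECONDS)
    except CacheUnavailable:
        pass  # best effort; the database remains the source of truth
    return txn_id, False
```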

Messaging

Purpose & Decoupling: Kafka decouples the critical ledger write path from non-critical downstream side effects (Notifications, Reporting).
Event Schema: TransactionSettled event containing transaction details and account balance updates.
Throughput & Partitioning: Partitioned by account_id to ensure that all events for a specific account are processed in order by consumers.
Technical Selection: Kafka. Rationale: High durability and replayability for audit reconstructions.
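Because one transaction touches several accounts, one way to honor "partitioned by account_id" is to emit one event per affected account, keyed by that account's ID. The sketch below assumes a hypothetical `TransactionSettled` shape (including a `balance_after` field for the balance update) and hypothetical helpers. The `partition_for` function only illustrates the "same key, same partition" property: Kafka's default partitioner hashes keys with murmur2, not CRC32.

```python
import json
import zlib
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TransactionSettled:
    transaction_id: str
    account_id: str
    amount: int         # minor units
    direction: str      # "DEBIT" or "CREDIT"
    balance_after: int  # account balance after this entry, minor units


def events_for(transaction_id, entries):
    """Yield one (key, value) message per entry, keyed by account_id.

    entries: iterable of (account_id, amount, direction, balance_after).
    """
    for account_id, amount, direction, balance_after in entries:
        event = TransactionSettled(transaction_id, account_id, amount, direction, balance_after)
        yield account_id.encode(), json.dumps(asdict(event)).encode()


def partition_for(key: bytes, num_partitions: int) -> int:
    # Illustration only (CRC32, not Kafka's murmur2): a deterministic hash
    # means every event for one account lands on the same partition.
    return zlib.crc32(key) % num_partitions
```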

Infrastructure (Optional)

Distributed Coordination: Not explicitly needed as PostgreSQL handles locking.
Secret Management: AWS Secrets Manager for DB credentials and API keys.
Wrap Up

Advanced Topics

Trade-offs: We chose ACID/Consistency over horizontal "NoSQL" write scalability. If 1k TPS becomes 100k TPS, we would need to transition to a sharded DB or a distributed ledger engine (like TigerBeetle).
Reliability & Failure Handling:
Pessimistic Locking: When updating a balance, we use SELECT ... FOR UPDATE. This prevents two transactions from updating the same account balance simultaneously.
Deadlock Avoidance: Always lock account IDs in a sorted order (e.g., lower ID first) to prevent circular wait deadlocks.
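The locking rules above can be illustrated in-process, with one `threading.Lock` per account standing in for a `SELECT ... FOR UPDATE` row lock; the `Ledger` class is hypothetical. Both locks are always taken in sorted ID order, so two opposite transfers (A to B and B to A) can never each hold one lock while waiting for the other. Against PostgreSQL, the equivalent is to issue the `SELECT ... FOR UPDATE` statements for the involved accounts in the same sorted order within one transaction.

```python
import threading


class Ledger:
    """In-process illustration of per-account locks acquired in sorted order.

    Each threading.Lock plays the role of a SELECT ... FOR UPDATE row lock.
    """

    def __init__(self, balances):
        self.balances = dict(balances)
        self.locks = {acct: threading.Lock() for acct in balances}

    def transfer(self, src, dst, amount):
        if src == dst or amount <= 0:
            raise ValueError("invalid transfer")
        first, second = sorted((src, dst))  # global lock order prevents circular waits
        with self.locks[first], self.locks[second]:
            if self.balances[src] < amount:  # balance check happens under the locks
                raise ValueError("insufficient funds")
            self.balances[src] -= amount
            self.balances[dst] += amount
```

The balance check runs while both locks are held, which is what prevents two concurrent debits from overdrawing the same account.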
Security & Privacy:
Immutability: The ledger_entries table is "Insert-only". Any human error or fraud correction must result in a new entry, never a deletion of the original record.
Distinguishing Insights:
Decimal Handling: Never use Floats for money. Use Decimal or BigInt (storing values in the smallest unit, e.g., cents) to avoid rounding errors.
Hot Account Problem: If a "System Account" (e.g., a fee collection account) is involved in every transaction, it becomes a lock bottleneck. Strategy: Use "sharded counters" or "deferred updates" for these specific high-traffic accounts.
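One way to realize the "sharded counters" strategy is to split a hot account into N sub-balances, each with its own lock (in the database, N rows such as hypothetical `fees#0` through `fees#15`), and to report the account balance as their sum. The `ShardedAccount` class below is a hypothetical in-process sketch. It suits accounts that mostly receive credits, such as fee collection; debiting a sharded account needs an extra rule (for example, choosing a shard with sufficient funds), which is not shown.

```python
import random
import threading
from typing import Optional


class ShardedAccount:
    """A hot account split into N sub-balances, each guarded by its own lock.

    Writers contend on roughly 1/N of the traffic; readers sum all shards.
    """

    def __init__(self, account_id: str, num_shards: int = 16):
        self.account_id = account_id
        self.shards = [0] * num_shards
        self.locks = [threading.Lock() for _ in range(num_shards)]

    def credit(self, amount: int, shard: Optional[int] = None) -> int:
        """Add `amount` (minor units) to one shard and return the shard index used."""
        if shard is None:
            shard = random.randrange(len(self.shards))  # spread writers across shards
        with self.locks[shard]:
            self.shards[shard] += amount
        return shard

    def balance(self) -> int:
        """Exact balance: lock every shard (in index order) and sum them."""
        for lock in self.locks:
            lock.acquire()
        try:
            return sum(self.shards)
        finally:
            for lock in self.locks:
                lock.release()
```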