The Question
Design

Financial Ledger and Account Management System

Design a robust and scalable financial account management system capable of supporting millions of users and high-concurrency fund transfers. The system must guarantee ACID compliance, prevent double-spending, and maintain a permanent audit trail using double-entry bookkeeping principles. Address how the system handles idempotency during network failures, manages data consistency across service boundaries, and scales the storage layer as the transaction history grows into the billions.
Tech Stack: PostgreSQL, Redis, Kafka, CDC, JWT, Kubernetes, gRPC, Prometheus
Questions & Insights

Clarifying Questions

Scale & Traffic: What is the expected scale in terms of Daily Active Users (DAU) and Transactions Per Second (TPS)?
Assumption: 1M DAU, ~60 average TPS, 2,000 peak TPS.
Consistency Requirements: Is eventual consistency acceptable for account balances?
Assumption: No. Financial data requires strict ACID compliance and strong consistency for balance updates to prevent double-spending.
Core Features: What are the primary operations for the MVP?
Assumption: Account creation, balance retrieval, and internal fund transfers (Account A to Account B).
Audit & Compliance: Do we need to store a history of all changes for regulatory reasons?
Assumption: Yes, a persistent audit trail (immutable ledger) is required.
External Integration: Should we support external bank transfers (ACH/Wire) in the MVP?
Assumption: No, internal transfers only for the MVP.

Thinking Process

Core Challenge: Ensuring data integrity and atomicity during fund transfers, where debiting one account and crediting another must succeed or fail as a single operation.
Key Progression:
How do we represent financial state? (Double-entry bookkeeping vs. simple balance column).
How do we ensure atomicity? (Database transactions and row-level locking).
How do we handle high-concurrency hotspots? (Sharding and optimistic locking).
How do we guarantee idempotency? (Unique request IDs and idempotency keys).

Bonus Points

Double-Entry Bookkeeping: Designing the schema so that every "transfer" consists of at least two ledger entries (a debit and a credit) that sum to zero, ensuring the system is always auditable.
Idempotency Framework: Implementing a standardized idempotency layer at the API Gateway or Service level to prevent duplicate transactions during retries.
Database Partitioning: Strategy for sharding by account_id to scale horizontally while keeping related transaction records on the same shard for atomic local transactions.
Change Data Capture (CDC): Using CDC (e.g., Debezium) to stream ledger changes to downstream services (analytics, notifications) without impacting the performance of the primary transaction database.
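The partitioning idea above can be made concrete with a small routing function. This is a sketch, not the design's mandated scheme: the shard count of 16 and the SHA-256 choice are illustrative assumptions, but the invariant it demonstrates (an account and all its ledger rows always map to the same shard) is the one the bullet relies on.

```python
import hashlib

NUM_SHARDS = 16   # illustrative; the original design does not fix a shard count

def shard_for(account_id: str) -> int:
    """Deterministically route an account (and all its ledger rows) to one shard."""
    digest = hashlib.sha256(account_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# Stable routing: the same account always lands on the same shard, so a
# single-account history query never fans out across databases.
assert shard_for("3b6f-example-uuid") == shard_for("3b6f-example-uuid")
assert all(0 <= shard_for(f"acct-{i}") < NUM_SHARDS for i in range(100))
```

A transfer whose two accounts hash to different shards would still span databases; the Blueprint section below sidesteps this for the MVP by keeping related account data co-located on one shard.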
Design Breakdown

Functional Requirements

Core Use Cases:
Users can create and manage multiple financial accounts.
Users can check real-time account balances.
Users can transfer funds between internal accounts.
Users can view a history of transactions.
Scope Control:
In-scope: Internal ledger, basic auth, transaction history, balance management.
Out-of-scope: External payment gateways, currency conversion (FX), physical card management, investment portfolios.

Non-Functional Requirements

Scale: Support 10M+ total accounts and up to 2,000 peak TPS.
Latency: Balance checks < 100ms; Fund transfers < 500ms.
Availability & Reliability: 99.99% availability; zero data loss (RPO=0).
Consistency: Strong consistency for all financial mutations.
Fault Tolerance: Automatic failover for database primaries; retry mechanisms for transient failures.
Security & Privacy: Encryption at rest/transit; PCI-DSS compliance-ready architecture; strict RBAC.

Estimation

Traffic Estimation:
1M DAU * 5 transactions/day = 5M transactions/day.
Average TPS: 5M transactions / 86,400 seconds ≈ 58 TPS.
Peak TPS: ~600 TPS (10x average), provisioned for up to 2,000 TPS.
Storage Estimation:
1 Transaction record: ~200 bytes.
5M transactions/day * 365 days = 1.8B transactions/year.
Total Storage: 1.8B * 200 bytes ≈ 360 GB/year.
Bandwidth Estimation:
Incoming: 2,000 TPS * 1KB/req = 2 MB/s.
Outgoing: 2,000 TPS * 2KB/res = 4 MB/s.
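The arithmetic above can be verified in a few lines; the only inputs are the assumptions already stated (1M DAU, 5 transactions per user per day, 200-byte records).

```python
# Sanity-checking the back-of-the-envelope numbers above.
dau = 1_000_000
tx_per_day = dau * 5                            # 5M transactions/day
avg_tps = tx_per_day / 86_400                   # seconds in a day
tx_per_year = tx_per_day * 365                  # ~1.8B transactions/year
storage_gb_per_year = tx_per_year * 200 / 1e9   # 200 bytes per record

assert tx_per_day == 5_000_000
assert 55 < avg_tps < 60                        # ~58 TPS average
assert tx_per_year == 1_825_000_000
assert 360 <= storage_gb_per_year <= 370        # ~365 GB/year
```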

Blueprint

Concise Summary: A microservices-based architecture centered around a strictly ACID-compliant relational database utilizing double-entry bookkeeping.
Major Components:
API Gateway: Handles authentication, rate limiting, and request routing.
Account Service: Manages user profiles and account metadata.
Transaction Service: Executes fund transfers using atomic DB transactions and enforces idempotency.
Ledger Database: The source of truth for all balances and transaction logs.
Simplicity Audit: This design avoids complex distributed transaction coordinators (like 2PC) by keeping related account data on the same database shard for the MVP.
Architecture Decision Rationale:
Why this architecture?: Relational databases (PostgreSQL) are the industry standard for financial ledgers due to mature ACID support.
Functional Satisfaction: Covers all core flows from account creation to auditable transfers.
Non-functional Satisfaction: Scalable via sharding; highly available through synchronous replication; secure via centralized gateway.

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery & Traffic Routing: Not critical for the MVP as content is mostly dynamic API data.
Security & Perimeter:
API Gateway: Performs JWT validation and extracts user_id.
Rate Limiting: Applied per user_id (e.g., 10 transfer requests per minute) to prevent abuse and DoS.

Service

Topology & Scaling: Stateless microservices deployed across multiple Availability Zones (AZs) using K8s. Scaling is triggered by CPU (>60%) or Request Latency.
API Schema Design:
POST /v1/transfers:
Request: { "from_account": "uuid", "to_account": "uuid", "amount": "decimal", "idempotency_key": "uuid" }
Idempotency: the idempotency_key is required on every transfer request to prevent double-billing on retries.
GET /v1/accounts/{id}/balance: Returns current cleared balance.
Resilience & Reliability:
Retries: Client-side retries with exponential backoff for 5xx errors.
Circuit Breakers: Implemented in the Gateway to fail fast if the Transaction Service is degraded.
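The client-side retry policy above can be sketched in a few lines. This is a generic illustration, assuming the caller pairs it with the idempotency_key from the API schema (since retries deliberately duplicate requests); the exception class and delays are stand-ins, not part of the real service.

```python
import random
import time

class TransientServerError(Exception):
    """Stand-in for an HTTP 5xx response from a downstream service."""

def call_with_retries(request_fn, max_attempts: int = 4, base_delay: float = 0.01):
    """Retry transient failures with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except TransientServerError:
            if attempt == max_attempts - 1:
                raise                      # retry budget exhausted; surface the error
            # Delay doubles each attempt, with up to 100% random jitter
            # so synchronized clients don't retry in lockstep.
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))

attempts = {"count": 0}
def flaky_request():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientServerError("503")
    return {"status": "ok"}

assert call_with_retries(flaky_request) == {"status": "ok"}
assert attempts["count"] == 3   # two failures, then success
```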

Storage

Access Pattern: Read-heavy for balance lookups; append-heavy for transaction logs. Strong consistency required for all update operations.
Database Table Design:
Accounts: id (PK), user_id, type, status, created_at.
Balances: account_id (PK), amount, currency, version (for optimistic locking).
Journal Entries (Ledger): id (PK), transaction_id, account_id, type (DEBIT/CREDIT), amount, created_at.
Technical Selection: PostgreSQL.
Rationale: Mature ACID transactions (BEGIN...COMMIT) ensure that a transfer (Debit A + Credit B + journal entries) either fully succeeds or fully fails.
Distribution Logic: Sharding by user_id or account_id. For the MVP, a single large RDS instance with Read Replicas is sufficient.
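The transactional shape described above can be demonstrated end to end. The sketch below uses SQLite standing in for PostgreSQL (both wrap multi-statement transactions), with simplified versions of the Balances and Journal Entries tables; amounts are integer cents, and the table/column names are illustrative.

```python
import sqlite3

# SQLite stands in for PostgreSQL here; the transactional shape is the same.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE balances (account_id TEXT PRIMARY KEY, amount INTEGER, version INTEGER);
    CREATE TABLE journal_entries (
        id INTEGER PRIMARY KEY, transaction_id TEXT,
        account_id TEXT, type TEXT, amount INTEGER);
    INSERT INTO balances VALUES ('A', 1000, 0), ('B', 0, 0);
""")

def transfer(tx_id: str, src: str, dst: str, amount: int) -> None:
    """Debit, credit, and journal rows commit atomically or not at all."""
    with db:  # opens a transaction; any exception rolls the whole thing back
        (src_balance,) = db.execute(
            "SELECT amount FROM balances WHERE account_id = ?", (src,)).fetchone()
        if src_balance < amount:
            raise ValueError("insufficient funds")
        db.execute("UPDATE balances SET amount = amount - ?, version = version + 1 "
                   "WHERE account_id = ?", (amount, src))
        db.execute("UPDATE balances SET amount = amount + ?, version = version + 1 "
                   "WHERE account_id = ?", (amount, dst))
        db.execute("INSERT INTO journal_entries (transaction_id, account_id, type, amount) "
                   "VALUES (?, ?, 'DEBIT', ?)", (tx_id, src, -amount))
        db.execute("INSERT INTO journal_entries (transaction_id, account_id, type, amount) "
                   "VALUES (?, ?, 'CREDIT', ?)", (tx_id, dst, amount))

transfer("tx-1", "A", "B", 250)
# Every transaction's journal entries sum to zero -- the double-entry invariant.
(total,) = db.execute("SELECT SUM(amount) FROM journal_entries "
                      "WHERE transaction_id = 'tx-1'").fetchone()
assert total == 0
```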

Cache

Purpose & Justification: Redis is used exclusively for Idempotency Management.
Key-Value Schema: idempotency_key:{uuid} -> {status: "PENDING|COMPLETED", response_payload: "{...}"}.
TTL: 24 hours. After a day, the likelihood of a client retry for the same request is negligible.
Failure Handling: If Redis is down, the Transaction Service can fall back to checking the Ledger DB for the same key, ensuring safety at the cost of higher latency.

Messaging

Purpose & Decoupling: Kafka is used to decouple the critical transaction path from non-critical side effects (Push notifications, Audit indexing).
Event Schema: TransactionCreatedEvent containing tx_id, amount, and timestamp.
Technical Selection: Kafka for its durability and replayability, which is essential for re-generating audit logs if downstream systems fail.
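The event shape above might look like the following; the dataclass fields come from the schema bullet, while the use of integer minor units and ISO-8601 timestamps are assumptions layered on top.

```python
import json
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class TransactionCreatedEvent:
    """Fields from the event schema above; types are illustrative choices."""
    tx_id: str
    amount: int        # minor units (cents) to avoid float rounding in payloads
    timestamp: str     # ISO-8601, so replayed events stay unambiguous

def serialize(event: TransactionCreatedEvent) -> bytes:
    """Kafka values are bytes; JSON keeps the payload replayable and auditable."""
    return json.dumps(asdict(event)).encode("utf-8")

event = TransactionCreatedEvent("tx-1", 250, "2024-01-01T00:00:00Z")
assert json.loads(serialize(event)) == {
    "tx_id": "tx-1", "amount": 250, "timestamp": "2024-01-01T00:00:00Z"}
```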

Infrastructure (Optional)

Observability: Prometheus for metrics (Transaction success rate), ELK for structured logging, and Jaeger for tracing the lifecycle of a transfer across services.
Security: Database credentials stored in AWS Secrets Manager; mTLS for service-to-service communication.

Wrap Up

Advanced Topics

Trade-offs: We chose Strong Consistency over high availability (CP in CAP). In financial systems, it is better to return an error than to risk an incorrect balance.
Reliability: We use Double-Entry Bookkeeping. We never simply run UPDATE balances SET amount = amount - X; we always insert rows into the journal table and update the balances table within the same transaction, via triggers or application logic.
Bottleneck Analysis: The primary bottleneck will be row-level locking on the balances table for very active accounts.
Optimization: Implement "Hot Account" logic where updates are batched or sharded further, though this is rarely needed for an MVP.
Distinguishing Insights: To handle race conditions, we use SELECT FOR UPDATE on the account rows during a transfer. This ensures that while Account A is being debited by one process, no other process can modify its balance until the first one commits.
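SELECT FOR UPDATE is the pessimistic option; the version column on the Balances table (Storage section) enables the optimistic alternative mentioned in the Thinking Process. A minimal in-memory sketch of that compare-and-swap, with a plain dict standing in for the table:

```python
# Optimistic-locking counterpart to SELECT FOR UPDATE: the write only
# applies if the version read earlier is still current.
balances = {"A": {"amount": 1000, "version": 3}}

def debit_if_unchanged(account_id: str, amount: int, expected_version: int) -> bool:
    """Compare-and-swap on the version column; returns False on a lost race."""
    row = balances[account_id]
    if row["version"] != expected_version:
        return False          # someone else committed first; caller re-reads and retries
    row["amount"] -= amount
    row["version"] += 1
    return True

v = balances["A"]["version"]
assert debit_if_unchanged("A", 100, v) is True
assert debit_if_unchanged("A", 100, v) is False   # stale version rejected
assert balances["A"]["amount"] == 900             # only the first debit applied
```

In SQL this becomes UPDATE ... SET amount = amount - X, version = version + 1 WHERE account_id = ? AND version = ?, with a zero-rows-affected result signaling the lost race; it trades lock waits for retries, which suits low-contention accounts but not the hot accounts discussed above.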