The Question
Design

Digital Wallet System Design

Design a highly reliable digital wallet system similar to Venmo or PayPal. The system must handle high-concurrency P2P transfers, deposits, and withdrawals while maintaining strict financial consistency. Focus on the core accounting ledger, idempotency mechanisms to prevent double-spending, and how the system remains resilient during external bank gateway failures. Assume a scale of 10M daily active users and 10,000 peak transactions per second.
PostgreSQL
Redis
Kafka
mTLS
Saga Pattern
ACID
Double-Entry Accounting
Kubernetes
API Gateway
Questions & Insights

Clarifying Questions

Scale & Performance: What is the target scale in terms of Daily Active Users (DAU) and Peak Transactions Per Second (TPS)?
Consistency vs. Availability: In financial systems, consistency is usually paramount. Is there a specific SLA for transaction finality (e.g., must be synchronous or can it be eventually consistent)?
Functional Scope: Are we supporting peer-to-peer (P2P) transfers only, or also merchant payments and cross-border currency exchange?
Regulatory Requirements: Do we need to handle KYC (Know Your Customer), AML (Anti-Money Laundering) checks, and complex auditing within this MVP?
Assumptions for this design:
Scale: 100M total users, 10M DAU, Peak TPS of 10,000.
Geography: Single country for MVP to avoid complex FX (Forex) logic.
Consistency: Strong consistency for account balances is mandatory.
Scope: Link bank accounts, Deposit/Withdraw, and P2P transfers.

Thinking Process

The core challenge of a digital wallet is ensuring Atomic, Consistent, Isolated, and Durable (ACID) transactions at scale while preventing double-spending.
How do we guarantee no money is ever lost or created? We implement a strict Double-Entry Accounting system where every movement of funds is recorded as a debit from one account and a credit to another.
How do we handle high-concurrency balance updates? We use a dedicated Ledger Service with a relational database (RDBMS) to handle row-level locking or optimistic concurrency control, ensuring balances never go negative.
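How do we prevent duplicate transactions? We enforce Idempotency at the API and Service layers: every write request carries a unique idempotency_key (also called a request ID or deduplication key), and a repeated key returns the original result instead of executing again.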
How do we handle external failures (e.g., Bank Gateway downtime)? We use a State Machine and asynchronous processing (Sagas) to manage long-running transactions between our system and external banks.
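The double-entry rule above can be sketched in a few lines. This is a minimal in-memory illustration, not the design's Ledger Service: the names InMemoryLedger, LedgerEntry, and InsufficientFunds are hypothetical, and a real implementation would run both legs inside a single database transaction.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, List


@dataclass(frozen=True)
class LedgerEntry:
    transaction_id: str
    account_id: str
    entry_type: str  # "DEBIT" or "CREDIT"
    amount: Decimal


class InsufficientFunds(Exception):
    """Raised when a debit would push a balance below zero."""


class InMemoryLedger:
    """Hypothetical toy ledger; a real one would sit on an RDBMS."""

    def __init__(self, opening_balances: Dict[str, Decimal]) -> None:
        self.balances = dict(opening_balances)
        self.entries: List[LedgerEntry] = []

    def transfer(self, transaction_id: str, sender: str,
                 receiver: str, amount: Decimal) -> None:
        # Validate everything before recording anything, so a rejected
        # transfer leaves no partial entries behind.
        if amount <= 0:
            raise ValueError("amount must be positive")
        if sender not in self.balances or receiver not in self.balances:
            raise KeyError("unknown account")
        if self.balances[sender] < amount:
            raise InsufficientFunds(sender)
        # Record both legs together: one debit and one matching credit.
        self.entries.append(LedgerEntry(transaction_id, sender, "DEBIT", amount))
        self.entries.append(LedgerEntry(transaction_id, receiver, "CREDIT", amount))
        self.balances[sender] -= amount
        self.balances[receiver] += amount

    def is_balanced(self) -> bool:
        # Invariant: total debits always equal total credits.
        debits = sum((e.amount for e in self.entries
                      if e.entry_type == "DEBIT"), Decimal("0"))
        credits = sum((e.amount for e in self.entries
                       if e.entry_type == "CREDIT"), Decimal("0"))
        return debits == credits
```

Checking is_balanced() after every batch is a cheap audit: if it ever returns False, money was created or lost somewhere.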

Bonus Points

Immutable Ledger: Instead of just updating a balance column, we store a series of immutable transaction logs. The balance is a derived view (materialized snapshot) to ensure 100% auditability.
Distributed Lock Management: Using Redis Redlock or ZooKeeper to serialize competing requests for the same account before they reach the database, reducing race conditions and lock contention at the Ledger.
Database Sharding by Account ID: To scale the Ledger beyond a single RDBMS instance, we shard data by account_id. A P2P transfer touches two accounts, so it spans at most two shards.
Hot-Account Mitigation: For "celebrity" accounts or platform treasury accounts that handle millions of transactions, we implement "sharded counters" or "buffered writes" to prevent lock contention.
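As a rough illustration of the "sharded counters" idea for hot accounts, the sketch below splits one logical balance across several slots so that concurrent credits rarely touch the same row. ShardedBalance and its slot count are hypothetical names chosen for this example. It covers credits only; debits need extra care because the check against the total must consider all slots.

```python
import random
from decimal import Decimal
from typing import List, Optional


class ShardedBalance:
    """Hypothetical hot-account balance split into independent slots."""

    def __init__(self, num_slots: int) -> None:
        if num_slots < 1:
            raise ValueError("num_slots must be at least 1")
        self.slots: List[Decimal] = [Decimal("0")] * num_slots

    def credit(self, amount: Decimal, slot: Optional[int] = None) -> None:
        # Picking a random slot spreads writes, so in a database each
        # credit would lock only one of num_slots rows.
        index = random.randrange(len(self.slots)) if slot is None else slot
        self.slots[index] += amount

    def total(self) -> Decimal:
        # Reads pay the cost: the balance is the sum over every slot.
        return sum(self.slots, Decimal("0"))
```

The trade-off is that writes get cheaper while reads get more expensive, which suits treasury-style accounts that receive far more credits than balance queries.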
Design Breakdown

Functional Requirements

Core Use Cases:
User Onboarding & Wallet Creation.
Link Bank Account (via Plaid/Stripe-like integration).
Deposit funds from Bank to Wallet.
Withdraw funds from Wallet to Bank.
P2P Transfer (User A to User B).
View Transaction History and Current Balance.
Scope Control:
In-scope: Transaction processing, Ledger management, Idempotency.
Out-of-scope: Advanced fraud detection (ML models), Currency conversion, Physical card issuance.

Non-Functional Requirements

Scale: Support 10k Peak TPS and 100M users.
Latency: P2P transfers should complete (or be accepted) within 200 ms.
Availability & Reliability: 99.99% availability (critical for financial trust).
Consistency: Strong consistency for the Ledger; eventual consistency for transaction history search indexes.
Security & Privacy: PCI-DSS compliance, encryption at rest/transit, and strict RBAC.

Estimation

Traffic:
10M DAU * 2 transactions/day = 20M transactions/day.
Average TPS = 20M / 86,400 s ≈ 230 TPS.
Peak TPS (10x average) ≈ 2,300 TPS; the design targets 10k TPS to leave headroom.
Storage:
Each transaction record ≈ 500 bytes.
20M transactions/day * 365 days ≈ 7.3B transactions/year.
Total Storage ≈ 7.3B transactions * 500 bytes ≈ 3.65 TB/year.
Bandwidth:
2,300 TPS * 1KB/request ≈ 2.3 MB/s (easily handled by standard networking).
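The arithmetic above can be reproduced with a short script. All inputs are the assumptions stated in this section (2 transactions per user per day, 500 bytes per record, a 10x peak factor, 1 KB per request), with 1 KB treated as 1,000 bytes.

```python
SECONDS_PER_DAY = 86_400

# Inputs taken from the estimation above.
dau = 10_000_000
tx_per_user_per_day = 2
bytes_per_tx = 500
peak_multiplier = 10
request_size_bytes = 1_000  # 1 KB, decimal units

tx_per_day = dau * tx_per_user_per_day                       # 20M
avg_tps = tx_per_day / SECONDS_PER_DAY                       # about 231
peak_tps = avg_tps * peak_multiplier                         # about 2,315
tx_per_year = tx_per_day * 365                               # 7.3B
storage_tb_per_year = tx_per_year * bytes_per_tx / 1e12      # 3.65 TB
peak_bandwidth_mb_s = peak_tps * request_size_bytes / 1e6    # about 2.3 MB/s
```

Changing tx_per_user_per_day or peak_multiplier shows quickly how sensitive the storage and throughput figures are to those two guesses.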

Blueprint

The architecture centers around a Ledger Service acting as the source of truth, utilizing a relational database for ACID guarantees. A Transaction Service orchestrates the flow between internal ledger updates and external bank gateways.
API Gateway: Handles authentication, rate limiting, and request routing.
Wallet Service: Manages user profiles, linked bank metadata, and account status.
Transaction Service: Orchestrates P2P, Deposits, and Withdrawals; manages the transaction state machine.
Ledger Service: The core accounting engine. Executes double-entry book-keeping.
Bank Gateway Adapter: Abstracts third-party bank APIs (e.g., Plaid, ACH).
Simplicity Audit: This design avoids complex distributed microservices where possible by centralizing the "Truth" in the Ledger RDBMS, which is the most reliable way to handle money in an MVP.

High Level Architecture

Sub-system Deep Dive

Service

Topology & Scaling: Services are stateless and deployed in multi-AZ Kubernetes clusters. Scaling is based on CPU and request latency.
API Schema Design:
POST /v1/transfers: Initiates P2P transfer.
Request: { sender_id, receiver_id, amount, currency, idempotency_key }
Response: 202 Accepted with transaction_id.
Idempotency: All write endpoints require an idempotency_key stored in Redis for 24 hours to prevent duplicate processing.
Resilience: Transaction Service uses a State Machine (Pending -> Processing -> Success/Failed). If a step fails, it triggers a compensating transaction (Saga pattern) or retries with exponential backoff.
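One way to sketch the state machine and retry schedule described above is shown below. The transition table follows the Pending -> Processing -> Success/Failed flow from this section; the names TxState, advance, and backoff_delays, and the idea of capping the delay, are assumptions made for this example.

```python
from enum import Enum
from typing import Dict, List, Set


class TxState(Enum):
    PENDING = "PENDING"
    PROCESSING = "PROCESSING"
    SUCCESS = "SUCCESS"
    FAILED = "FAILED"


# Legal transitions; SUCCESS and FAILED are terminal.
ALLOWED: Dict[TxState, Set[TxState]] = {
    TxState.PENDING: {TxState.PROCESSING},
    TxState.PROCESSING: {TxState.SUCCESS, TxState.FAILED},
    TxState.SUCCESS: set(),
    TxState.FAILED: set(),
}


def advance(current: TxState, target: TxState) -> TxState:
    """Return the new state, or raise if the transition is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target


def backoff_delays(base_seconds: float, attempts: int,
                   cap_seconds: float) -> List[float]:
    """Exponential backoff: the delay doubles per attempt, up to a cap."""
    return [min(cap_seconds, base_seconds * (2 ** i)) for i in range(attempts)]
```

Rejecting illegal transitions explicitly means a duplicated or out-of-order callback from a bank gateway cannot move a finished transaction back into Processing.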

Storage

Access Pattern:
Write-heavy for the Ledger (every transaction is a write).
Read-heavy for History/Balance inquiry.
Database Table Design (Ledger DB):
accounts: account_id (PK), user_id, balance, version (for optimistic locking), status.
ledger_entries: entry_id (PK), transaction_id, account_id, type (DEBIT/CREDIT), amount, created_at.
Technical Selection: PostgreSQL.
Rationale: Support for ACID, complex joins for auditing, and mature ecosystem for financial applications.
Distribution Logic: Sharding by account_id. For P2P transfers across shards, use a 2-Phase Commit (2PC) within the Ledger Service or a Saga orchestrated by the Transaction Service.
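The version column in the accounts table supports optimistic locking: an update succeeds only if the row still has the version the caller read. The sketch below shows the pattern using Python's built-in sqlite3 as a stand-in for PostgreSQL so it can run anywhere. Storing the balance as integer cents and the function names are assumptions made for this example.

```python
import sqlite3


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory accounts table (sqlite3 standing in for PostgreSQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " account_id TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0),"  # minor units (cents)
        " version INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def debit_with_version_check(conn: sqlite3.Connection, account_id: str,
                             amount_cents: int, expected_version: int) -> bool:
    """Debit only if the row is unchanged since it was read and funds suffice."""
    cursor = conn.execute(
        "UPDATE accounts SET balance = balance - ?, version = version + 1 "
        "WHERE account_id = ? AND version = ? AND balance >= ?",
        (amount_cents, account_id, expected_version, amount_cents),
    )
    conn.commit()
    # rowcount is 0 when the version was stale or the balance too low;
    # the caller should re-read the row and decide whether to retry.
    return cursor.rowcount == 1
```

Putting the balance check in the WHERE clause keeps the "never negative" rule inside the same atomic statement as the version check, so no separate read-then-write window exists.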

Cache

Purpose: Reducing read load on the Ledger DB for balance inquiries and storing idempotency keys.
Key-Value Schema:
balance:{account_id} -> Decimal Value. Updated via Write-through or Cache-aside.
idem:{key} -> transaction_id. TTL: 24h.
Technical Selection: Redis. High performance and supports atomic increments/decrements.
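The idem:{key} entry with a 24h TTL behaves like a Redis SET with the NX and EX options: the first request claims the key, and repeats within the TTL get the original transaction_id back. The sketch below mimics that behavior with an in-memory dictionary so it runs without a Redis server. IdempotencyStore and its injectable clock are hypothetical names for this example.

```python
import time
from typing import Callable, Dict, Tuple


class IdempotencyStore:
    """In-memory stand-in for Redis `SET idem:{key} value NX EX ttl`."""

    def __init__(self, ttl_seconds: float = 24 * 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._data: Dict[str, Tuple[str, float]] = {}

    def claim(self, key: str, transaction_id: str) -> str:
        """Bind key to transaction_id if unclaimed or expired.

        Returns the transaction_id that owns the key, which differs from
        the argument when the request is a duplicate.
        """
        now = self._clock()
        redis_key = f"idem:{key}"
        existing = self._data.get(redis_key)
        if existing is not None and existing[1] > now:
            return existing[0]
        self._data[redis_key] = (transaction_id, now + self._ttl)
        return transaction_id
```

A caller compares the returned ID with the one it proposed: if they differ, the request is a replay and the stored transaction's result should be returned instead of processing again.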

Messaging

Purpose: Decoupling the core transaction flow from auxiliary tasks (e.g., Push notifications, updating Search/History service).
Event Schema: TransactionCompletedEvent: {tx_id, sender_id, receiver_id, amount, timestamp}.
Technical Selection: Kafka.
Rationale: High throughput and message replayability for recovery/auditing.
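A minimal sketch of the event payload, using the fields from the Event Schema above. Encoding the amount as a decimal string, the timestamp as ISO 8601 text, and using sender_id as the message key (so one sender's events stay ordered within a partition) are assumptions for this example, not part of the stated design.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class TransactionCompletedEvent:
    tx_id: str
    sender_id: str
    receiver_id: str
    amount: str     # decimal string, avoids float rounding in JSON
    timestamp: str  # ISO 8601, UTC


def encode_event(event: TransactionCompletedEvent) -> Tuple[bytes, bytes]:
    """Return (key, value) bytes suitable for a Kafka producer."""
    key = event.sender_id.encode("utf-8")
    value = json.dumps(asdict(event), sort_keys=True).encode("utf-8")
    return key, value
```

Consumers such as the notification or history services can decode the value with json.loads and should treat tx_id as the deduplication key, since replayed messages are expected.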

Infrastructure (Optional)

Observability: Prometheus for metrics (TPS, Error rates) and Jaeger for distributed tracing to monitor transaction flow across services.
Security: HashiCorp Vault for managing bank API keys and sensitive user data encryption keys.
Wrap Up

Advanced Topics

Trade-offs: We chose Strong Consistency (RDBMS) over High Availability (NoSQL). In a wallet, a system being "down" for 1 minute is better than a system "miscounting" $1M.
Reliability: To absorb a "Thundering Herd" of concurrent balance updates, we use Redis for preliminary balance checks, rejecting clearly underfunded requests before they hit the DB. The Ledger DB remains the final authority on every balance.
Bottleneck Analysis: The Ledger DB is the primary bottleneck. The roadmap to 100k+ TPS is vertical scaling first, then horizontal sharding by account_id.
Security: All internal service communication uses mTLS. The Ledger DB is in a private subnet with no public access.
Distinguishing Insight: Shadow Ledger. During the migration to a sharded ledger or new version, run a "Shadow Ledger" in parallel that processes the same transactions but doesn't affect the user balance. Compare results to ensure 100% accuracy before cutover.
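The comparison step of a Shadow Ledger run could look like the sketch below, which reports every account whose balance differs between the two ledgers. The function name diff_balances and the use of plain dictionaries as balance snapshots are assumptions for this example.

```python
from decimal import Decimal
from typing import Dict, List


def diff_balances(primary: Dict[str, Decimal],
                  shadow: Dict[str, Decimal]) -> List[str]:
    """Return sorted account IDs whose balances differ between ledgers.

    An account present in only one ledger counts as a mismatch.
    """
    mismatched = []
    for account_id in sorted(set(primary) | set(shadow)):
        if primary.get(account_id) != shadow.get(account_id):
            mismatched.append(account_id)
    return mismatched
```

An empty result over a long enough window of production traffic is the signal that the cutover is safe; any non-empty result points directly at the accounts to investigate.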