PostgreSQL
Cheat Sheet
Prime Use Case
Use PostgreSQL when data integrity is paramount, the schema involves complex relationships/joins, or when you need a 'Swiss Army Knife' database that handles relational, JSON, and geospatial data in one place.
Critical Tradeoffs
- Strong consistency and rich features vs. write-path latency
- Vertical scaling simplicity vs. horizontal scaling complexity
- Process-per-connection isolation vs. the memory efficiency of thread-based architectures
Killer Senior Insight
PostgreSQL is effectively a 'database construction kit.' Through its extension framework (Citus for sharding, PostGIS for GIS, pgvector for AI), it can morph into almost any specialized data store while retaining the reliability of a 30-year-old core.
Recognition
Common Interview Phrases
Common Scenarios
- Financial ledgers and banking core systems.
- Complex E-commerce catalogs with deep category hierarchies.
- Metadata storage for large-scale distributed file systems.
- Geospatial applications requiring proximity and polygon logic.
Anti-patterns to Avoid
- Extremely high-volume, ephemeral data (e.g., clickstream logs) where a NoSQL store is cheaper to operate.
- Storing massive binary objects (BLOBs) directly in the database instead of object storage such as S3.
- Simple key-value workloads that could be handled by Redis or DynamoDB with lower latency.
The Problem
The Fundamental Issue
Ensuring absolute data correctness and complex queryability in a concurrent environment without sacrificing the ability to evolve the schema.
What breaks without it
Data corruption due to partial writes or race conditions.
Application-level complexity from trying to implement joins or constraints in code.
Inability to perform complex analytical queries on operational data.
Why alternatives fail
NoSQL databases often lack cross-document ACID transactions, leading to 'phantom' data states.
MySQL (historically) had a less sophisticated query optimizer for complex, deeply nested joins.
In-memory stores lack the durability guarantees required for 'source of truth' data.
Mental Model
The Intuition
Imagine a master librarian who never erases anything. When a book is updated, they don't use an eraser; they write a new version of the page and keep the old one. This allows multiple people to read the 'old' version while someone is writing the 'new' one. Periodically, a cleaning crew (Vacuum) removes the pages no one is looking at anymore.
Key Mechanics
MVCC (Multi-Version Concurrency Control): Lets readers and writers operate simultaneously; readers never block writers and writers never block readers.
Write-Ahead Logging (WAL): Ensures durability by logging changes to disk before they are applied to data files.
B-Tree & GIN Indexes: Optimized structures for fast data retrieval and full-text search (see the index sketch after this list).
Process-per-connection: Each client connection is handled by a separate OS process, providing isolation but higher memory overhead.
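A minimal sketch of the two index types mentioned above, using a hypothetical articles table (names are illustrative only):

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE articles (
    id        bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    author_id bigint NOT NULL,
    body      text   NOT NULL
);

-- B-Tree (the default): equality and range lookups, e.g. WHERE author_id = 42.
CREATE INDEX articles_author_idx ON articles (author_id);

-- GIN over a tsvector expression: full-text search.
CREATE INDEX articles_body_fts_idx
    ON articles USING gin (to_tsvector('english', body));

-- The planner can use the GIN index for queries like:
SELECT id
FROM articles
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'postgres & mvcc');
```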
Framework
When it's the best choice
- When the schema is relational and requires complex constraints (Foreign Keys, Check constraints).
- When you need to perform JOIN operations across many tables frequently.
- When you need a reliable secondary index on JSON fields (JSONB); see the DDL sketch below.
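A short DDL sketch covering all three criteria in one table (table and column names are illustrative, and an existing customers table is assumed):

```sql
CREATE TABLE orders (
    id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    customer_id bigint NOT NULL REFERENCES customers (id),  -- foreign key (customers assumed to exist)
    total_cents bigint NOT NULL CHECK (total_cents >= 0),   -- check constraint
    attributes  jsonb  NOT NULL DEFAULT '{}'                 -- flexible metadata
);

-- Secondary index on the JSONB column; supports the @> containment operator.
CREATE INDEX orders_attributes_idx ON orders USING gin (attributes);

SELECT id FROM orders WHERE attributes @> '{"gift_wrap": true}';
```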
When to avoid
- When write throughput exceeds the capacity of a single large instance (unless using Citus).
- When the system requires global, multi-region active-active writes with low latency (consider CockroachDB or Spanner).
Fast Heuristics
Tradeoffs
Strengths
- Feature richness (window functions, CTEs, JSONB, custom types); see the query sketch after this list.
- Massive ecosystem and community support.
- Extremely reliable and battle-tested under heavy load.
- Advanced query optimizer that handles complex execution plans.
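A quick sketch of a CTE combined with a window function, reusing the illustrative orders table from the earlier DDL sketch:

```sql
-- Rank each customer's orders by amount, then keep the top 3 per customer.
WITH ranked AS (
    SELECT
        customer_id,
        id AS order_id,
        total_cents,
        row_number() OVER (PARTITION BY customer_id
                           ORDER BY total_cents DESC) AS rnk
    FROM orders
)
SELECT customer_id, order_id, total_cents
FROM ranked
WHERE rnk <= 3;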
Weaknesses
- Connection overhead: Each connection is its own backend process consuming roughly 10MB of RAM; at scale this requires a pooler like PgBouncer (see the connection check after this list).
- Vacuuming overhead: MVCC creates 'dead tuples' that must be cleaned, potentially causing IO spikes.
- Horizontal scaling is not native; requires manual sharding or third-party extensions.
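A quick way to check whether connection count (rather than query load) is the real problem before reaching for PgBouncer:

```sql
-- How many backends exist, and what are they doing?
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state
ORDER BY count(*) DESC;

-- Compare against the configured ceiling.
SHOW max_connections;
```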
Alternatives
MySQL
When it wins
Simpler replication setup and slightly better performance for very simple, high-concurrency read workloads.
Key Difference
MySQL uses a thread-per-connection model and handles MVCC differently (Undo logs vs. Postgres's tuple versioning).
CockroachDB
When it wins
When you need a 'Global' database with automatic sharding and multi-region resilience while keeping SQL.
Key Difference
Built on a distributed KV store (RocksDB/Pebble) rather than a monolithic storage engine.
Document stores (e.g., MongoDB)
When it wins
When data is truly polymorphic and lacks a stable schema, and joins are never required.
Key Difference
Document-oriented, schema-less by default, and historically optimized for horizontal scale over complex relations.
Execution
Must-hit talking points
- Mention MVCC and how it enables non-blocking reads.
- Discuss the necessity of PgBouncer for scaling connections in microservices.
- Explain the difference between JSON and JSONB (binary storage and indexing); see the one-liner after this list.
- Highlight the 'Write-Ahead Log' (WAL) for crash recovery and replication.
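One line that makes the JSON vs. JSONB distinction concrete: json stores the input text verbatim, while jsonb parses it into a binary form (dropping whitespace and duplicate keys) that can be indexed:

```sql
SELECT '{"a": 1, "a": 2}'::json  AS as_json,   -- kept verbatim, duplicate key preserved
       '{"a": 1, "a": 2}'::jsonb AS as_jsonb;  -- parsed: last duplicate key wins, yielding {"a": 2}
```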
Anticipate follow-ups
- Q: How do you handle 'Vacuum' bloat in a high-update environment?
- Q: What is the difference between Logical and Physical replication? (see the sketch after this list)
- Q: How would you implement sharding for a Postgres database at 100TB scale?
- Q: Explain the performance implications of using UUIDs as Primary Keys vs. BigInt.
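For the replication follow-up, a minimal logical replication sketch (host, database, and user names are made up; assumes wal_level = logical on the publisher and a matching schema on the subscriber):

```sql
-- On the publisher (source cluster):
CREATE PUBLICATION orders_pub FOR TABLE orders;

-- On the subscriber (target cluster):
CREATE SUBSCRIPTION orders_sub
    CONNECTION 'host=primary-db dbname=shop user=replicator password=secret'
    PUBLICATION orders_pub;
```

Physical replication, by contrast, streams the raw WAL to a byte-identical standby of the whole cluster.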
Red Flags
Ignoring autovacuum configuration.
Why it fails: Leads to table bloat, where the database takes up massive disk space and performance slows to a crawl as scans wade through dead rows (see the maintenance sketch at the end of this section).
Not using a connection pooler (PgBouncer) with many microservices.
Why it fails: The process-per-connection model will exhaust server memory and CPU due to context switching and fork overhead.
Over-indexing every column.
Why it fails: Every index slows down INSERT and UPDATE operations because each index must be updated synchronously within the transaction (see the unused-index query in the maintenance sketch below).
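A small maintenance sketch tying the first and third red flags together (table names and thresholds are illustrative, not recommendations):

```sql
-- 1. Spot bloat candidates: tables with many dead tuples relative to live ones.
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;

-- 2. Make autovacuum more aggressive on a hypothetical hot, high-update table.
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.02,
    autovacuum_vacuum_cost_delay   = 2
);

-- 3. Find indexes that are never scanned and only slow down writes.
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```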