The Question
SQLNth Highest Distinct Salary
Given an 'Employee' table with 'id' (Primary Key) and 'salary' (Integer), write a SQL query to retrieve the Nth highest distinct salary. If there are fewer than N distinct salaries in the table, the query should return NULL. The solution should be flexible enough to handle any integer input N.
PostgreSQL
Scalar Subquery
OFFSET
LIMIT
DISTINCT
Questions & Insights
Clarifying Questions
How is $n$ provided? In PostgreSQL, this logic is typically encapsulated in a user-defined function (UDF). I will assume n is passed as an integer parameter.
How should ties be handled? The requirement specifies the "n-th highest distinct salary." This implies that if multiple employees share the same salary, they should be treated as a single rank (equivalent to using
DENSE_RANK or DISTINCT).What is the behavior for $n=0$ or negative values? Per standard SQL behavior for offsets, such values are generally invalid or result in an empty set. I will assume n \ge 1.
Data Model Assumptions:
Employee table:id (INT): Primary Key.salary (INT): The value to be ranked.Grain: One row per employee.
Data Quality: The
salary column might contain NULLs; standard SQL ordering treats NULLs as the highest or lowest depending on the dialect. For this problem, we assume only non-null salaries are relevant for ranking.Thinking Process
De-duplication: Since we need the n-th distinct salary, the first step is to isolate unique salary values.
Ordering: We need to sort these distinct salaries in descending order (highest to lowest).
Positioning: We need to skip the first n-1 salaries and take the next one.
Handling "Fewer than $n$" results:
A standard
LIMIT 1 OFFSET n-1 query will return "no rows" if the offset exceeds the available data. However, the requirement asks to return
NULL. In SQL, a Scalar Subquery (a subquery in the
SELECT clause) automatically returns NULL if the subquery result set is empty. This is the most idiomatic way to handle the "empty to NULL" conversion in PostgreSQL.Implementation Breakdown
Problem Set
Goal: Find the n-th highest distinct salary.
Input:n (integer).
Constraints:
Return
NULL if count of distinct salaries < n.Must handle duplicate salary values correctly.
Edge Cases:
Table is empty.
Table has fewer than n rows.
Multiple employees have the same salary (should only count as one rank).
n=1 (The maximum salary).
Approach
Technologies: PostgreSQL 15+.
Functions:
DISTINCT, ORDER BY, LIMIT, OFFSET.Strategy: I will use a Scalar Subquery wrapped in a
CREATE FUNCTION block to provide a reusable solution that meets the requirement of returning NULL when n is out of bounds.Computational Cost:O(D \log D) where D is the number of distinct salaries (due to sorting). If a B-tree index exists on
salary, the database can perform an index scan, potentially reducing the cost to O(n).Implementation
Alternatively, if you are strictly writing a query for a platform like LeetCode or a report:
Wrap Up
Advanced Topics
Indexing: To optimize this query, a B-tree index on the
salary column is essential. PostgreSQL can perform a "Backward Index Scan" to find the distinct values efficiently without sorting the entire table in memory.Window Functions vs. Offset: While
DENSE_RANK() is powerful, LIMIT/OFFSET is generally more performant for finding a single specific rank because it can "short-circuit" the operation once the specific row is found, whereas window functions often materialize the entire ranking set.NULL Handling: If the
salary column contains NULLs, they appear at the "top" by default in a DESC sort in PostgreSQL. To ensure we only count actual numbers, we should use WHERE salary IS NOT NULL or ORDER BY salary DESC NULLS LAST.Query Plan: In a large dataset, if
n is very large (e.g., the 10,000th highest salary), an OFFSET approach becomes slow because the engine must still scan over the first 9,999 records.