The Question
SQLCorrecting Positional Data Swaps
Given a table of orders with sequential IDs, identify and correct a data entry error where the 'item' values for each pair of adjacent rows (1 and 2, 3 and 4, etc.) have been swapped. If the table contains an odd number of records, the final record should remain unchanged. Provide a SQL query that returns the original order IDs with their correctly assigned items.
PostgreSQL
Window Function
LEAD
LAG
COALESCE
CASE Expression
Questions & Insights
Clarifying Questions
Are `order_id` values sequential and gapless?
Assumption: The problem implies
order_id acts as the sequence identifier (1, 2, 3...). If there were gaps (e.g., 1, 3, 4), the logic would shift to using ROW_NUMBER() to create a dense sequence before swapping.How should we handle the "last item" if the total count is odd?
Assumption: The prompt explicitly states if the last item has an odd ID, it remains unchanged. This implies we need to check both the parity of the current ID and whether a subsequent record exists.
Can an `order_id` be NULL?
Assumption:
order_id is the Primary Key and is NOT NULL.What is the data volume?
Assumption: The dataset fits in memory for standard window function processing, but we should aim for a single-pass scan if possible.
Data Model:
Table:
orders (Fact Table)Primary Key:
order_id (Integer)Relationships: Independent records, but logically paired by sequence.
Thinking Process
The core of the problem is a transposition of values based on positional parity.
Categorize Rows: Every row is either "Odd" or "Even".
Mapping Logic:
If an
order_id is Odd (1, 3, 5...): It needs to fetch the item from the next row (order_id + 1).If an
order_id is Even (2, 4, 6...): It needs to fetch the item from the previous row (order_id - 1).Boundary Handling:
For an Odd
order_id, if there is no "next" row (meaning it's the last row in an odd-numbered set), it must retain its original value.SQL Strategy:
Using Window Functions like
LEAD() and LAG() is more idiomatic and performant in PostgreSQL than performing a self-join.LEAD(item) OVER (ORDER BY order_id) grabs the next item.LAG(item) OVER (ORDER BY order_id) grabs the previous item.We use a
CASE statement to decide which window function to invoke based on the result of order_id % 2.Implementation Breakdown
Problem Set
Goal: Swap items between adjacent rows (1 with 2, 3 with 4).
Constraints:
PostgreSQL syntax.
Correct handling of odd-numbered total records (last record stays put).
Maintain the original
order_id sequence in the output.Approach
Window Functions:
LEAD() to look ahead and LAG() to look back.Mathematical Operators: Modulo (
%) to determine parity.Conditional Logic:
CASE expressions to handle the swapping logic and the edge case for the final odd row.Execution Strategy: A single sequential scan with window function support, resulting in O(N \log N) complexity due to the required sort for the window.
Implementation
Wrap Up
Advanced Topics
Scalability: If the table is massive, ensure there is a B-Tree index on
order_id. This allows the PostgreSQL optimizer to avoid a physical sort and instead perform an Index Scan, significantly speeding up the window function calculations.Alternative - Arithmetic Approach: One could also solve this by manipulating the
order_id itself (swapping IDs and then sorting), but that is often riskier if order_id is a sequence used as a Foreign Key elsewhere. The logic above preserves the ID-Item relationship context but swaps the "payload".Parallelism: PostgreSQL can parallelize window functions in recent versions. For very large datasets, the query planner might use a
Parallel Index Scan to distribute the workload of the LEAD/LAG calculation.