The Question
SQL

Top Grossing Products by Category

Given a product_spend table documenting customer transactions, write a query to retrieve the top two highest-grossing products within each category for the calendar year 2022. Gross spend is defined as the sum of the spend column for a given product. Your output should include the category name, the product name, and the total calculated spend, ordered by category and then by the highest spend.
Snowflake
Window Function
QUALIFY Clause
CTE
Aggregate Function
Questions & Insights

Clarifying Questions

How should ties be handled? If two products have identical total spend, should we return both (possibly yielding more than two products per category) or limit the result strictly to two? Assumption: We will use `RANK()` so that products with identical spend are all returned. `ROW_NUMBER()` is the usual choice when exactly two rows per category are required.
What is the definition of "highest-grossing"? Does this include returned items or cancelled orders? Assumption: The `spend` column represents the final net revenue per transaction, and all rows in the table for 2022 are valid transactions.
What is the time grain? Is transaction_date stored in UTC? Assumption: The date is stored as a timestamp, and we will filter to the calendar year 2022 using Snowflake's built-in date functions (for example, `YEAR()`).
Schema Assumptions:
product_spend: Fact table (Event-based).
category & product: Dimensional attributes (strings).
spend: Numeric/Decimal for precision.
user_id: Foreign key to a user dimension (not needed for this specific aggregation).
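
To make the schema assumptions concrete, here is a minimal sketch of a table definition with a few invented rows. The column types, precision, and every sample value are illustrative assumptions, not the real table.

```sql
-- Hypothetical DDL derived from the schema assumptions above.
-- Types and precision are guesses made for illustration only.
CREATE TABLE IF NOT EXISTS product_spend (
    category         VARCHAR       NOT NULL,  -- dimensional attribute
    product          VARCHAR       NOT NULL,  -- dimensional attribute
    user_id          NUMBER(38, 0),           -- FK to a user dimension (unused here)
    spend            NUMBER(12, 2),           -- assumed net revenue per transaction
    transaction_date TIMESTAMP_NTZ NOT NULL   -- assumed timestamp without time zone
);

-- Invented sample rows, including a tie in 'appliance' and a 2021 row
-- that the 2022 filter should exclude.
INSERT INTO product_spend (category, product, user_id, spend, transaction_date)
VALUES
    ('appliance',   'refrigerator',    101, 246.00, '2022-01-05 10:00:00'),
    ('appliance',   'washing machine', 102, 246.00, '2022-03-02 09:30:00'),
    ('appliance',   'microwave',       103,  60.00, '2022-06-10 14:15:00'),
    ('electronics', 'vacuum',          104, 189.00, '2022-04-05 12:00:00'),
    ('electronics', 'vacuum',          105, 120.00, '2021-12-31 23:00:00');
```

With these rows, the two tied appliance products share rank 1 under `RANK()` and the microwave receives rank 3, so a `rank <= 2` filter returns two appliance rows.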

Thinking Process

Filter & Aggregate: First, we need to narrow the dataset to the year 2022. We then need to calculate the SUM(spend) grouped by both category and product. This creates our "Total Gross" per product.
Ranking: Within each category, we need to rank products based on their total spend in descending order.
Snowflake Optimization (QUALIFY): In many SQL dialects, you need a CTE or a subquery to filter on a window function result (like WHERE rank <= 2), because window functions are evaluated after the WHERE clause. Snowflake supports the QUALIFY clause, which filters on window function results directly in the main query block. This makes the code more concise; treat it as a readability gain rather than a guaranteed performance gain.
Date Handling: Using EXTRACT(YEAR FROM transaction_date) or YEAR(transaction_date) is standard and readable in Snowflake.
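
The aggregate-then-rank steps above can be sketched in the portable CTE form that dialects without QUALIFY require. The CTE names `product_totals` and `ranked` and the alias `spend_rank` are illustrative choices, and the column names follow the schema assumptions.

```sql
-- Step 1: filter to 2022 and total the spend per (category, product).
WITH product_totals AS (
    SELECT
        category,
        product,
        SUM(spend) AS total_spend
    FROM product_spend
    WHERE YEAR(transaction_date) = 2022
    GROUP BY category, product
),
-- Step 2: rank products inside each category by total spend, highest first.
ranked AS (
    SELECT
        category,
        product,
        total_spend,
        RANK() OVER (
            PARTITION BY category
            ORDER BY total_spend DESC
        ) AS spend_rank
    FROM product_totals
)
-- Step 3: keep ranks 1 and 2; ties can produce more than two rows.
SELECT category, product, total_spend
FROM ranked
WHERE spend_rank <= 2
ORDER BY category, total_spend DESC;
```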
Implementation Breakdown

Problem Set

Goal: Identify the top 2 products by total spend for every category in 2022.
Output: category, product, total_spend.
Constraint: Must handle multiple categories and potentially many products per category.
Edge Case: Categories with fewer than 2 products (should still appear in results).
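Edge Case: Products with $0 spend or NULL spend. SUM ignores NULL values, so a product's total is NULL only when every one of its spend values is NULL; such products can be filtered out or wrapped in COALESCE if necessary.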

Approach

Technologies: Snowflake SQL.
Functions: SUM() (Aggregate), RANK() (Window Function), YEAR() (Date Function).
Filtering: QUALIFY clause for post-window function filtering.
Execution Strategy: The optimizer will first scan the table with a filter on transaction_date. It will then perform a Hash Aggregate to calculate sums. Finally, it will compute the window function and filter out rows where the rank > 2.

Implementation
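
A minimal sketch of the query described in the Approach, assuming the table and column names from the schema assumptions (`product_spend`, `category`, `product`, `spend`, `transaction_date`):

```sql
SELECT
    category,
    product,
    SUM(spend) AS total_spend
FROM product_spend
-- Keep only calendar year 2022. An equivalent half-open range
-- (transaction_date >= '2022-01-01' AND transaction_date < '2023-01-01')
-- leaves the column bare, which may help partition pruning.
WHERE YEAR(transaction_date) = 2022
GROUP BY category, product
-- QUALIFY filters on the window function after aggregation.
-- RANK() keeps tied products, so a category can return more than two rows.
QUALIFY RANK() OVER (
    PARTITION BY category
    ORDER BY SUM(spend) DESC
) <= 2
ORDER BY category, total_spend DESC;
```

The window function orders by `SUM(spend)` directly, which avoids relying on alias resolution inside the QUALIFY clause. Categories with a single product still appear, because that product receives rank 1.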

Wrap Up

Advanced Topics

Indexing & Clustering: In Snowflake, there are no traditional indexes. If the product_spend table is massive (terabytes), we should consider clustering it by transaction_date or category. Clustering on transaction_date improves micro-partition pruning for the WHERE filter, while clustering on category may help co-locate the rows that are grouped together.
Window Function Performance: The RANK() function requires a sort within each partition. If there's high data skew (one category has millions of products), this can lead to memory spills. In such cases, checking the Query Profile for "spills to local/remote storage" is crucial.
Materialized Views: If this report is run frequently, a Materialized View could be created on the aggregated data (category, product, year) to avoid scanning raw transaction logs repeatedly.
Alternative Ranking: If the business requirement is to always return exactly two rows even if there's a tie, ROW_NUMBER() would be used instead of RANK().
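
The clustering, materialized view, and strict top-two ideas above can be sketched as follows. The view name `product_spend_yearly`, the chosen clustering expression, and the tie-breaker on product name are all assumptions made for illustration. Materialized views also require an edition that supports them and restrict which query shapes are allowed, so verify this against your own account before relying on it.

```sql
-- Clustering key (assumption): cluster on the date part of the timestamp,
-- then category, to keep the key's cardinality manageable.
ALTER TABLE product_spend
    CLUSTER BY (TO_DATE(transaction_date), category);

-- Hypothetical materialized view holding per-product yearly totals,
-- so reports do not rescan the raw transactions.
CREATE MATERIALIZED VIEW IF NOT EXISTS product_spend_yearly AS
SELECT
    category,
    product,
    YEAR(transaction_date) AS spend_year,
    SUM(spend)             AS total_spend
FROM product_spend
GROUP BY category, product, YEAR(transaction_date);

-- Strict variant: ROW_NUMBER() returns at most two rows per category.
-- The secondary sort on product is an assumed tie-breaker that makes
-- the result deterministic when totals are equal.
SELECT category, product, total_spend
FROM product_spend_yearly
WHERE spend_year = 2022
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY category
    ORDER BY total_spend DESC, product
) <= 2
ORDER BY category, total_spend DESC;
```

Without a tie-breaker, `ROW_NUMBER()` picks arbitrarily among tied products, so the same query can return different rows across runs.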