DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.
DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.
The Question
SQL

Consecutive Product Usage Streak

Given a table filed_taxes with columns filing_id, user_id, filing_date, and product, find all users who have filed their taxes using any 'TurboTax' product for at least three consecutive years. Note that 'TurboTax' products may have different version names (e.g., 'TurboTax Basic', 'TurboTax Deluxe'). Assume each user files at most once per calendar year. Return the results sorted by user_id.
PostgreSQL
CTE
Window Function
EXTRACT
ILIKE
Questions & Insights

Clarifying Questions

How is "TurboTax" identified? Does the product string start with "TurboTax", or can it appear anywhere? (Assumption: We will use a case-insensitive prefix match LIKE 'TurboTax%' to capture "TurboTax Free", "TurboTax Deluxe", etc.).
What defines the "Year" of filing? Is it the calendar year of the filing_date or a specific tax year column? (Assumption: We will use the calendar year extracted from filing_date. We also assume a user only files once per calendar year per the prompt's constraint).
How are consecutive years handled? If a user files in 2020, 2021, 2022, and 2023, do they count? (Yes, "3 or more").
Schema Assumptions:
filing_id: Primary Key (Integer).
user_id: Foreign Key/Identifier (Varchar).
filing_date: Timestamp/Datetime.
product: Varchar (e.g., 'TurboTax Deluxe', 'H&R Block').
Relationship: 1:N (One user can have many filing records across different years).

Thinking Process

Filter and Pre-process: First, we must narrow the dataset to only include TurboTax-related filings. We should also extract the YEAR from the filing_date.
Deduplication (Defensive): Although the prompt states users file once a year, in a real-world scenario, data might be messy. I will treat the data as unique per (user_id, year) for the "consecutive" logic.
Sequencing/Windowing: To find consecutive years, we have two primary methods:
Method A (LAG/LEAD): Look at the previous and the one-before-previous year for each record. If CurrentYear - 1 == PrevYear and PrevYear - 1 == PrevPrevYear, we have a 3-year streak.
Method B (Difference Grouping): Subtract a row number from the year. Constant differences indicate consecutive sequences.
Selection: For exactly "3 or more", LAG is highly readable and performant for a fixed window of 3.
Final Aggregation: Since a user might have multiple 3-year streaks (e.g., 2010-2012 and 2018-2020), we use DISTINCT or GROUP BY to ensure each user_id is listed only once.
Implementation Breakdown

Problem Set

Goal: Identify user_ids with \ge 3 consecutive years of TurboTax usage.
Constraints: Use PostgreSQL syntax. Output must be ordered by user_id.
Edge Cases:
Users with exactly 3 years.
Users with > 3 years.
Users with gaps (e.g., 2019, 2020, 2022) — these should be excluded.
Different TurboTax versions used in consecutive years (e.g., 'TurboTax Basic' in Year 1, 'TurboTax Premier' in Year 2).

Approach

CTEs: To organize the logic into discrete steps (Filtering -> Windowing -> Final Selection).
Window Functions:LAG() to access data from preceding rows within the same user_id partition.
Date Functions:EXTRACT(YEAR FROM ...) or date_part for PostgreSQL specific year extraction.
Filtering:LIKE operator for string pattern matching.

Implementation

Wrap Up

Advanced Topics

Indexing: To optimize this query, a composite index on (product, filing_date, user_id) would be ideal. Since we filter by product first, having it as the leading column in a B-Tree index allows the engine to ignore non-TurboTax rows immediately.
Gaps and Islands (Scalability): If the requirement changed to "Identify users with a streak of at least N years" (where N is large), the LAG approach becomes cumbersome. The "Difference Grouping" method would be superior:
  SELECT user_id, count(*) as streak_length
  FROM (
      SELECT user_id, filing_year, 
             filing_year - ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY filing_year) as grp
      FROM TurboTaxFilings
  ) t
  GROUP BY user_id, grp
  HAVING count(*) >= 3
Execution Plan: In PostgreSQL, the DISTINCT in the first CTE might trigger a HashAggregate. The window functions will trigger a WindowAgg node, which usually requires the data to be sorted by user_id and filing_year. If the table is massive, this Sort could be a bottleneck (spilling to disk).