The Question
SQL

CRM Marketing Touch Streak Analysis

A marketing team wants to reward highly engaged potential customers. Given a marketing_touches table (event_id, contact_id, event_type, event_date) and a crm_contacts table (contact_id, email), find the email addresses of all contacts who meet two conditions:
1. They had at least one marketing touch per week for at least three consecutive weeks (weeks are defined as starting on Monday).
2. They have performed at least one touch of type 'trial_request' at any point in their history.
Ensure the solution handles contacts with multiple touches in a single week correctly and accounts for streaks that might cross year boundaries.
Snowflake
CTE
Window Function
DENSE_RANK
Gaps and Islands
DATE_TRUNC
DATEADD
Questions & Insights

Clarifying Questions

What defines a "week"? I will assume the standard calendar week (starting Monday) using Snowflake's DATE_TRUNC('WEEK', ...) logic, which aligns to Monday when the WEEK_START session parameter is 0 (the default) or 1. If a contact has multiple touches in the same week, they are collapsed into a single active week for the "consecutive" calculation.
What is the relationship between `contact_id` and `email`? I assume a contact_id is a unique identifier for a person in the CRM, and while one email should map to one contact_id, I will return DISTINCT emails to handle potential data quality issues where multiple IDs might share an email.
How are "consecutive weeks" calculated across year boundaries? I will use date-based arithmetic rather than just week numbers to ensure that Week 52 of 2023 and Week 1 of 2024 are correctly identified as consecutive.
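The week definition and the year-boundary behaviour can be checked with a quick query. This is a minimal sketch, assuming weeks start on Monday (set explicitly here through WEEK_START); the literal dates are illustrative only.

```sql
-- Assumption: weeks start on Monday. WEEK_START = 1 makes that explicit
-- instead of relying on the session default.
ALTER SESSION SET WEEK_START = 1;

SELECT
    -- 2023-12-31 is a Sunday, so it truncates back to Monday 2023-12-25.
    DATE_TRUNC('WEEK', '2023-12-31'::DATE) AS last_week_of_2023,
    -- 2024-01-01 is a Monday, so it truncates to itself.
    DATE_TRUNC('WEEK', '2024-01-01'::DATE) AS first_week_of_2024,
    -- The two week starts are exactly 7 days apart, so they count as consecutive.
    DATEDIFF(
        'day',
        DATE_TRUNC('WEEK', '2023-12-31'::DATE),
        DATE_TRUNC('WEEK', '2024-01-01'::DATE)
    ) AS days_apart;
```

Comparing week-start dates avoids the reset that week numbers undergo at the start of a new year.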
Data Model Assumptions:
marketing_touches: Fact table (Event stream). event_id is the PK. contact_id is a FK to the CRM table.
crm_contacts: Dimension table. contact_id is the PK.
event_type: Categorical string. We are specifically looking for the 'trial_request' value.
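For concreteness, the two tables could be declared roughly as follows. The column types are assumptions, since the question names only the columns. Snowflake records PRIMARY KEY and FOREIGN KEY constraints on standard tables but does not enforce them, which is one reason to return DISTINCT emails.

```sql
-- Sketch only: column types are assumed, not given by the question.
CREATE TABLE crm_contacts (
    contact_id NUMBER  NOT NULL PRIMARY KEY,  -- informational, not enforced
    email      VARCHAR NOT NULL
);

CREATE TABLE marketing_touches (
    event_id   NUMBER  NOT NULL PRIMARY KEY,  -- informational, not enforced
    contact_id NUMBER  NOT NULL REFERENCES crm_contacts (contact_id),
    event_type VARCHAR NOT NULL,              -- e.g. 'trial_request'
    event_date DATE    NOT NULL
);
```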

Thinking Process

Gaps and Islands Problem: The core of the "consecutive weeks" requirement is a classic "Gaps and Islands" SQL puzzle.
First, I need to find distinct weeks of activity per contact.
Second, I need to assign a "Group ID" to consecutive weeks. A common trick is: week_date - (ROW_NUMBER() * 7 days). If the weeks are consecutive, this subtraction will result in the same constant date for the entire "island".
Trial Request Filter: I need to identify the subset of users who have at least one 'trial_request'. This can be done via a CTE or a WHERE clause in a subquery to reduce the dataset early.
Aggregation & Filtering: Group by the contact_id and the "Island ID" and count the records. Filter for COUNT(*) >= 3.
Join for Final Output: Join the resulting list of valid contact_ids with the crm_contacts table to fetch the email addresses.
Snowflake Optimization: Use QUALIFY to filter window function results without an extra subquery layer if possible, and DATE_TRUNC for week alignment.
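The island-key trick from the steps above can be sketched as follows. The names weekly_activity and island_key are chosen here for illustration and do not come from the question.

```sql
WITH weekly_activity AS (
    -- One row per contact per active week, however many touches occurred.
    SELECT DISTINCT
        contact_id,
        DATE_TRUNC('WEEK', event_date)::DATE AS week_start
    FROM marketing_touches
)
SELECT
    contact_id,
    week_start,
    -- Within a run of consecutive weeks, week_start advances by 7 days per
    -- row and so does the subtracted offset, so the difference is constant.
    DATEADD(
        week,
        -ROW_NUMBER() OVER (PARTITION BY contact_id ORDER BY week_start),
        week_start
    ) AS island_key
FROM weekly_activity
ORDER BY contact_id, week_start;
```

A gap in activity makes week_start jump by more than 7 days while the row number still advances by one, so the island_key changes and a new island begins.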
Implementation Breakdown

Problem Set

Requirement 1: Identify contacts with 3 or more consecutive weeks of any marketing touch.
Requirement 2: Identify contacts with at least one 'trial_request' event.
Output: Distinct email addresses of contacts meeting both criteria.
Edge Cases:
Users with multiple touches in one week (must be treated as 1 week).
Streaks spanning across years (2023-12-25 and 2024-01-01 are consecutive).
Users meeting the trial request criteria but not the consecutive week criteria (and vice versa).
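A small fixture that exercises each edge case might look like this. Every row is hypothetical and invented purely for illustration, including the event types other than 'trial_request'.

```sql
-- Hypothetical test rows; none of this data comes from the question.
INSERT INTO crm_contacts (contact_id, email) VALUES
    (1, 'streak_and_trial@example.com'),
    (2, 'trial_only@example.com'),
    (3, 'streak_only@example.com');

INSERT INTO marketing_touches (event_id, contact_id, event_type, event_date) VALUES
    -- Contact 1: weeks of 2023-12-18, 2023-12-25 and 2024-01-01 (crosses the
    -- year boundary), two touches in the first week, plus a trial request.
    (101, 1, 'email_open',    '2023-12-18'),
    (102, 1, 'webinar',       '2023-12-20'),
    (103, 1, 'email_open',    '2023-12-27'),
    (104, 1, 'trial_request', '2024-01-02'),
    -- Contact 2: a trial request, but no touch in the week of 2024-01-08,
    -- so the longest streak is two weeks.
    (201, 2, 'trial_request', '2024-01-01'),
    (202, 2, 'email_open',    '2024-01-15'),
    (203, 2, 'email_open',    '2024-01-22'),
    -- Contact 3: three consecutive weeks, but never a trial request.
    (301, 3, 'email_open',    '2024-01-01'),
    (302, 3, 'email_open',    '2024-01-08'),
    (303, 3, 'email_open',    '2024-01-15');
```

Against these rows, a correct solution should return only streak_and_trial@example.com.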

Approach

CTE (Common Table Expressions): To organize the logic into discrete steps (unique weeks, island grouping, trial filtering).
Window Functions: DENSE_RANK() or ROW_NUMBER() to identify sequences.
Date Functions: DATE_TRUNC('WEEK', ...) and DATEADD(week, ...) for week-level arithmetic.
Join Strategy: Inner Join between the activity-filtered set and the CRM dimension table.
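Computational Cost: The Gaps and Islands approach is O(N log N) because the window function requires a sort. The sort runs within each contact_id partition, and de-duplicating to distinct weeks first keeps N well below the raw touch count, so it generally performs well on Snowflake's columnar execution engine.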
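As an illustration of the DENSE_RANK() option, the variant below ranks raw touches directly. It is a sketch rather than the definitive approach, and ranked_touches, week_rank and island_key are illustrative names.

```sql
WITH ranked_touches AS (
    SELECT
        contact_id,
        DATE_TRUNC('WEEK', event_date)::DATE AS week_start,
        -- DENSE_RANK gives every touch in the same week the same rank, so
        -- no separate de-duplication step is needed before ranking.
        DENSE_RANK() OVER (
            PARTITION BY contact_id
            ORDER BY DATE_TRUNC('WEEK', event_date)::DATE
        ) AS week_rank
    FROM marketing_touches
)
SELECT
    contact_id,
    DATEADD(week, -week_rank, week_start) AS island_key,
    COUNT(DISTINCT week_start)            AS streak_weeks
FROM ranked_touches
GROUP BY contact_id, DATEADD(week, -week_rank, week_start)
-- Count distinct weeks, not rows, because duplicates were not removed.
HAVING COUNT(DISTINCT week_start) >= 3;
```

The trade-off is that the window sort runs over every touch rather than over distinct weeks. ROW_NUMBER() on a de-duplicated set sorts fewer rows but needs the extra DISTINCT step.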

Implementation
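One possible implementation, sketched under the assumptions stated earlier. The CTE names consecutive_contacts and trial_requesters follow the names used under Advanced Topics; weekly_activity and week_islands are names chosen here for illustration.

```sql
WITH weekly_activity AS (
    -- Collapse multiple touches in a week into one active week per contact.
    SELECT DISTINCT
        contact_id,
        DATE_TRUNC('WEEK', event_date)::DATE AS week_start
    FROM marketing_touches
),
week_islands AS (
    -- Consecutive weeks share an island_key (week start minus row offset).
    SELECT
        contact_id,
        week_start,
        DATEADD(
            week,
            -ROW_NUMBER() OVER (PARTITION BY contact_id ORDER BY week_start),
            week_start
        ) AS island_key
    FROM weekly_activity
),
consecutive_contacts AS (
    -- Contacts with at least one island of three or more weeks.
    SELECT DISTINCT contact_id
    FROM week_islands
    GROUP BY contact_id, island_key
    HAVING COUNT(*) >= 3
),
trial_requesters AS (
    -- Contacts with at least one 'trial_request' at any point in history.
    SELECT DISTINCT contact_id
    FROM marketing_touches
    WHERE event_type = 'trial_request'
)
SELECT DISTINCT c.email
FROM crm_contacts AS c
INNER JOIN consecutive_contacts AS cc ON cc.contact_id = c.contact_id
INNER JOIN trial_requesters     AS tr ON tr.contact_id = c.contact_id;
```

Because the island key is built from dates rather than week numbers, a streak running from late December into January is handled without special cases.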

Wrap Up

Advanced Topics

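Indexing & Performance: Snowflake has no traditional indexes on standard tables. If the marketing_touches table is massive, a clustering key can help: clustering on event_date lets Snowflake prune micro-partitions when the query filters on a date range in the WHERE clause. Because this query reads each contact's full history and applies DATE_TRUNC only in the SELECT list, pruning on event_date helps only if the analysis window is bounded.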
QUALIFY Clause: In Snowflake, the QUALIFY clause filters on window function results directly, which could simplify the consecutive_contacts logic, although the Gaps and Islands grouping usually still requires an explicit GROUP BY.
Handling Scale: If the number of contacts is in the hundreds of millions, the INNER JOIN between consecutive_contacts and trial_requesters is effectively an intersection. WHERE contact_id IN (SELECT ...) or INTERSECT expresses the same logic, and the optimizer may produce a similar plan for all three; comparing them in the query profile is the reliable way to choose.
Data Skew: If one contact_id has millions of touches (e.g., a bot), a DENSE_RANK over raw touches might cause a "spill to disk" on a single worker. We could mitigate this by collapsing touches to distinct (contact_id, week) pairs before applying the window function, by restricting the streak computation to contacts that have a 'trial_request' as early as possible, or by using a larger warehouse with more memory.
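The QUALIFY and early-filtering ideas above can be combined in an alternative formulation. This is a sketch, not a drop-in replacement: it swaps the Gaps and Islands grouping for a LAG comparison, and streak_contacts is an illustrative name.

```sql
WITH trial_requesters AS (
    SELECT DISTINCT contact_id
    FROM marketing_touches
    WHERE event_type = 'trial_request'
),
weekly_activity AS (
    -- Keep only trial requesters so the window sort sees fewer rows, and
    -- collapse touches to one row per contact per week to limit skew.
    SELECT DISTINCT
        contact_id,
        DATE_TRUNC('WEEK', event_date)::DATE AS week_start
    FROM marketing_touches
    WHERE contact_id IN (SELECT contact_id FROM trial_requesters)
),
streak_contacts AS (
    -- Weeks are distinct and Monday-aligned, so if the active week two rows
    -- back is exactly two weeks earlier, the week in between was active too.
    SELECT contact_id
    FROM weekly_activity
    QUALIFY LAG(week_start, 2) OVER (PARTITION BY contact_id ORDER BY week_start)
            = DATEADD(week, -2, week_start)
)
SELECT DISTINCT c.email
FROM crm_contacts AS c
INNER JOIN streak_contacts AS s ON s.contact_id = c.contact_id;
```

This form hard-codes the streak length in the LAG offset. The island grouping generalizes more easily when the threshold changes or the streak length itself must be reported.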