The Question
SQL

Title Case Transformation with Hyphen Sensitivity

You are given a table user_content with the columns content_id and content_text. Your task is to generate a report that transforms the text into Title Case based on the following business rules:
1. The first letter of every word must be capitalized, and all subsequent letters in that word must be lowercase.
2. A hyphen (-) must be recognized as a word boundary, so the segments on both sides of a hyphen are capitalized (e.g., 'high-speed' becomes 'High-Speed').
3. All original spacing, including multiple consecutive spaces, must be preserved exactly as it appears in the source data.
Return the original content_id and content_text, plus the new transformed_text column.
PostgreSQL
INITCAP
Questions & Insights

Clarifying Questions

What is the precise definition of a "word"? In PostgreSQL, INITCAP defines a word as a sequence of alphanumeric characters separated by non-alphanumeric characters (including spaces, hyphens, and punctuation). If the requirement strictly limits boundaries to spaces and hyphens, we might need a more complex regex solution to avoid over-capitalizing after characters like apostrophes (e.g., don't becoming Don'T).
How should digits be handled? If a "word" starts with a number (e.g., 1st-place), INITCAP treats the digit as the first character of the alphanumeric sequence, so the letters that follow stay lowercase (1st-Place).
What is the maximum length of `content_text`? This affects whether we can use memory-intensive operations like regexp_split_to_table or if we should stick to optimized built-in string functions.
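Are there non-ASCII (multi-byte UTF-8) characters? PostgreSQL's INITCAP is locale-aware: which characters count as letters, and how they are case-mapped, depends on the collation in effect, so multi-byte characters are generally handled correctly when the database locale matches the data.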
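The first two questions can be checked directly in a psql session. The following is a small sketch that uses literal strings only, so no table is required; the column aliases are illustrative and not part of the original problem.

```sql
-- INITCAP treats every non-alphanumeric character as a word boundary.
SELECT
    INITCAP('don''t')     AS apostrophe_case,  -- Don'T: the apostrophe starts a new word
    INITCAP('1st-place')  AS digit_case,       -- 1st-Place: '1' begins the word, so 'st' stays lowercase
    INITCAP('snake_case') AS underscore_case;  -- Snake_Case: the underscore is a boundary too
```

If the apostrophe result is unacceptable for the business, that is the signal to move away from plain INITCAP.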
Assumptions:
Primary Key: content_id is the unique identifier.
Schema: The table is a standard Fact/Dimension table (user_content).
Rule Alignment: Rule 2 ("both parts of a hyphenated word should be capitalized") perfectly matches the default behavior of PostgreSQL's INITCAP function, which treats the hyphen - as a non-alphanumeric word boundary.
Null Handling: If content_text is NULL, the function should return NULL.

Thinking Process

Analyze Rule 1: "First letter uppercase, remaining lowercase." This is the textbook definition of the Title Case transformation. In SQL, this is typically handled by INITCAP().
Analyze Rule 2: "Words connected with a hyphen should have both parts capitalized." This is a crucial detail.
Standard INITCAP behavior: It treats any non-alphanumeric character as a boundary.
Example: top-rated.
Since the hyphen is non-alphanumeric, INITCAP naturally splits "top" and "rated" into two words and capitalizes both.
Analyze Rule 3: "Formatting and spacing remain unchanged."
Built-in string functions like INITCAP do not collapse whitespace (unlike some trim/split methods), making it ideal for preserving the original layout (e.g., double spaces).
Verification of `INITCAP` in PostgreSQL:
INITCAP('SQL-puzzles are TOP-RATED') returns Sql-Puzzles Are Top-Rated.
The "SQL" becomes "Sql" (remaining letters lowercase), "TOP-RATED" becomes "Top-Rated" (hyphen handled).
Efficiency: For a large-scale data engineering task, using the built-in INITCAP function is O(N) and significantly faster than splitting the string into an array/table, transforming, and aggregating it back together.
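The reasoning above can be spot-checked one rule at a time. This is a minimal sketch on literal strings; the aliases rule_1_case, rule_2_hyphen, and rule_3_spacing are made up for illustration.

```sql
-- Each column compares INITCAP's output with the value the rule requires.
-- All three comparisons evaluate to true.
SELECT
    INITCAP('SQL puzzles')    = 'Sql Puzzles'    AS rule_1_case,
    INITCAP('top-rated')      = 'Top-Rated'      AS rule_2_hyphen,
    INITCAP('double  spaced') = 'Double  Spaced' AS rule_3_spacing;
```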
Implementation Breakdown

Problem Set

Goal: Transform content_text into a specific Title Case.
Rules:
First letter capitalized, rest lowercase.
Hyphenated segments are treated as separate words for capitalization.
Preserve original whitespace and structure.
Edge Cases:
Words already in ALL CAPS (must be lowercased after the first letter).
Multiple hyphens (e.g., state-of-the-art).
Leading/trailing spaces or multiple spaces between words.
Empty strings or NULL values.
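One way to exercise these edge cases is with an inline sample set. The sketch below assumes nothing about the real table; the samples CTE, its label column, and the length_preserved check are hypothetical helpers added for illustration.

```sql
WITH samples(label, content_text) AS (
    VALUES
        ('all caps',         'BREAKING NEWS'),     -- expected: Breaking News
        ('multiple hyphens', 'state-of-the-art'),  -- expected: State-Of-The-Art
        ('extra spaces',     '  padded   text  '), -- expected: spacing kept, words capitalized
        ('empty string',     ''),                  -- expected: empty string
        ('null',             NULL)                 -- expected: NULL
)
SELECT
    label,
    content_text,
    INITCAP(content_text) AS transformed_text,
    -- For these ASCII samples, equal lengths mean no whitespace was added or removed
    length(content_text) = length(INITCAP(content_text)) AS length_preserved
FROM samples;
```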

Approach

Core Function: INITCAP(string).
Context: Use a SELECT statement to project the original column alongside the transformed column.
Execution Strategy: Since this is a scalar transformation with no filter, the database will typically perform a sequential scan and apply the function once per row. The cost of the transformation itself is linear in the total length of the text and is usually small compared with reading the rows.
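To see the plan for yourself, something like the following can be run on a scratch table. The table user_content_demo and its generated rows are hypothetical stand-ins for the real user_content, and the assumed column types (integer, text) are not stated in the problem.

```sql
-- Hypothetical scratch table standing in for user_content
CREATE TEMP TABLE user_content_demo (
    content_id   integer PRIMARY KEY,
    content_text text
);

INSERT INTO user_content_demo (content_id, content_text)
SELECT g, 'sample high-speed  text ' || g
FROM generate_series(1, 10000) AS g;

-- With no WHERE clause, the plan is expected to show a sequential scan
EXPLAIN (ANALYZE, BUFFERS)
SELECT content_id, content_text, INITCAP(content_text) AS transformed_text
FROM user_content_demo;
```

Timings will vary by machine, so treat the output as a sanity check and not as a benchmark.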

Implementation
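A minimal sketch of the query this approach implies is shown below. The WITH clause only supplies hypothetical sample rows so the statement runs on its own; against the real table, drop the WITH clause and keep the SELECT. The ORDER BY is an assumption added for stable output and is not required by the problem.

```sql
-- Hypothetical sample rows; remove this CTE to query the real user_content table
WITH user_content(content_id, content_text) AS (
    VALUES
        (1, 'high-speed  rail'),
        (2, 'SQL-puzzles are TOP-RATED'),
        (3, NULL)
)
SELECT
    content_id,
    content_text,
    INITCAP(content_text) AS transformed_text  -- hyphens and spaces both act as boundaries
FROM user_content
ORDER BY content_id;
```

For the sample rows this yields High-Speed  Rail (double space intact), Sql-Puzzles Are Top-Rated, and NULL.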

Wrap Up

Advanced Topics

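The "Apostrophe" Problem: One nuance of PostgreSQL's INITCAP is that it treats ' as a word boundary, so it's becomes It'S. If the business requirements demand that apostrophes not act as boundaries, a regex-based query or a custom function is needed. The sketch below treats only spaces and hyphens as boundaries and reassembles each row in its original order:
    -- A more surgical approach (if INITCAP is too broad): each match captures
    -- a word (parts[1]) and the single space or hyphen after it (parts[2])
    SELECT
        uc.content_id,
        uc.content_text,
        string_agg(
            upper(left(m.parts[1], 1)) || lower(substring(m.parts[1] from 2)) || m.parts[2],
            '' ORDER BY m.ord
        ) AS transformed_text
    FROM user_content AS uc
    LEFT JOIN LATERAL regexp_matches(uc.content_text, '([^ -]*)([ -]?)', 'g')
        WITH ORDINALITY AS m(parts, ord) ON true
    GROUP BY uc.content_id, uc.content_text;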
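Performance: INITCAP is a built-in function implemented in C inside the PostgreSQL server. In a high-volume pipeline (e.g., processing millions of rows in Spark or a distributed DB), scalar functions like this are "embarrassingly parallel" and scale roughly linearly with the number of CPU cores. Other engines may define word boundaries differently (Spark SQL's initcap, for instance, delimits words by whitespace only), so verify hyphen handling before porting the query.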
Collation: Be aware that INITCAP behavior can change based on the database LC_CTYPE setting, which determines what the engine considers a "letter" in different languages (e.g., Turkish 'i' vs 'I').
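To check which locale settings apply, the catalog can be queried as sketched below. The second statement shows how a collation can be attached to a single call; "C" is used only because it exists in every installation, and the alias c_locale_result is illustrative.

```sql
-- Character classification settings of the current database
SELECT datname, datcollate, datctype
FROM pg_database
WHERE datname = current_database();

-- Apply a specific collation to one call; for non-ASCII text the result
-- may differ from what the database default collation produces
SELECT INITCAP('élan vital' COLLATE "C") AS c_locale_result;
```

Comparing the second result with a plain INITCAP('élan vital') is a quick way to see whether locale affects your data.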