The Question

Shortest Common Supersequences with Unique Character Frequencies

Given an array of strings words, find all possible character frequency distributions of its shortest common supersequences (SCS). A shortest common supersequence is a string of minimum length that contains each string in words as a subsequence. Two supersequences are considered identical for this problem if one is a permutation of the other (i.e., they have the same character counts). Return a 2D array freqs where each freqs[i] is an integer array of size 26 representing the frequency of each lowercase English letter ('a'-'z') for a unique SCS multiset. You may return the frequency arrays in any order.

Constraints:

1 <= words.length <= 8

1 <= words[i].length <= 12

words[i] consists of lowercase English letters.

Java

Memoization

DFS

HashMap

HashSet

Questions & Insights

Clarifying Questions

What are the constraints on the number of strings ($k$) and the length of each string ($L$)?

Assumption: Since the Multiple Shortest Common Supersequence (SCS) problem is NP-hard, we assume $k$ and $L$ are small enough (e.g., $k \le 8$, $L \le 12$) such that the state space of indices is manageable within memory and time limits.

Is the input alphabet limited to lowercase English letters?

Assumption: Yes, the problem specifies a return array of size 26, implying 'a' through 'z'.

How should empty strings or duplicate strings in the input be handled?

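Assumption: The stated constraints rule out empty strings; if one did appear, its pointer would already be at the end, so it would contribute no character candidates. Duplicate strings need no special handling: their pointers advance in lockstep, so they add no new reachable states.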

What defines "not permutations of each other"?

Assumption: Two strings are permutations of each other if they have identical character frequency counts. Thus, we need to return the set of unique frequency vectors that characterize all possible SCSs.

Thinking Process

State Definition: Model the problem as a shortest path search in a Directed Acyclic Graph (DAG). Each state is a tuple $(i_1, i_2, \dots, i_k)$ where $i_j$ is the current index in the $j$-th word.

Transitions: At any state, the next character $c$ added to the supersequence must be at the current pointer of at least one word. Adding $c$ advances pointers in all words where words[j][i_j] == c.

Shortest Path Criteria: Use Dynamic Programming with Memoization to find the minimum length of the supersequence starting from any state. Only propagate character frequency multisets from transitions that maintain this global minimum length.

Deduplication: Use a Set to store frequency distributions (multisets) at each state to ensure that "permutations" (strings with identical counts) are only recorded once.
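The state and transition rules above can be sketched as a small helper. This is only an illustration under stated assumptions: the class name `ScsTransitions` and the methods `advance` and `isTerminal` are hypothetical names, not part of the original solution.

```java
/** Hypothetical helper illustrating the pointer-tuple state and its transitions. */
final class ScsTransitions {

    /**
     * Returns the state reached by appending character c, or null when c does not
     * match the current character of any word (such a character would be wasted).
     */
    static int[] advance(String[] words, int[] state, char c) {
        int[] next = state.clone();
        boolean moved = false;
        for (int j = 0; j < words.length; j++) {
            // Advance every word whose current character equals c.
            if (state[j] < words[j].length() && words[j].charAt(state[j]) == c) {
                next[j]++;
                moved = true;
            }
        }
        return moved ? next : null;
    }

    /** True when every pointer has reached the end of its word. */
    static boolean isTerminal(String[] words, int[] state) {
        for (int j = 0; j < words.length; j++) {
            if (state[j] < words[j].length()) {
                return false;
            }
        }
        return true;
    }
}
```

For words = ["ab", "ba"], appending 'a' at state (0, 0) moves only the first pointer, while appending 'b' at state (1, 0) moves both pointers at once.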

Implementation Breakdown

Problem Set

Functional Requirements:

Compute the minimum length of a common supersequence for all strings.

Identify all distinct character frequency counts that achieve this minimum length.

Return the results as a 2D integer array.

Constraints:

Alphabet: Lowercase English ('a'-'z').

Time/Space: Must handle $O(L^k \cdot \Sigma)$ complexity effectively.

Approach

Algorithm: Dynamic Programming with Memoization (Top-Down DFS).

Data Structure: HashMap for memoization, custom Wrapper classes for int[] to enable correct hashing/equality in collections.

Complexity:

Time: $O(\Sigma \cdot L^k \cdot M)$, where $\Sigma = 26$, $L$ is word length, $k$ is number of words, and $M$ is the number of distinct multisets per state.

Space: $O(L^k \cdot M)$ to store memoized results.

Implementation
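A minimal sketch of the top-down approach described in the Approach section, under stated assumptions. The wrapper name `FreqArray` follows the name used in this write-up, but its internals, along with `ScsFrequencies`, `Result`, `solve`, and `dfs`, are hypothetical choices made for illustration. The same wrapper is reused as the memo key for the pointer tuple.

```java
import java.util.*;

final class ScsFrequencies {

    /** Value-based wrapper so an int[] can serve as a HashMap key or HashSet element. */
    static final class FreqArray {
        final int[] a;
        FreqArray(int[] a) { this.a = a; }
        @Override public boolean equals(Object o) {
            return o instanceof FreqArray && Arrays.equals(a, ((FreqArray) o).a);
        }
        @Override public int hashCode() { return Arrays.hashCode(a); }
    }

    /** Memo entry: minimal remaining length and the distinct count vectors achieving it. */
    static final class Result {
        final int len;
        final Set<FreqArray> freqs;
        Result(int len, Set<FreqArray> freqs) { this.len = len; this.freqs = freqs; }
    }

    private final String[] words;
    private final Map<FreqArray, Result> memo = new HashMap<>();

    ScsFrequencies(String[] words) { this.words = words; }

    static int[][] solve(String[] words) {
        Result r = new ScsFrequencies(words).dfs(new int[words.length]);
        int[][] out = new int[r.freqs.size()][];
        int i = 0;
        for (FreqArray f : r.freqs) out[i++] = f.a;
        return out;
    }

    private Result dfs(int[] state) {
        FreqArray key = new FreqArray(state);
        Result cached = memo.get(key);
        if (cached != null) return cached;

        // Candidate characters: the current character of each unfinished word.
        boolean[] candidate = new boolean[26];
        boolean done = true;
        for (int j = 0; j < words.length; j++) {
            if (state[j] < words[j].length()) {
                candidate[words[j].charAt(state[j]) - 'a'] = true;
                done = false;
            }
        }

        Result res;
        if (done) {
            // Terminal state: nothing left to append, one all-zero count vector.
            Set<FreqArray> base = new HashSet<>();
            base.add(new FreqArray(new int[26]));
            res = new Result(0, base);
        } else {
            int best = Integer.MAX_VALUE;
            Set<FreqArray> freqs = new HashSet<>();
            for (int c = 0; c < 26; c++) {
                if (!candidate[c]) continue;
                int[] next = state.clone();
                for (int j = 0; j < words.length; j++) {
                    if (next[j] < words[j].length() && words[j].charAt(next[j]) - 'a' == c) next[j]++;
                }
                Result sub = dfs(next);
                if (sub.len + 1 > best) continue;          // strictly worse: ignore
                if (sub.len + 1 < best) {                  // strictly better: restart the set
                    best = sub.len + 1;
                    freqs.clear();
                }
                for (FreqArray f : sub.freqs) {
                    int[] copy = f.a.clone();
                    copy[c]++;
                    freqs.add(new FreqArray(copy));        // the set deduplicates equal vectors
                }
            }
            res = new Result(best, freqs);
        }
        memo.put(key, res);
        return res;
    }
}
```

As a usage example, `ScsFrequencies.solve(new String[]{"ab", "ba"})` should yield two count vectors, one for "aba" (a=2, b=1) and one for "bab" (a=1, b=2), in unspecified order.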

Wrap Up

Advanced Topics

Performance vs. Space: For larger constraints, the memory used by storing Set<FreqArray> in every memo state can be excessive. A 2-pass approach is more efficient: first pass computes minLen for all states, and a second pass (reconstructing from the end) calculates the multisets only for the optimal paths.
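One way the two-pass idea could look is sketched below, under stated assumptions. Pass 1 memoizes only an integer per state; pass 2 builds count vectors while following only edges that keep the total length minimal. Unlike the description above, this sketch walks pass 2 from the start state rather than from the end. All names (`TwoPassScs`, `encode`, `minLen`, `freqs`) are hypothetical, `List<Integer>` is swapped in for a custom wrapper purely to keep the example short, and the base-13 state encoding assumes word lengths of at most 12 as in the constraints.

```java
import java.util.*;

final class TwoPassScs {
    private final String[] words;
    private final Map<Long, Integer> lenMemo = new HashMap<>();             // pass 1
    private final Map<Long, Set<List<Integer>>> freqMemo = new HashMap<>(); // pass 2

    TwoPassScs(String[] words) { this.words = words; }

    /** Packs the pointer tuple into a long; assumes each pointer is in 0..12. */
    private static long encode(int[] state) {
        long key = 0;
        for (int p : state) key = key * 13 + p;
        return key;
    }

    /** Appends letter index c; returns null when c matches no word's current character. */
    private int[] advance(int[] state, int c) {
        int[] next = state.clone();
        boolean moved = false;
        for (int j = 0; j < words.length; j++) {
            if (state[j] < words[j].length() && words[j].charAt(state[j]) - 'a' == c) {
                next[j]++;
                moved = true;
            }
        }
        return moved ? next : null;
    }

    /** Pass 1: minimum number of characters still needed from this state. */
    private int minLen(int[] state) {
        long key = encode(state);
        Integer cached = lenMemo.get(key);
        if (cached != null) return cached;
        int best = Integer.MAX_VALUE;
        for (int c = 0; c < 26; c++) {
            int[] next = advance(state, c);
            if (next != null) best = Math.min(best, 1 + minLen(next));
        }
        if (best == Integer.MAX_VALUE) best = 0; // every word is already consumed
        lenMemo.put(key, best);
        return best;
    }

    /** Pass 2: count vectors of optimal completions, visiting optimal edges only. */
    private Set<List<Integer>> freqs(int[] state) {
        long key = encode(state);
        Set<List<Integer>> cached = freqMemo.get(key);
        if (cached != null) return cached;
        Set<List<Integer>> out = new HashSet<>();
        int here = minLen(state);
        if (here == 0) {
            out.add(new ArrayList<>(Collections.nCopies(26, 0)));
        } else {
            for (int c = 0; c < 26; c++) {
                int[] next = advance(state, c);
                if (next == null || minLen(next) + 1 != here) continue; // non-optimal edge
                for (List<Integer> f : freqs(next)) {
                    List<Integer> g = new ArrayList<>(f);
                    g.set(c, g.get(c) + 1);
                    out.add(g);
                }
            }
        }
        freqMemo.put(key, out);
        return out;
    }

    static int[][] solve(String[] words) {
        Set<List<Integer>> res = new TwoPassScs(words).freqs(new int[words.length]);
        int[][] out = new int[res.size()][26];
        int i = 0;
        for (List<Integer> f : res) {
            for (int c = 0; c < 26; c++) out[i][c] = f.get(c);
            i++;
        }
        return out;
    }
}
```

The saving comes from `freqMemo` holding sets only for states that lie on some optimal path from the start, while states off those paths cost a single integer each.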

NP-Hardness: Finding the SCS for an arbitrary number of strings is NP-hard, so an exponential dependence on $k$ is expected; mentioning this shows deep theoretical knowledge.

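Heuristics (A*): If only one SCS was needed, A* search with a heuristic (like the length of the longest remaining string) could prune the search space. That heuristic never overestimates, because each appended character advances any single word by at most one position.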
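A rough sketch of the A* idea for computing only the SCS length, assuming the longest-remaining-suffix heuristic mentioned above. The names `ScsAStar`, `heuristic`, `key`, and `shortestLength` are hypothetical; each queue node is an int array holding the k pointers followed by the number of characters appended so far.

```java
import java.util.*;

final class ScsAStar {

    /** Lower bound on remaining characters: the longest unconsumed suffix. */
    static int heuristic(String[] words, int[] node) {
        int h = 0;
        for (int j = 0; j < words.length; j++) {
            h = Math.max(h, words[j].length() - node[j]);
        }
        return h;
    }

    /** Value-comparable key for the pointer part of a node. */
    static List<Integer> key(int[] node, int k) {
        List<Integer> key = new ArrayList<>(k);
        for (int j = 0; j < k; j++) key.add(node[j]);
        return key;
    }

    static int shortestLength(String[] words) {
        int k = words.length;
        // Order nodes by g + h, where g is stored at index k of each node.
        PriorityQueue<int[]> open = new PriorityQueue<>(
                Comparator.comparingInt((int[] n) -> n[k] + heuristic(words, n)));
        Set<List<Integer>> closed = new HashSet<>();
        open.add(new int[k + 1]);
        while (!open.isEmpty()) {
            int[] cur = open.poll();
            if (heuristic(words, cur) == 0) return cur[k];   // all words consumed
            if (!closed.add(key(cur, k))) continue;          // state already expanded
            for (char c = 'a'; c <= 'z'; c++) {
                int[] next = cur.clone();
                boolean moved = false;
                for (int j = 0; j < k; j++) {
                    if (cur[j] < words[j].length() && words[j].charAt(cur[j]) == c) {
                        next[j]++;
                        moved = true;
                    }
                }
                if (moved) {
                    next[k] = cur[k] + 1;
                    open.add(next);
                }
            }
        }
        throw new IllegalStateException("goal state should always be reachable");
    }
}
```

This variant returns a length only; enumerating every optimal frequency vector still requires exploring all optimal paths, so the pruning benefit applies mainly to the single-answer version of the problem.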

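Iterative DP: An iterative approach could avoid recursion stack depth limits, though the state-dependency in this DAG is more naturally handled via recursion. Under the stated constraints the recursion depth is bounded by the total length of all words (at most 8 × 12 = 96), since every step advances at least one pointer.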
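One possible iterative formulation is a layer-by-layer breadth-first search, sketched below under stated assumptions. Layer d holds every pointer tuple reachable with exactly d characters, together with the count vectors of the prefixes that reach it; the first layer containing the goal tuple gives the answer. The names `ScsLayeredBfs` and `solve` are hypothetical, and `List<Integer>` stands in for a custom array wrapper to keep the example short.

```java
import java.util.*;

final class ScsLayeredBfs {

    static Set<List<Integer>> solve(String[] words) {
        int k = words.length;
        List<Integer> goal = new ArrayList<>();
        for (String w : words) goal.add(w.length());

        // Maps each pointer tuple in the current layer to the count vectors of prefixes reaching it.
        Map<List<Integer>, Set<List<Integer>>> layer = new HashMap<>();
        Set<List<Integer>> init = new HashSet<>();
        init.add(new ArrayList<>(Collections.nCopies(26, 0)));
        layer.put(new ArrayList<>(Collections.nCopies(k, 0)), init);

        // Each iteration appends exactly one character to every prefix.
        while (!layer.containsKey(goal)) {
            Map<List<Integer>, Set<List<Integer>>> nextLayer = new HashMap<>();
            for (Map.Entry<List<Integer>, Set<List<Integer>>> e : layer.entrySet()) {
                List<Integer> state = e.getKey();
                for (int c = 0; c < 26; c++) {
                    List<Integer> next = new ArrayList<>(state);
                    boolean moved = false;
                    for (int j = 0; j < k; j++) {
                        int p = state.get(j);
                        if (p < words[j].length() && words[j].charAt(p) - 'a' == c) {
                            next.set(j, p + 1);
                            moved = true;
                        }
                    }
                    if (!moved) continue; // c matches no current character
                    Set<List<Integer>> target =
                            nextLayer.computeIfAbsent(next, x -> new HashSet<>());
                    for (List<Integer> f : e.getValue()) {
                        List<Integer> g = new ArrayList<>(f);
                        g.set(c, g.get(c) + 1);
                        target.add(g);
                    }
                }
            }
            layer = nextLayer;
        }
        return layer.get(goal);
    }
}
```

The trade-off in this sketch is that each layer also carries count vectors for states that never lead to an optimal answer, so it avoids recursion at the cost of extra memory per layer.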