The Question

Substring with Concatenation of All Words

Given a string s and an array of strings words where all strings in words are of the same length, find all starting indices of substring(s) in s that represent a concatenation of every string in words exactly once, in any order, without any intervening characters. Example: Input: s = "barfoothefoobarman", words = ["foo","bar"] Output: [0,9] Explanation: Substrings starting at index 0 and 9 are "barfoo" and "foobar" respectively, which are permutations of ["foo","bar"]. Constraints: - 1 <= s.length <= 10,000 - 1 <= words.length <= 5,000 - 1 <= words[i].length <= 30 - s and words[i] consist of lowercase English letters.

Java

Sliding Window

HashMap

Questions & Insights

Clarifying Questions

What are the constraints on the lengths of `s` and the strings in `words`?

Assumption: s.length() can be up to

10^4

, and the total length of all words combined can also be significant. All strings in words have the identical length

L

Can the `words` array contain duplicate strings?

Assumption: Yes, the solution must handle duplicates using a frequency map rather than a simple set.

What is the character set?

Assumption: Standard lowercase English letters, though the logic applies to any UTF-8 characters.

Is the order of indices in the output array important?

Assumption: No, the result can be returned in any order.

Thinking Process

Sliding Window Optimization: Since every word in words has the same length

L

, we can view the string

s

as a sequence of blocks of size

L

. To cover all possible starting positions, we only need to run a sliding window

L

times, starting from offsets

0, 1, \dots, L-1

Frequency Matching: For each offset, maintain a "window" that spans a total length of

K \times L

(where

K

is the number of words). Use a hash map to track the frequency of words currently in the window and compare it against the target frequency map of words.

Two-Pointer Movement: Within each offset pass, use a right pointer to consume a word of length

L

. If the word exists in our target, increment its count in the window map. If the count exceeds the target or the word is invalid, shrink the window from the left until it becomes valid again.

Result Collection: When the count of words in the current window equals

K

, the start index of the window is a valid concatenated substring starting index.

Implementation Breakdown

Problem Set

Functional Requirements: Find all start indices of substrings in s that are permutations of all strings in words.

Constraints:

1 \le s.length \le 10^4

1 \le words.length \le 5000

1 \le words[i].length \le 30

All

words[i]

have the same length.

Approach

Algorithm: Sliding Window with Multiple Offsets.

Data Structure: HashMap<String, Integer> for frequency tracking.

Complexity:

Time:

O(N \cdot L)

, where

N

is s.length() and

L

is the length of a single word. We perform

L

passes, and each character is visited a constant number of times within each pass. String slicing/hashing takes

O(L)

Space:

O(M \cdot L)

, where

M

is the number of unique words in the words array, to store the frequency maps.

Implementation

Wrap Up

Advanced Topics

Rabin-Karp / Rolling Hash: Instead of using substring and HashMap with strings, one could compute a rolling hash for blocks of size

L

. This could reduce the

O(L)

cost of string operations to

O(1)

on average, making the complexity closer to

O(N)

Space-Time Trade-offs: Using a Trie to store words could optimize prefix checking, but given the "same length" constraint, a HashMap is usually more efficient and simpler to implement.

Parallelization: Each of the

L

offset passes is independent. In a high-throughput system processing massive strings, these passes could be executed in parallel.