Problem Statement: Substring with Concatenation of All Words
Given a string s
and a list of words words
, you need to find all starting indices of substring(s) in s
that is a concatenation of each word in words
exactly once and without any intervening characters.
Each word in the list words
has the same length. The length of the concatenated substring is the sum of the lengths of all the words, and the substring must consist of exactly one occurrence of each word from the list (in any order).
Input:
- A string
s
of lengthn
(1 ≤ n ≤ 10^4). - A list
words
containingm
words. Each word is a string of lengthw
(1 ≤ w ≤ 100).
Output:
- A list of all starting indices of substrings in
s
that are a concatenation of all the words inwords
exactly once, and in any order. If no such substring is found, return an empty list.
Example:
Example 1:
Input: s = "barfoothefoobarman", words = ["foo", "bar"]
Output: [0, 9]
Explanation: The substring starting at index 0 is "barfoo", and the substring starting at index 9 is "foobar".
Example 2:
Input: s = "wordgoodgoodgoodbestword", words = ["word", "good", "best", "good"]
Output: [8]
Explanation: The substring starting at index 8 is "goodbestword".
Approach:
- Word Length and Total Length:
- Each word in
words
has the same lengthw
. - The total length of the concatenated substring must be
m * w
, wherem
is the number of words inwords
.
- Each word in
- Sliding Window Approach:
- Use a sliding window technique to traverse through the string
s
. - For each possible starting index
i
(from0
ton - m * w
), extract a substring of lengthm * w
and check if it is a concatenation of all words inwords
.
- Use a sliding window technique to traverse through the string
- Word Frequency Map:
- We will maintain a frequency map of the words in
words
. As we slide through the strings
, we’ll extract substrings of lengthw
and check if they exist in this map and maintain a frequency count to ensure all words appear exactly once.
- We will maintain a frequency map of the words in
- Efficiency Considerations:
- The sliding window size is
m * w
, and for each window, we perform a check of lengthm
(number of words), which results in an overall time complexity of O(n * m), wheren
is the length ofs
andm
is the number of words.
- The sliding window size is
Algorithm:
- Create a frequency map (
word_map
) of all words inwords
. - Iterate over all possible starting indices of the substring in
s
(from0
ton - m * w
). - For each starting index:
- Use a sliding window of size
m * w
to check if the substring matches a concatenation of words inwords
. - Maintain a
word_count
map for the words in the current window. If the map matches theword_map
, add the starting index to the result list.
- Use a sliding window of size
- Return the result list.
Code Implementation:
Python:
from collections import Counter
def findSubstring(s, words):
if not s or not words:
return []
word_len = len(words[0])
word_count = len(words)
word_map = Counter(words) # Frequency map of the words in 'words'
result = []
# Sliding window of size word_len * word_count
for i in range(word_len):
left = i
right = i
current_count = 0
window_map = Counter() # Map to track the current window
while right + word_len <= len(s):
word = s[right:right + word_len]
right += word_len
if word in word_map:
window_map[word] += 1
current_count += 1
# If the word count exceeds the expected count, move left pointer
while window_map[word] > word_map[word]:
left_word = s[left:left + word_len]
window_map[left_word] -= 1
current_count -= 1
left += word_len
# If we have found all words in the window, add to the result
if current_count == word_count:
result.append(left)
else:
# Reset the window if the word is not in the words list
window_map.clear()
current_count = 0
left = right
return result
Explanation:
- Step 1: We start by creating a frequency map (
word_map
) of the words in the listwords
usingCounter
. - Step 2: The sliding window is used to move through the string. We try every possible start index (
i
from0
toword_len - 1
), and slide a window of sizem * w
. - Step 3: For each word in the window, we check if it exists in the
word_map
and maintain a counter for the words in the window (window_map
). If the count of any word exceeds the expected frequency, we move the left pointer to adjust the window. - Step 4: If the window contains exactly all the words in
words
(in any order), we add the starting index of the window to the result list. - Step 5: Return the result list containing all valid starting indices.
Time and Space Complexity:
- Time Complexity: O(n * m), where
n
is the length of strings
andm
is the number of words. The outer loop runsm
times (for each starting index) and each sliding window operation takesO(m)
time to verify word frequencies. - Space Complexity: O(m) for storing the frequency map (
word_map
) and the sliding window map (window_map
).
Example Walkthrough:
Example 1:
Input: s = "barfoothefoobarman", words = ["foo", "bar"]
- The word length is 3, and the total length of the concatenated substring is 3 * 2 = 6.
- We slide through the string
s
and check all possible substrings of length 6:- “barfoo” at index 0 is a valid concatenation of “bar” and “foo”.
- “foobar” at index 9 is also valid.
- Thus, the output is
[0, 9]
.
Example 2:
Input: s = "wordgoodgoodgoodbestword", words = ["word", "good", "best", "good"]
- The word length is 4, and the total length of the concatenated substring is 4 * 4 = 16.
- We check all possible substrings of length 16 in
s
:- “goodbestword” at index 8 is the only valid concatenation of “word”, “good”, “best”, and “good”.
- Thus, the output is
[8]
.