Substring with Concatenation of All Words

Spread the love

Close-up of HTML code highlighted in vibrant colors on a computer monitor.

Table of Contents

Problem Statement: Substring with Concatenation of All Words

Given a string s and a list of words words, you need to find all starting indices of substring(s) in s that is a concatenation of each word in words exactly once and without any intervening characters.

Each word in the list words has the same length. The length of the concatenated substring is the sum of the lengths of all the words, and the substring must consist of exactly one occurrence of each word from the list (in any order).

Input:

A string s of length n (1 ≤ n ≤ 10^4).
A list words containing m words. Each word is a string of length w (1 ≤ w ≤ 100).

Output:

A list of all starting indices of substrings in s that are a concatenation of all the words in words exactly once, and in any order. If no such substring is found, return an empty list.

Example:

Example 1:

Input: s = "barfoothefoobarman", words = ["foo", "bar"]
Output: [0, 9]
Explanation: The substring starting at index 0 is "barfoo", and the substring starting at index 9 is "foobar".

Example 2:

Input: s = "wordgoodgoodgoodbestword", words = ["word", "good", "best", "good"]
Output: [8]
Explanation: The substring starting at index 8 is "goodbestword".

Approach:

Word Length and Total Length:
- Each word in words has the same length w.
- The total length of the concatenated substring must be m * w, where m is the number of words in words.
Sliding Window Approach:
- Use a sliding window technique to traverse through the string s.
- For each possible starting index i (from 0 to n - m * w), extract a substring of length m * w and check if it is a concatenation of all words in words.
Word Frequency Map:
- We will maintain a frequency map of the words in words. As we slide through the string s, we’ll extract substrings of length w and check if they exist in this map and maintain a frequency count to ensure all words appear exactly once.
Efficiency Considerations:
- The sliding window size is m * w, and for each window, we perform a check of length m (number of words), which results in an overall time complexity of O(n * m), where n is the length of s and m is the number of words.

Algorithm:

Create a frequency map (word_map) of all words in words.
Iterate over all possible starting indices of the substring in s (from 0 to n - m * w).
For each starting index:
- Use a sliding window of size m * w to check if the substring matches a concatenation of words in words.
- Maintain a word_count map for the words in the current window. If the map matches the word_map, add the starting index to the result list.
Return the result list.

Code Implementation:

Python:

from collections import Counter

def findSubstring(s, words):
    if not s or not words:
        return []
    
    word_len = len(words[0])
    word_count = len(words)
    word_map = Counter(words)  # Frequency map of the words in 'words'
    result = []
    
    # Sliding window of size word_len * word_count
    for i in range(word_len):
        left = i
        right = i
        current_count = 0
        window_map = Counter()  # Map to track the current window
        
        while right + word_len <= len(s):
            word = s[right:right + word_len]
            right += word_len
            
            if word in word_map:
                window_map[word] += 1
                current_count += 1
                
                # If the word count exceeds the expected count, move left pointer
                while window_map[word] > word_map[word]:
                    left_word = s[left:left + word_len]
                    window_map[left_word] -= 1
                    current_count -= 1
                    left += word_len
                
                # If we have found all words in the window, add to the result
                if current_count == word_count:
                    result.append(left)
            else:
                # Reset the window if the word is not in the words list
                window_map.clear()
                current_count = 0
                left = right
    
    return result

Explanation:

Step 1: We start by creating a frequency map (word_map) of the words in the list words using Counter.
Step 2: The sliding window is used to move through the string. We try every possible start index (i from 0 to word_len - 1), and slide a window of size m * w.
Step 3: For each word in the window, we check if it exists in the word_map and maintain a counter for the words in the window (window_map). If the count of any word exceeds the expected frequency, we move the left pointer to adjust the window.
Step 4: If the window contains exactly all the words in words (in any order), we add the starting index of the window to the result list.
Step 5: Return the result list containing all valid starting indices.

Time and Space Complexity:

Time Complexity: O(n * m), where n is the length of string s and m is the number of words. The outer loop runs m times (for each starting index) and each sliding window operation takes O(m) time to verify word frequencies.
Space Complexity: O(m) for storing the frequency map (word_map) and the sliding window map (window_map).

Example Walkthrough:

Example 1:

Input: s = "barfoothefoobarman", words = ["foo", "bar"]

The word length is 3, and the total length of the concatenated substring is 3 * 2 = 6.
We slide through the string s and check all possible substrings of length 6:
- “barfoo” at index 0 is a valid concatenation of “bar” and “foo”.
- “foobar” at index 9 is also valid.
Thus, the output is [0, 9].

Example 2:

Input: s = "wordgoodgoodgoodbestword", words = ["word", "good", "best", "good"]

The word length is 4, and the total length of the concatenated substring is 4 * 4 = 16.
We check all possible substrings of length 16 in s:
- “goodbestword” at index 8 is the only valid concatenation of “word”, “good”, “best”, and “good”.
Thus, the output is [8].