Find Longest Repetitive Sequence in a String in Python



Strings are essential data types in many real-world problems that involve analysing and manipulating text.

In this article, we are going to learn about finding the longest repetitive sequence in a string.

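A repetitive (or repeated) sequence is a substring that appears more than once in the given string. The approaches below need nothing beyond Python's built-in features (slicing, sorting, sets, and dictionaries) and the standard library's collections module.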
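To make the definition concrete, here is a minimal sketch of a hypothetical helper, `appears_twice`, that checks whether a substring occurs at least twice. It uses `str.find` with a start offset, so overlapping occurrences count: "aaa" occurs at positions 0 and 1 of "aaaa". All three approaches in this article count overlapping occurrences the same way, so each of them returns "aaa" for the input "aaaa".

```python
def appears_twice(s: str, sub: str) -> bool:
    """Return True if sub occurs at least twice in s (overlaps allowed).

    Hypothetical helper, for illustration only.
    """
    first = s.find(sub)
    if first == -1:
        return False
    # Search again starting one character after the first match,
    # so an overlapping second occurrence is still found
    return s.find(sub, first + 1) != -1


print(appears_twice("aaaa", "aaa"))    # True: occurrences at 0 and 1 overlap
print(appears_twice("WELCOME", "E"))   # True
print(appears_twice("WELCOME", "EL"))  # False
```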

Using Suffix Array and LCP

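A suffix array lists all the suffixes of a string in lexicographic order. It is usually stored as the starting indices of those suffixes; the example below keeps the suffix strings themselves for simplicity.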
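A minimal sketch of the index-based form, using a hypothetical function name `suffix_array`, sorts the starting positions by the suffix each one begins. This sort-by-slice construction is simple but not the fastest; dedicated suffix array algorithms scale better to long strings. For "WELCOME" the sorted suffixes are COME, E, ELCOME, LCOME, ME, OME, WELCOME.

```python
def suffix_array(s: str) -> list[int]:
    """Return the starting indices of the suffixes of s in lexicographic order.

    Hypothetical helper; sorting by slices is fine for short strings.
    """
    return sorted(range(len(s)), key=lambda i: s[i:])


s = "WELCOME"
sa = suffix_array(s)
print(sa)                   # [3, 6, 1, 2, 5, 4, 0]
print([s[i:] for i in sa])  # ['COME', 'E', 'ELCOME', 'LCOME', 'ME', 'OME', 'WELCOME']
```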

In this approach, we take the input string, create a list of all its suffixes, and sort that list in lexicographic order.

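After that, we compare each adjacent pair of sorted suffixes and compute their longest common prefix (LCP). Every repeated substring is a prefix of at least two suffixes, and sorting places suffixes that share a long prefix next to each other, so the longest of these adjacent common prefixes is the longest repeated substring.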
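The reasoning can be checked by hand on a second string. The sketch below (the function name `adjacent_lcps` is hypothetical) uses the standard library's `os.path.commonprefix`, which compares a list of strings character by character despite living in `os.path`, to list the LCP length of every adjacent pair of sorted suffixes of "banana". The largest value, 3, comes from the pair "ana"/"anana", so the longest repeated substring is "ana".

```python
import os.path


def adjacent_lcps(s: str) -> list[tuple[str, str, int]]:
    """For each adjacent pair of sorted suffixes, return (suffix, next suffix, LCP length).

    Hypothetical helper, for illustration only.
    """
    suffixes = sorted(s[i:] for i in range(len(s)))
    return [
        (a, b, len(os.path.commonprefix([a, b])))
        for a, b in zip(suffixes, suffixes[1:])
    ]


for a, b, k in adjacent_lcps("banana"):
    print(f"{a!r:10} {b!r:10} {k}")

# The longest adjacent common prefix is the longest repeated substring
a, b, k = max(adjacent_lcps("banana"), key=lambda t: t[2])
print(a[:k])  # ana
```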

Example

In the following example, we are going to find the longest repeated substring in the string "WELCOME" using the suffix array.

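```python
def demo(s):
    n = len(s)
    # Build every suffix s[i:] and sort them lexicographically
    a = [s[i:] for i in range(n)]
    a.sort()
    lrs = ""
    # A repeated substring is a common prefix of two suffixes; after
    # sorting, the longest such prefix is shared by some adjacent pair
    for i in range(n - 1):
        b = demo1(a[i], a[i + 1])
        if len(b) > len(lrs):
            lrs = b
    return lrs

def demo1(x1, x2):
    # Return the longest common prefix of x1 and x2
    result = ""
    for x, y in zip(x1, x2):
        if x == y:
            result += x
        else:
            break
    return result

print(demo("WELCOME"))
```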

The output of the above program is as follows -

E

Using Sliding Window and Set

The second approach uses a set. Two nested loops generate every possible substring s[i:j] (each pair of start and end positions acts as a window over the string), and a set keeps track of the substrings that have already been seen. If a substring is already in the set and is longer than the current result, we update the result. Because it builds and hashes all n(n + 1)/2 substrings, this approach is best suited to short strings.

Example

Following is an example where we take the input "112212213" and find its longest repeated substring.

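```python
def demo(s):
    x = set()  # substrings seen so far
    n = len(s)
    max_len = 0
    result = ""
    # Enumerate every substring s[i:j]
    for i in range(n):
        for j in range(i + 1, n + 1):
            y = s[i:j]
            # Seen before, so it repeats; keep it if it is the longest so far
            if y in x and len(y) > max_len:
                result = y
                max_len = len(y)
            x.add(y)
    return result

print(demo("112212213"))
```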

The following is the output of the above program -

1221
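The nested loops above enumerate every substring rather than moving a single window of fixed size. A variant closer to the classic sliding-window pattern is sketched below (the function name `longest_repeat_window` is hypothetical): it fixes a window length, slides that window across the string one position at a time, and uses a set to spot a repeat. Trying lengths from longest to shortest means the first repeat found is a longest one, so the search can stop early.

```python
def longest_repeat_window(s: str) -> str:
    """Longest repeated substring via fixed-size sliding windows and a set.

    Hypothetical helper; overlapping occurrences count as repeats.
    """
    n = len(s)
    # A repeated substring can be at most n - 1 characters long
    for length in range(n - 1, 0, -1):
        seen = set()
        # Slide a window of this length from left to right
        for start in range(n - length + 1):
            window = s[start:start + length]
            if window in seen:
                return window  # first repeat at the longest possible length
            seen.add(window)
    return ""  # no character repeats


print(longest_repeat_window("112212213"))  # 1221
print(longest_repeat_window("WELCOME"))    # E
```

In the worst case it still builds O(n²) slices, but it stops as soon as any window length produces a repeat.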

Using Python Dictionary

The third approach uses a Python dictionary, specifically a collections.defaultdict with int values. We generate all the substrings, store how many times each one occurs, and update the result whenever a substring has been found more than once and is longer than the current maximum.

Example

Consider the following example, where we are going to find the longest repetitive sequence in the string "tutorialspoint".

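```python
from collections import defaultdict

def demo(s):
    x = defaultdict(int)  # substring -> number of occurrences so far
    n = len(s)
    max_len = 0
    result = ""
    # Enumerate every substring s[i:j] and count it
    for i in range(n):
        for j in range(i + 1, n + 1):
            y = s[i:j]
            x[y] += 1
            # Seen more than once; keep it if it is the longest so far
            if x[y] > 1 and len(y) > max_len:
                max_len = len(y)
                result = y
    return result

print(demo("tutorialspoint"))
```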

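All repeated substrings of "tutorialspoint" are single characters ("t", "o", and "i" each occur more than once), and "t" is the first repeat the loops reach. The following is the output of the above program -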

t
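The hand-written counting loop can also be expressed with `collections.Counter`, the standard library's dictionary subclass for tallying. The sketch below (the function name `longest_repeat_counter` is hypothetical) counts every substring first and then picks the longest one whose count exceeds 1.

```python
from collections import Counter


def longest_repeat_counter(s: str) -> str:
    """Longest substring that occurs more than once, using Counter.

    Hypothetical helper; builds the same substring counts as the loop version.
    """
    n = len(s)
    counts = Counter(s[i:j] for i in range(n) for j in range(i + 1, n + 1))
    repeated = [sub for sub, c in counts.items() if c > 1]
    # default="" covers strings in which nothing repeats
    return max(repeated, key=len, default="")


print(longest_repeat_counter("tutorialspoint"))  # t
print(longest_repeat_counter("112212213"))       # 1221
```

When several repeated substrings share the maximum length, this version and the loop above are not guaranteed to return the same one.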
Updated on: 2025-06-10T19:20:08+05:30
