
Find Longest Repetitive Sequence in a String in Python
Strings are an essential data type in many real-world problems that involve analysing and manipulating text.
In this article, we are going to learn about finding the longest repetitive sequence in a string.
A repetitive sequence is a substring that appears more than once in the given string. Python has no single built-in function for this task, but its built-in features such as slicing, sorting, sets, and dictionaries make it straightforward to implement.
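To make "appears more than once" concrete, here is a minimal sketch that counts how often a substring occurs, including overlapping occurrences. The helper name `count_occurrences` is hypothetical and not part of Python. It is built on `str.find`, because the built-in `str.count` only counts non-overlapping matches.

```python
def count_occurrences(text, sub):
    """Count occurrences of sub in text, allowing overlaps (illustrative helper)."""
    count = 0
    start = text.find(sub)
    while start != -1:
        count += 1
        # Resume the search one character later so overlapping matches are found.
        start = text.find(sub, start + 1)
    return count

print(count_occurrences("banana", "ana"))  # 2: matches start at index 1 and 3
print("banana".count("ana"))               # 1: str.count skips overlapping matches
```

The examples in this article treat overlapping occurrences as repeats, which is why a helper like this matches their results more closely than `str.count`.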
Using Suffix Array and LCP
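A suffix array represents all the suffixes of a string in lexicographic order. A classic suffix array stores only the starting index of each suffix. For simplicity, this example stores the suffix strings themselves.
In this approach, we take the input string, build a list of all its suffixes, and sort that list in lexicographic order.
After that, we compare each adjacent pair of suffixes and compute their longest common prefix (LCP). Every repeated substring is a common prefix of two suffixes, and sorting places the suffixes that share the longest prefix next to each other, so the longest LCP among adjacent pairs is the longest repeated substring.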
Example
In the following example, we are going to find the longest repeated substring in the string "WELCOME" using the suffix array.
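def demo(s):
    n = len(s)
    # Build every suffix of s and sort the suffixes lexicographically.
    a = [s[i:] for i in range(n)]
    a.sort()
    lrs = ""
    # The longest common prefix of adjacent suffixes is a repeated substring.
    for i in range(n - 1):
        b = demo1(a[i], a[i + 1])
        if len(b) > len(lrs):
            lrs = b
    return lrs

def demo1(x1, x2):
    # Return the longest common prefix of x1 and x2.
    result = ""
    for x, y in zip(x1, x2):
        if x == y:
            result += x
        else:
            break
    return result

print(demo("WELCOME"))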
The output of the above program is as follows:
E
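To see why the result is "E", it can help to print the intermediate data. The sketch below lists the sorted suffixes of "WELCOME" and the common prefix of each adjacent pair. The helper names `sorted_suffixes` and `adjacent_lcps` are illustrative. `os.path.commonprefix` is a standard-library function that compares strings character by character, used here as a shortcut for the prefix loop.

```python
import os

def sorted_suffixes(s):
    """Return all suffixes of s in lexicographic order."""
    return sorted(s[i:] for i in range(len(s)))

def adjacent_lcps(suffixes):
    """Return the longest common prefix of each adjacent pair of suffixes."""
    return [os.path.commonprefix([a, b]) for a, b in zip(suffixes, suffixes[1:])]

suffixes = sorted_suffixes("WELCOME")
print(suffixes)                 # ['COME', 'E', 'ELCOME', 'LCOME', 'ME', 'OME', 'WELCOME']
print(adjacent_lcps(suffixes))  # ['', 'E', '', '', '', '']
```

Only the pair "E" and "ELCOME" shares a prefix, so "E" is the longest repeated substring.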
Using Sliding Window and Set
The second approach uses a sliding window and a set. Nested loops generate every possible substring, and a set keeps track of the substrings that have already been seen. If a substring appears again and is longer than the current result, we update the result. Because every substring of the input is generated, this approach is practical only for short strings.
Example
In the following example, we take the input "112212213" and find its longest repeated substring.
def demo(s):
    x = set()  # substrings seen so far
    n = len(s)
    max_len = 0
    result = ""
    for i in range(n):
        for j in range(i + 1, n + 1):
            y = s[i:j]
            # A substring seen before is a repeat; keep it if it is the longest.
            if y in x and len(y) > max_len:
                result = y
                max_len = len(y)
            x.add(y)
    return result

print(demo("112212213"))
The following is the output of the above program:
1221
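The example above enumerates every substring. As an alternative sketch (not the article's own code), a window of one fixed length can be slid across the string, trying the longest window first, so that the set only ever holds substrings of a single length. The function names `has_repeat_of_length` and `longest_repeat_fixed_window` are hypothetical, and overlapping occurrences are assumed to count as repeats.

```python
def has_repeat_of_length(s, k):
    """Return a substring of length k that occurs at least twice, or ""."""
    seen = set()
    for i in range(len(s) - k + 1):
        window = s[i:i + k]  # the current window of length k
        if window in seen:
            return window
        seen.add(window)
    return ""

def longest_repeat_fixed_window(s):
    """Try window lengths from longest to shortest; the first hit is a longest repeat."""
    for k in range(len(s) - 1, 0, -1):
        found = has_repeat_of_length(s, k)
        if found:
            return found
    return ""

print(longest_repeat_fixed_window("112212213"))  # 1221
```

The repeat is found at window length 4, where "1221" starts at index 1 and again, overlapping, at index 4.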
Using Python Dictionary
The third approach uses a Python dictionary. Here we generate all the substrings and store the occurrence count of each one in the dictionary. Whenever a substring has been found more than once and is longer than the current maximum, we update the result.
Example
Consider the following example, where we are going to find the longest repetitive sequence in the string "tutorialspoint".
from collections import defaultdict

def demo(s):
    x = defaultdict(int)  # substring -> number of occurrences
    n = len(s)
    max_len = 0
    result = ""
    for i in range(n):
        for j in range(i + 1, n + 1):
            y = s[i:j]
            x[y] += 1
            # Keep the longest substring that has occurred more than once.
            if x[y] > 1 and len(y) > max_len:
                max_len = len(y)
                result = y
    return result

print(demo("tutorialspoint"))
The following is the output of the above program:
t
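The program above returns only the first longest repeat it meets. Since the dictionary already holds the counts, a small variation can report every longest repeated substring together with how often it occurs. The sketch below uses `collections.Counter` for the counting. The function name `longest_repeats_with_counts` is hypothetical.

```python
from collections import Counter

def longest_repeats_with_counts(s):
    """Map each longest repeated substring of s to its occurrence count."""
    n = len(s)
    # Count every substring s[i:j], overlapping occurrences included.
    counts = Counter(s[i:j] for i in range(n) for j in range(i + 1, n + 1))
    repeated = {sub: c for sub, c in counts.items() if c > 1}
    if not repeated:
        return {}
    best = max(len(sub) for sub in repeated)
    return {sub: c for sub, c in repeated.items() if len(sub) == best}

print(longest_repeats_with_counts("tutorialspoint"))  # {'t': 3, 'o': 2, 'i': 2}
```

In "tutorialspoint" no substring of length two or more repeats, so the single characters "t", "o", and "i" tie for the longest repeat, and the original program reports "t" because it is the first repeat encountered.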