Writing linear time algorithm to find longest repeat prefix


1. Using suffix trees, give an algorithm to find a longest common substring shared among three input strings: s1 of length n1, s2 of length n2 and s3 of length n3.

2. A non-empty string α is called a minimal unique substring of s if and only if it satisfies:

(i) α occurs exactly once in s (uniqueness),

(ii) all proper prefixes of α occur at least twice in s (mimimality), and

(iii) α >= l for some constant l.Give an optimal algorithm to enumerate all minimal unique substrings of s.

3. Redundant sequence identification: Given a set of k DNA sequences, S = {s1, s2........sk }, give an optimal algorithm to identify all sequences that are completely contained in (i.e., substrings of) at least one other sequence in S.

4. Let S = {s1, s2,......,sk} denote a set of k genomes. The problem of  fingerprinting is the task of identifying a shortest possible substring αi from each string si such  that αi is unique to si  i.e., no other genome in the set S has αi. Such an αi will be called a fingerprint of si. (Note that it is OK for αi to be present more than once within si.) Give an algorithm to enumerate a fingerprint for each input genome, if one exists. Assume that no two input genomes are identical.

5. A string s is said to be periodic with a period α , if s is αk for some k>=2. (Note that αk is the string formed by concatenating α k times.) A DNA sequence s is called a tandem repeat if it is periodic. Given a DNA sequence s, determine if it is periodic, and if so, the values for α and k. Note that there could be more than one period for a periodic string. In such a case, you need to report the shortest period.

6. A non-empty string β is called a repeat prefix of a string s if ββ is a prefix of s. Give a linear time algorithm to find the longest repeat prefix of s.

7. Given strings s1 and s2 of lengths m and n respectively, a minimum cover of s1 by s2 is a decomposition s1 = w1w2......wk, where each wi is a non-empty substring of s2 and k is minimized. Eg., given s1 = accgtatct and s2 = cgtactcatc, there are several covers of s1 by s2 possible, two of which are: (i) cover1: s1 = w1w2w3w4 (where w1 = ac, w2 = cgt, w3 = atc, w4 = t) , and (ii) cover2: s1 = w1w2w3w4w5 (where w1 = ac, w2 = c, w3 = gt, w4 = atc, w5 = t). However, only cover1 is a minimal cover. Give an algorithm to compute a minimum cover (if one exists) in O(m + n) time and space. If a minimum cover does not exist your algorithm should state so and terminate within the same time and space bounds. Give a brief justification of why you think your algorithm is correct | meaning, how it guarantees finding the minimial cover (if one exists).

Request for Solution File

Ask an Expert for Answer!!
Theory of Computation: Writing linear time algorithm to find longest repeat prefix
Reference No:- TGS01531

Expected delivery within 24 Hours