Longest common subsequence problem for unoriented and cyclic strings

February 2007

PDF DOI

Abstract

Given a finite set of strings X, the Longest Common Subsequence problem (LCS) consists in finding a subsequence common to all strings in X that is of maximal length. LCS is a central problem in stringology and finds broad applications in text compression, conception of error-detecting codes, or biological sequence comparison. However, in numerous contexts, words represent cyclic or unoriented sequences of symbols and LCS must be generalized to consider both orientations and/or all cyclic shifts of the strings involved. This occurs especially in computational biology when genetic material is sequenced from circular DNA or RNA molecules. In this work, we define three variants of LCS when the input words are unoriented and/or cyclic. We show that these problems are NP-hard, and W[1]-hard if parameterized in the number of input strings. These results still hold even if the three LCS variants are restricted to input languages over a binary alphabet. We also settle the parameterized complexity of our problems for most relevant parameters. Moreover, we study the approximability of these problems: we discuss the existence of approximation bounds depending on the cardinality of the alphabet, on the length of the shortest sequence, and on the number of input sequences. For this we prove that Maximum Independent Set in r-uniform hypergraphs is W[1]-hard if parameterized in the cardinality of the sought independent set and at least as hard to approximate as Maximum Independent Set in graphs.

Type

Journal article

Publication

Theoretical Computer Science