Hardness results for the center and median string problems under the weighted and unweighted edit distances

Abstract

Given a finite set of strings, the Median String problem consists in finding a string that minimizes the sum of the edit distances to the strings in the set. Approximations of the median string are used in a broad range of applications that need a representative string summarizing the information common to the strings of the set. This is the case in classification, in speech and pattern recognition, and in computational biology. In the latter, Median String is related to the key problem of multiple alignment. The recent literature contains a theorem stating the NP-completeness of Median String for unbounded alphabets. However, in the above-mentioned areas, the alphabet is often finite. Thus, it remains a crucial question whether the Median String problem is NP-complete for bounded and even binary alphabets. In this work, we answer this question and also give the complexity of the related Center String problem. Moreover, we study the parameterized complexity of both problems with respect to the number of input strings. In addition, we provide an algorithm that computes an optimal center under a weighted edit distance in polynomial time when the number of input strings is fixed.
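
To make the two objectives concrete, the following is a minimal Python sketch, not the algorithm of the paper. It assumes the unit-cost (Levenshtein) edit distance and the usual definition of a center string as one that minimizes the maximum edit distance to the input strings, which the abstract does not spell out. The names `edit_distance`, `median_cost`, `center_cost`, and `brute_force` are hypothetical, and the exhaustive search only considers candidates up to a caller-supplied length, so it is exponential and suitable for toy inputs only.

```python
from itertools import product


def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance (insert, delete, substitute) by dynamic programming."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at cost 0
            ))
        prev = curr
    return prev[-1]


def median_cost(candidate: str, strings: list[str]) -> int:
    """Median String objective: sum of edit distances to the input strings."""
    return sum(edit_distance(candidate, s) for s in strings)


def center_cost(candidate: str, strings: list[str]) -> int:
    """Center String objective (assumed definition): maximum edit distance."""
    return max(edit_distance(candidate, s) for s in strings)


def brute_force(strings: list[str], alphabet: str, max_len: int, cost) -> str:
    """Best candidate among all strings over `alphabet` of length <= max_len.

    Illustration only: the number of candidates grows exponentially in max_len.
    """
    candidates = (
        "".join(t)
        for n in range(max_len + 1)
        for t in product(alphabet, repeat=n)
    )
    return min(candidates, key=lambda c: cost(c, strings))


if __name__ == "__main__":
    strings = ["ab", "aab", "b"]
    m = brute_force(strings, "ab", 3, median_cost)
    c = brute_force(strings, "ab", 3, center_cost)
    print(m, median_cost(m, strings))
    print(c, center_cost(c, strings))
```

The same search routine serves both problems because only the aggregation (sum versus maximum) differs; a weighted edit distance would replace the three unit costs in `edit_distance` with operation-dependent weights.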

Publication
Journal of Discrete Algorithms
consensus string, multiple alignment, tree alignment, NP-complete, parameterized complexity, LCS