×

The parameterized complexity of sequence alignment and consensus. (English) Zbl 0888.68060

Summary: The longest common subsequence problem is examined from the point of view of parameterized computational complexity. There are several different ways in which parameters enter the problem, such as the number of sequences to be analyzed, the length of the common subsequence, and the size of the alphabet. Lower bounds on the complexity of this basic problem imply lower bounds on a number of other sequence alignment and consensus problems. An issue in the theory of parameterized complexity is whether a problem which takes input \((x, k)\) can be solved in time \(f(k)\cdot n^\alpha\), where \(\alpha\) is independent of \(k\) (termed fixed-parameter tractability). It can be argued that this is the appropriate asymptotic model of feasible computability for problems for which a small range of parameter values covers important applications – a situation which certainly holds for many problems in biological sequence analysis. Our main results show that: (1) The longest common subsequence (LCS) parametrized by the number of sequences to be analyzed is hard for \(W[t]\) for all \(t\). (2) Let LCS problem, parameterized by the length of the common subsequence, belongs to \(W[P]\) and is hard for \(W[2]\). (3) The LCS problem parameterized both by the number of sequences and the length of the common subsequence, is complete for \(W[1]\). All of the above results are obtained for unrestricted alphabet sizes. For alphabets of a fixed size, problems (2) and (3) are fixed-parameter tractable. We conjecture that (1) remains hard.

MSC:

68Q15 Complexity classes (hierarchies, relations among complexity classes, etc.)
Full Text: DOI

References:

[1] Baeza-Yates, R. A., Searching subsequences, Theoret. Comput. Sci., 78, 363-376 (1991) · Zbl 0717.68070
[2] Bodlaender, H.; Fellows, M.; Hallett, M., Beyond NP-completeness for problems of bounded width: hardness for the \(W\) hierarchy, (Proc. 26th ACM Symp. on the Theory of Computing (1994)), 449-458 · Zbl 1345.68152
[3] Cai, L.; Chen, J.; Downey, R.; Fellows, M., (The parameterized complexity of short computations and factorization, Technical Report (1993), Department of Computer Science, University of Victoria: Department of Computer Science, University of Victoria Wellington)
[4] Day, W. H.E.; McMorris, F. R., Discovering consensus molecular sequences, (Opitz, O.; Lausen, B.; Khar, R., Information and Classification — Concepts, Methods, and Applications (1993), Springer: Springer Berlin), 393-402 · Zbl 0899.92018
[5] Day, W. H.E.; McMorris, F. R., The computation of consensus patterns in DNA sequences, Math. Comput. Modelling, 17, 49-52 (1993) · Zbl 0800.68727
[6] Downey, R.; Evans, P.; Fellows, M., Parameterized learning complexity, (Proc. 6th ACM Workshop on Computational Learning Theory (COLT) (1993), ACM: ACM New York), 51-57
[7] Downey, R.; Fellows, M., Fixed-parameter intractability (extended abstract), (Proc. 7th Ann. Conf. on Structure in Complexity Theory (1992), IEEE Computer Soc. Press: IEEE Computer Soc. Press Los Alamitos, CA), 36-49
[8] Downey, R.; Fellows, M.; Kapron, B.; Hallett, M.; Wareham, H. T., The parameterized complexity of some problems in logic and linguistics, (Proc. Symp. on the Logical Foundations of Computer Science. Proc. Symp. on the Logical Foundations of Computer Science, Lecture Notes in Computer Science, Vol. 813 (1994), Springer: Springer Berlin), 89-100 · Zbl 0946.03046
[9] Gusfield, D., Efficient methods for multiple sequence alignment with guaranteed error bounds, Bull. Math. Biology, 55, 141-154 (1993) · Zbl 0756.92020
[10] Hirschberg, D. S., Recent results on the complexity of common subsequence problems, (Sankoff, D.; Kruskal, J. B., Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison (1983), Addison-Wesley: Addison-Wesley Reading, MA), 325-330
[11] Irving, R. W.; Fraser, C. B., Two algorithms for the longest common subsequence of three (or more) strings, (Apostolico, A.; Crochemore, M.; Galil, Z.; Manber, U., Proc. 3rd Ann. Symp. on Combinatorial Pattern Matching. Proc. 3rd Ann. Symp. on Combinatorial Pattern Matching, Lecture Notes in Computer Science, Vol. 644 (1992), Springer: Springer Berlin), 214-229
[12] Kruskal, J. B.; Sankoff, D., An anthology of algorithms and concepts for sequence comparison, (Sankoff, D.; Kruskal, J. B., Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison (1983), Addison-Wesley: Addison-Wesley Reading, MA), 265-310
[13] Lu, S. Y.; Fu, K. S., A sentence-to-sentence clustering procedure for pattern analysis, IEEE Trans. Systems Man and Cybernet., 8, 381-389 (1978) · Zbl 0378.68048
[14] Maier, D., The complexity of some problems on subsequences and supersequences, J. ACM, 25, 322-336 (1978) · Zbl 0371.68018
[15] Pearson, W. R.; Miller, W., Dynamic programming algorithms for biological sequence comparison, Methods in Enzymology, 183, 575-601 (1992)
[16] Pevzner, P. A., Multiple alignment, communication cost, and graph matching, SIAM J. Appl. Math., 52, 1763-1779 (1992) · Zbl 0766.68064
[17] Sankoff, D., Matching comparisons under deletion/insertion constraints, PNAS, 69, 4-6 (1972) · Zbl 0235.05013
[18] Sankoff, D., Simultaneous solution of the RNA folding, alignment, and protosequence problems, SIAM J. Appl. Math., 45, 810-825 (1985) · Zbl 0581.92012
[19] Wareham, H. T., On the computational complexity of inferring evolutionary trees, (Technical Report no. 9301. Technical Report no. 9301, M.Sc. Thesis (1993), Department of Computer Science, Memorial University of Newfoundland)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.