×

Computing similarity of run-length encoded strings with affine gap penalty. (English) Zbl 1142.68069

Summary: The problem of computing the similarity of two run-length encoded strings has been studied for various scoring metrics. Many algorithms have been developed for the longest common subsequence metric and some algorithms for the Levenshtein distance metric and the weighted edit distance metric. In this paper we consider similarity based on the affine gap penalty metric which is a more general and rather complicated scoring metric than the weighted edit distance. To compute the similarity in this model efficiently, we convert the problem into a path problem on a directed acyclic graph and use some properties of maximum paths in this graph. We present an \(O(nm^{\prime }+n^{\prime }m)\) time algorithm for computing the similarity of two run-length encoded strings in the affine gap penalty model, where \(n\) and \(m\) are the lengths of two given strings whose run-length encoded lengths are \(n^{\prime }\) and \(m^{\prime }\), respectively.

MSC:

68W05 Nonnumerical algorithms
Full Text: DOI

References:

[1] Apostolico, A.; Landau, G. M.; Skiena, S., Matching for run length encoded strings, Journal of Complexity, 15, 1, 4-16 (1999) · Zbl 0921.68041
[2] Arbell, O.; Landau, G. M.; Mitchell, J., Edit distance of run-length encoded strings, Information Processing Letters, 83, 6, 307-314 (2002) · Zbl 1043.68059
[3] Bunke, H.; Csirik, H., An improved algorithm for computing the edit distance of run length coded strings, Information Processing Letters, 54, 93-96 (1995) · Zbl 1004.68598
[4] Crochemore, M.; Landau, G. M.; Schieber, B.; Ziv-Ukelson, M., Re-use dynamic programming for sequence alignment: An algorithmic toolkit, (Iliopoulos, C. S.; Lecroq, T., String Algorithmics (2005), King’s College London Publications), 19-59
[5] Crochemore, M.; Landau, G. M.; Ziv-Ukelson, M., A subquadratic sequence alignment algorithm for unrestricted scoring matrices, SIAM Journal on Computing, 32, 6, 1654-1673 (2003) · Zbl 1253.74047
[6] Gajewska, H.; Tarjan, R. E., Deques with heap order, Information Processing Letters, 22, 197-200 (1986)
[7] Gotoh, O., An improved algorithm for matching biological sequences, Journal of Molecular Biology, 162, 705-708 (1982)
[8] Gusfield, D., Algorithms on Strings, Trees, and Sequences (1997), Cambridge University Press · Zbl 0934.68103
[9] Huang, X.; Miller, W., A time-efficient, linear-space local similarity algorithm, Advances in Applied Mathematics, 12, 337-357 (1991) · Zbl 0748.90079
[10] Kim, J. W.; Park, K., An efficient alignment algorithm for masked sequences, Theoretical Computer Science, 370, 19-33 (2007) · Zbl 1118.68054
[11] Mäkinen, V.; Navarro, G.; Ukkonen, E., Approximate matching of run-length compressed strings, (12th Annual Symposium on Combinatorial Pattern Matching. 12th Annual Symposium on Combinatorial Pattern Matching, Lecture Notes in Computer Science, vol. 2089 (2001)), 31-49 · Zbl 0990.68526
[12] Mäkinen, V.; Navarro, G.; Ukkonen, E., Approximate matching of run-length compressed strings, Algorithmica, 35, 347-369 (2003) · Zbl 1045.68059
[13] J. Mitchell, A geometric shortest path problem, with application to computing a longest common subsequence in run-length encoded strings, Technical Report, Dept. of Applied Mathematics, SUNY Stony Brook, 1997; J. Mitchell, A geometric shortest path problem, with application to computing a longest common subsequence in run-length encoded strings, Technical Report, Dept. of Applied Mathematics, SUNY Stony Brook, 1997
[14] Waterman, M. S.; Smith, T. F.; Beyer, W. A., Some biological sequence metrics, Advances in Mathematics, 20, 367-387 (1976) · Zbl 0342.92003
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.