Document Zbl 1045.68059

Mäkinen, Veli; Ukkonen, Esko; Navarro, Gonzalo

Approximate matching of run-length compressed strings. (English) Zbl 1045.68059

Algorithmica 35, No. 4, 347-369 (2003).

Summary: We focus on the problem of approximate matching of strings that have been compressed using run-length encoding. Previous studies have concentrated on the problem of computing the longest common subsequence (LCS) between two strings of lengths \(m\) and \(n\), compressed to \(m'\) and \(n'\) runs. We extend an existing algorithm for the LCS to the Levenshtein distance, achieving \(O(m'n+n'm)\) complexity. Furthermore, we extend this algorithm to a weighted edit distance model, where the weights of the three basic edit operations can be chosen arbitrarily. This approach also gives an algorithm for approximate searching of a pattern of \(m\) letters (\(m'\) runs) in a text of \(n\) letters (\(n'\) runs) in \(O(mm'n')\) time. Then we propose improvements for a greedy algorithm for the LCS, and conjecture that the improved algorithm has \(O(m'n')\) expected-case complexity. Experimental results are provided to support the conjecture.
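
For orientation, the quantities named in the summary can be illustrated with a minimal Python sketch. It is not the algorithm of the paper: it shows run-length encoding (which yields the run counts \(m'\) and \(n'\)), the classical \(O(mn)\) dynamic programme for the Levenshtein distance on the uncompressed strings, and the standard approximate-search variant of that programme, which is the baseline that run-length-aware methods aim to improve on. The function names `rle`, `levenshtein` and `approx_search` are illustrative assumptions and do not come from the source.

```python
from itertools import groupby


def rle(s):
    """Run-length encode s into a list of (symbol, run_length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


def levenshtein(a, b):
    """Unit-cost edit distance via the classical O(len(a) * len(b)) table.

    Baseline on uncompressed strings only; it does not exploit runs.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]  # deleting the first i symbols of a
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def approx_search(pattern, text, k):
    """1-based end positions in text where pattern occurs with <= k edits.

    Same recurrence as above, but a match may start anywhere in the text,
    so the cell for the empty pattern prefix is 0 in every text column.
    """
    col = list(range(len(pattern) + 1))
    hits = []
    for j, ct in enumerate(text, 1):
        new = [0]  # empty pattern prefix matches at every text position
        for i, cp in enumerate(pattern, 1):
            new.append(min(col[i] + 1,
                           new[i - 1] + 1,
                           col[i - 1] + (cp != ct)))
        col = new
        if col[-1] <= k:
            hits.append(j)
    return hits
```

For example, `rle("aaabbc")` gives three runs for a string of six letters, so a bound stated in terms of run counts can be far smaller than \(mn\) when the inputs compress well.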

Cited in 10 Documents

MSC:

68P30	Coding and information theory (compaction, compression, models of communication, encoding schemes, etc.) (aspects in computer science)
68P10	Searching and sorting

Keywords:

Compressed pattern matching; Run-length encoding; Levenshtein distance; Longest common subsequence; Weighted edit distance

