×

On approximate pattern matching with thresholds. (English) Zbl 1405.68467

Summary: In the traditional version of the problem of approximate pattern matching, a pattern symbol is considered to match a text symbol if the two symbols are equal. Such a notion of exact equality is not suitable for situations where the text and pattern symbols are imprecise, e.g., obtained from an analog source, distorted by additive noise, etc. In such situations it is more appropriate to consider two alphabet symbols to match even if they are not equal, as long as they do not differ by more than a given threshold \(\theta\). The goal is then to compute the number of matches of the length-\(M\) pattern with all length-\(M\) substrings of the length-\(N\) text, i.e., to compute a vector of \(N-M+1\) scores, where the \(i\)th score is the number of matches between the pattern and the substring that begins at text position \(i\). The main result of this paper is to show that this threshold version of the problem can be solved by recursively solving \(3+2\log\theta\) instances of the traditional (i.e., zero-threshold) version of the problem, which is much-studied in the literature and for which there are many efficient (typically randomized) solutions of time complexity close to \(O(N \log M)\). This paper’s result therefore implies the first randomized \(O(N\log M(\log\theta+1))\) solution for the threshold version of the problem. It also implies that any future improvement to the traditional (zero-threshold) version of the problem automatically translates into a similar improvement to the arbitrary-threshold case. Furthermore, we show that the factor \(\Omega(\log\theta)\) is tight if use our recursive framework.

MSC:

68W32 Algorithms on strings
68W20 Randomized algorithms
68W40 Analysis of algorithms
Full Text: DOI

References:

[1] Abrahamson, Karl, Generalized string matching, SIAM J. Comput., 16, 1039-1051 (1987) · Zbl 0646.68079
[2] Amir, Amihood; Lewenstein, Moshe; Porat, Ely, Faster algorithms for string matching with k mismatches, (Proceedings of the Eleventh Annual ACM-SIAM Symposium on Discrete Algorithms. Proceedings of the Eleventh Annual ACM-SIAM Symposium on Discrete Algorithms, San Francisco (January 2000)), 794-803 · Zbl 0957.68125
[3] Atallah, Mikhail J.; Chyzak, Frederic; Dumas, Philippe, A randomized algorithm for approximate string matching, Algorithmica, 29, 3, 468-486 (2001) · Zbl 0969.68192
[4] Atallah, Mikhail J.; Duket, Timothy W., Pattern matching in the Hamming distance with thresholds, Inf. Process. Lett., 111, 14, 674-677 (2011) · Zbl 1260.68487
[5] Atallah, Mikhail J.; Grigorescu, Elena; Wu, Yi, A lower-variance randomized algorithm for approximate string matching, Inf. Process. Lett., 113, 18, 690-692 (2013) · Zbl 1284.68646
[6] Baeza-Yates, Ricardo A.; Gonnet, Gaston H., A new approach to text searching, Commun. ACM, 35, 10, 74-82 (1992)
[7] Cantone, Domenico; Cristofaro, Salvatore; Faro, Simone, Efficient algorithms for the delta-approximate string matching problem in musical sequences, (Stringology (2004)), 33-47
[8] Clifford, Peter; Clifford, Raphae̋l; Iliopoulos, Costas, Faster algorithms for \(\delta, \gamma \)-matching and related problems, (Combinatorial Pattern Matching (2005)), 68-78 · Zbl 1131.68583
[9] Cole, Richard; Iliopoulos, Costas; Lecroq, Thierry; Plandowski, Wojciech; Rytter, Wojciech, On special families of morphisms related to \(δ\)-matching and don’t care symbols, Inf. Process. Lett., 85, 5, 227-233 (2003) · Zbl 1173.68493
[10] Crochemore, Maxime; Iliopoulos, Costas S.; Lecroq, Thierry; Pinzon, Yoan J.; Plandowski, Wojciech; Rytter, Wojciech, Occurrence and substring heuristics for \(δ\)-matching, Fundam. Inform., 56, 1, 1-21 (2003) · Zbl 1046.68096
[11] Frederiksson, Kimmo; Grabowski, Szymon, Efficient algorithms for \(( \delta, \gamma, \alpha )\) and \(( \delta, k_\Delta, \alpha )\)-matching, Int. J. Found. Comput. Sci., 19, 1, 163-183 (2008) · Zbl 1169.68651
[12] Fredriksson, Kimmo; Grabowski, Szymon, Exploiting word-level parallelism for fast convolutions and their applications in approximate string matching, Eur. J. Comb., 34, 1, 38-51 (2013) · Zbl 1251.68306
[13] Galil, Zvi; Giancarlo, Raffaele, Improved string matching with k mismatches, SIGACT News, 17, 4, 52-54 (1986) · Zbl 0608.68058
[14] Grabowski, Szymon; Fredriksson, Kimmo, Bit-parallel string matching under Hamming distance in \(O(n \lceil m / w \rceil)\) worst-case time, Inf. Process. Lett., 105, 5, 182-187 (2008) · Zbl 1184.68209
[15] Karloff, Howard, Fast algorithms for approximately counting mismatches, Inf. Process. Lett., 48, 2, 53-60 (1993) · Zbl 0788.68062
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.