×

Exploiting word-level parallelism for fast convolutions and their applications in approximate string matching. (English) Zbl 1251.68306

Summary: We develop a method for performing convolutions efficiently in a word RAM model of computation, having a word size of \(w=\Omega (\log n)\) bits, where \(n\) is the input size. The basic idea is to pack several elements of the input vector into a single computer word, effectively enabling parallel computation of convolutions. The technique is applied to approximate string matching under the Hamming distance. The obtained algorithms are the fastest known. In particular, we reduce the complexity of the A. Amir, M. Lewenstein and E. Porat [in: Proceedings of the 11th annual ACM-SIAM symposium on discrete algorithms. Philadelphia, PA: SIAM. 794–803 (2000; Zbl 0957.68125)] algorithm for \(k\)-mismatches from \(O(n\sqrt{k\log k}\) to \(O(n+n\sqrt{k/w}\log k)\). Those algorithms impose some (not severe) limitation on the pattern length, \(m\). We present another, less efficient however, technique based on word-level parallelism, which works without the pattern length limitation.

MSC:

68W05 Nonnumerical algorithms
68Q05 Models of computation (Turing machines, etc.) (MSC2010)
68R05 Combinatorics in computer science

Citations:

Zbl 0957.68125
Full Text: DOI

References:

[1] Abrahamson, K., Generalized string matching, SIAM J. Comput., 16, 6, 1039-1051 (1987) · Zbl 0646.68079
[2] Amir, A.; Levy, A.; Reuveni, L., The practical efficiency of convolutions in pattern matching algorithms, Fund. Inform., 84, 1, 1-15 (2008) · Zbl 1167.68365
[4] Amir, A.; Lipsky, O.; Porat, E.; Umanski, J., Approximate matching in the \(L_1\) metric, (In Proceedings of the 16th Annual Symp. on Combinatorial Pattern Matching. In Proceedings of the 16th Annual Symp. on Combinatorial Pattern Matching, CPM. In Proceedings of the 16th Annual Symp. on Combinatorial Pattern Matching. In Proceedings of the 16th Annual Symp. on Combinatorial Pattern Matching, CPM, Lecture Notes in Computer Science, Vol. 3537 (2005), Springer-Verlag), 91-103 · Zbl 1131.68590
[5] Baeza-Yates, R. A.; Gonnet, G. H., A new approach to text searching, Commun. ACM, 35, 10, 74-82 (1992)
[6] Behrend, F. A., On sets of integers which contain no three in arithmetic progression, Proc. Natl. Acad. Sci., 23, 331-332 (1946) · Zbl 0060.10302
[7] Clifford, P.; Clifford, R.; Iliopoulos, C. S., Faster algorithms for \(\delta, \gamma \)-matching and related problems, (In Proceedings of the 16th Annual Symp. on Combinatorial Pattern Matching. In Proceedings of the 16th Annual Symp. on Combinatorial Pattern Matching, (CPM). In Proceedings of the 16th Annual Symp. on Combinatorial Pattern Matching. In Proceedings of the 16th Annual Symp. on Combinatorial Pattern Matching, (CPM), Lecture Notes in Computer Science, Vol. 3537 (2005), Springer-Verlag), 68-78 · Zbl 1131.68583
[10] Fredman, M. L.; Willard, D. E., Surpassing the information theoretic bound with fusion trees, J. Comput. System Sci., 47, 424-436 (1993) · Zbl 0795.68049
[11] Fredriksson, K.; Grabowski, Sz., Fast convolutions and their applications in approximate string matching, (In Proceedings of the 20th International Workshop on Combinatorial Algorithms. In Proceedings of the 20th International Workshop on Combinatorial Algorithms, IWOCA. In Proceedings of the 20th International Workshop on Combinatorial Algorithms. In Proceedings of the 20th International Workshop on Combinatorial Algorithms, IWOCA, Lecture Notes in Computer Science, Vol. 5874 (2009), Springer), 254-265 · Zbl 1267.68327
[12] Galil, Z.; Giancarlo, R., Improved string matching with \(k\) mismatches, SIGACT News, 17, 4, 52-54 (1986) · Zbl 0608.68058
[13] Galil, Z.; Giancarlo, R., Data structures and algorithms for approximate string matching, J. Complexity, 4, 1, 33-72 (1988) · Zbl 0646.68078
[14] Gasarch, W.; Glenn, J.; Kruskal, C., Finding large 3-free sets I: the small \(n\) case, J. Comput. System Sci., 74, 4, 628-655 (2008) · Zbl 1144.11010
[15] Grabowski, Sz.; Fredriksson, K., Bit-parallel string matching under Hamming distance in \(O(n \lceil m / w \rceil)\) worst case time, Inform. Process. Lett., 105, 5, 182-187 (2008) · Zbl 1184.68209
[16] Graham, R. L.; Rothschild, B. L., Ramsey Theory (1990), Wiley-Interscience: Wiley-Interscience New York, NY, USA · Zbl 0705.05061
[17] Gusfield, D., Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology (1997), Cambridge University Press: Cambridge University Press Cambridge · Zbl 0934.68103
[18] Indyk, P., Faster algorithms for string matching problems: matching the convolution bound, (In Proceedings of the 39th IEEE Annual Symposium on Foundations of Computer Science (1998), IEEE Computer Society Press), 166-173
[19] Knuth, D., (The Art of Computer Programming. The Art of Computer Programming, Combinatorial algorithms Part 1, Vol. 4A (2011), Addison-Wesley) · Zbl 0191.17903
[21] Landau, G. M.; Vishkin, U., Efficient string matching with \(k\) mismatches, Theoret. Comput. Sci., 43, 2-3, 239-249 (1986) · Zbl 0597.68055
[22] Linhart, C.; Shamir, R., Faster pattern matching with character classes using prime number encoding, J. Comput. System Sci., 75, 3, 155-162 (2009) · Zbl 1169.68044
[23] Roth, K. F., On certain sets of integers, J. Lond. Math. Soc., 28, 104-109 (1953) · Zbl 0050.04002
[24] Sanders, T., On Roth’s theorem on progressions, Ann. of Math., 174, 619-636 (2011) · Zbl 1264.11004
[25] Smith, J. O., III. Mathematics of the Discrete Fourier Transform (DFT), with Audio Applications (2007), W3K Publishing
[26] Thorup, M., Combinatorial power in multimedia processors, SIGARCH Comput. Archit. News, 31, 4, 5-11 (2003)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.