×

An acceleration of FFT-based algorithms for the match-count problem. (English) Zbl 1409.68349

Summary: The match-count problem on strings is a problem of counting the matches of characters for every possible gap of the starting positions between two strings. This problem for strings of lengths \(m\) and \(n\) \((m\leq n)\) over an alphabet of size \(\sigma\) is classically solved in \(O(\sigma n\log m)\) time using the algorithm based on the convolution theorem and a fast Fourier transform (FFT). This paper provides a method to reduce the number of computations of the FFT required in the FFT-based algorithm. The algorithm obtained by the proposed method still needs \(O(\sigma n\log m)\) time, but the number of required FFT computations is reduced from 3\(\sigma\) to \(2\sigma+1\). This practical improvement of the processing time is also applicable to other algorithms based on the convolution theorem, including algorithms for the weighted version of the match-count problem.

MSC:

68W32 Algorithms on strings
65T50 Numerical methods for discrete and fast Fourier transforms
68W40 Analysis of algorithms

Software:

word2vec
Full Text: DOI

References:

[1] Gusfield, D., Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology (1997), Cambridge University Press: Cambridge University Press New York, NY, USA · Zbl 0934.68103
[2] Cormen, T. H.; Stein, C.; Rivest, R. L.; Leiserson, C. E., Introduction to Algorithms (2001), McGraw-Hill Higher Education · Zbl 1047.68161
[3] Fischer, M. J.; Paterson, M. S., String-matching and other products, (Complexity of Computation, Proceedings of the SIAM-AMS Applied Mathematics Symposium. Complexity of Computation, Proceedings of the SIAM-AMS Applied Mathematics Symposium, New York, 1973 (1974)), 113-125 · Zbl 0301.68027
[4] Atallah, M. J.; Chyzak, F.; Dumas, P., A randomized algorithm for approximate string matching, Algorithmica, 29, 3, 468-486 (2001) · Zbl 0969.68192
[5] Baba, K.; Shinohara, A.; Takeda, M.; Inenaga, S.; Arikawa, S., A note on randomized algorithm for string matching with mismatches, Nord. J. Comput., 10, 1, 2-12 (2003) · Zbl 1065.68113
[6] Schoenmeyr, T.; Zhang, D. Y., FFT-based algorithms for the string matching with mismatches problem, J. Algorithms, 57, 2, 130-139 (2005) · Zbl 1105.68117
[7] Baba, K., String matching with mismatches by real-valued FFT, (Taniar, D.; Gervasi, O.; Murgante, B.; Pardede, E.; Apduhan, B. O., Proceedings, Part IV, Computational Science and Its Applications - ICCSA 2010: International Conference. Proceedings, Part IV, Computational Science and Its Applications - ICCSA 2010: International Conference, Fukuoka, Japan, March 23-26, 2010 (2010), Springer: Springer Berlin, Heidelberg), 273-283
[8] Atallah, M. J.; Grigorescu, E.; Wu, Y., A lower-variance randomized algorithm for approximate string matching, Inf. Process. Lett., 113, 18, 690-692 (2013) · Zbl 1284.68646
[9] Abrahamson, K., Generalized string matching, SIAM J. Comput., 16, 6, 1039-1051 (1987) · Zbl 0646.68079
[10] Fredriksson, K.; Grabowski, S., Exploiting word-level parallelism for fast convolutions and their applications in approximate string matching, Combinatorics and Stringology. Combinatorics and Stringology, Eur. J. Comb., 34, 1, 38-51 (2013) · Zbl 1251.68306
[11] Indyk, P.; Motwani, R., Approximate nearest neighbors: towards removing the curse of dimensionality, (Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing. Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, STOC ’98 (1998), ACM: ACM New York, NY, USA), 604-613 · Zbl 1029.68541
[12] Landauer, T. K.; Foltz, P. W.; Laham, D., An introduction to latent semantic analysis, Discourse Process., 25, 2-3, 259-284 (1998)
[13] Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G. S.; Dean, J., Distributed representations of words and phrases and their compositionality, (Burges, C. J.C.; Bottou, L.; Welling, M.; Ghahramani, Z.; Weinberger, K. Q., Advances in Neural Information Processing Systems, vol. 26 (2013), Curran Associates, Inc.), 3111-3119
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.