×

Towards optimal packed string matching. (English) Zbl 1282.68184

Summary: In the packed string matching problem, it is assumed that each machine word can accommodate up to \(\alpha\) characters, thus an \(n\)-character string occupies \(n/\alpha\) memory words.
(a) We extend the Crochemore-Perrin constant-space \(O(n)\)-time string-matching algorithm to run in optimal \(O(n/\alpha)\) time and even in real-time, achieving a factor \(\alpha\) speedup over traditional algorithms that examine each character individually. Our macro-level algorithm only uses the standard \(\mathrm{AC}^0\) instructions of the word-RAM model (i.e. no integer multiplication) plus two specialized micro-level \(\mathrm{AC}^0\) word-size packed-string instructions. The main word-size string-matching instruction wssm is available in contemporary commodity processors. The other word-size maximum-suffix instruction wslm is only required during the pattern pre-processing. Benchmarks show that our solution can be efficiently implemented, unlike some prior theoretical packed string matching work.
(b) We also consider the complexity of the packed string matching problem in the classical word-RAM model in the absence of the specialized micro-level instructions wssm and wslm. We propose micro-level algorithms for the theoretically efficient emulation using parallel algorithms techniques to emulate wssm and using the Four-Russians technique to emulate wslm. Surprisingly, our bit-parallel emulation of wssm also leads to a new simplified parallel random access machine string-matching algorithm. As a byproduct to facilitate our results we develop a new algorithm for finding the leftmost (most significant) 1 bits in consecutive non-overlapping blocks of uniform size inside a word. This latter problem is not known to be reducible to finding the rightmost 1, which can be easily solved, since we do not know how to reverse the bits of a word in \(O(1)\) time.

MSC:

68R15 Combinatorics on words
68W32 Algorithms on strings
68P30 Coding and information theory (compaction, compression, models of communication, encoding schemes, etc.) (aspects in computer science)

References:

[1] Aho, A.; Corasick, M., Efficient string matching: An aid to bibliographic search, Commun. ACM, 18, 6, 333-340 (1975) · Zbl 0301.68048
[2] Aho, A.; Hopcroft, J.; Ullman, J., The Design and Analysis of Computer Algorithms (1974), Addison-Wesley: Addison-Wesley Reading, MA · Zbl 0326.68005
[3] AMD, AMD64® Architecture Programmer’s Manual Volume 4: 128-Bit and 256-Bit Media Instructions (2011), AMD Corporation
[4] AMD, Software Optimization Guide for AMD Family 15h Processors (2011), AMD Corporation
[5] Apostolico, A.; Crochemore, M., Optimal canonization of all substrings of a string, Inform. and Comput., 95, 1, 76-95 (1991) · Zbl 0757.68060
[6] Arlazarov, V.; Dinic, E.; Kronrod, M.; Faradzev, I., On economic construction of the transitive closure of a directed graph, Soviet Math. Dokl., 11, 1209-1210 (1970) · Zbl 0214.23601
[7] Baeza-Yates, R. A., Improved string searching, Softw. Pract. Exp., 19, 3, 257-271 (1989)
[8] Belazzougui, D., Worst case efficient single and multiple string matching in the RAM model, (Proceedings of the 21st International Workshop on Combinatorial Algorithms. Proceedings of the 21st International Workshop on Combinatorial Algorithms, IWOCA (2010)), 90-102 · Zbl 1326.68371
[9] Ben-Kiki, O.; Bille, P.; Breslauer, D.; Gąsieniec, L.; Grossi, R.; Weimann, O., Optimal packed string matching, (Chakraborty, S.; Kumar, A., FSTTCS. FSTTCS, LIPIcs, vol. 13 (2011), Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik), 423-432 · Zbl 1246.68272
[10] Ben-Nissan, M.; Klein, S. T., Accelerating Boyer Moore searches on binary texts, (Proceedings of the 12th International Conference on Implementation and Application of Automata. Proceedings of the 12th International Conference on Implementation and Application of Automata, CIAA (2007)), 130-143 · Zbl 1139.68367
[11] Bille, P., Fast searching in packed strings, J. Discrete Algorithms, 9, 1, 49-56 (2011) · Zbl 1216.68121
[12] Boyer, R.; Moore, J., A fast string searching algorithm, Commun. ACM, 20, 762-772 (1977) · Zbl 1219.68165
[13] Breslauer, D.; Czumaj, A.; Dubhashi, D. P.; Meyer auf der Heide, F., Comparison model lower bounds to the parallel-random-access-machine, Inform. Process. Lett., 62, 2, 103-110 (1997) · Zbl 1337.68113
[14] Breslauer, D.; Galil, Z., An optimal \(O(\log \log n)\) time parallel string matching algorithm, SIAM J. Comput., 19, 6, 1051-1058 (1990) · Zbl 0711.68057
[15] Breslauer, D.; Galil, Z., A lower bound for parallel string matching, SIAM J. Comput., 21, 5, 856-862 (1992) · Zbl 0756.68048
[16] Breslauer, D.; Gąsieniec, L.; Grossi, R., Constant-time word-size packed string-matching, (CPM (2012)), 83-96 · Zbl 1358.68333
[17] Breslauer, D.; Grossi, R.; Mignosi, F., Simple real-time constant-space string matching, (Giancarlo, R.; Manzini, G., CPM. CPM, Lecture Notes in Comput. Sci., vol. 6661 (2011), Springer), 173-183 · Zbl 1339.68325
[18] Brodnik, A.; Miltersen, P.; Munro, I., Trans-dichotomous algorithms without multiplication - some upper and lower bounds, (Proc. 6th Workshop on Algorithms and Data Structures, no. 955 (1997)), 426-439 · Zbl 1497.68553
[19] Césari, Y.; Vincent, M., Une caractérisation des mots périodiques, C. R. Acad. Sci. Paris, 286(A), 1175-1177 (1978) · Zbl 0392.20039
[20] Cole, R.; Crochemore, M.; Galil, Z.; Gąsieniec, L.; Hariharan, R.; Muthukrishnan, S.; Park, K.; Rytter, W., Optimally fast parallel algorithms for preprocessing and pattern matching in one and two dimensions, (Proc. 34th IEEE Symp. on Foundations of Computer Science (1993)), 248-258
[21] Cole, R.; Hariharan, R., Approximate string matching: A simpler faster algorithm, SIAM J. Comput., 31, 6, 1761-1782 (2002) · Zbl 1008.68165
[22] Commentz-Walter, B., A string matching algorithm fast on the average, (Proc. 6th International Colloquium on Automata, Languages, and Programming. Proc. 6th International Colloquium on Automata, Languages, and Programming, Lecture Notes in Comput. Sci. (1979), Springer-Verlag: Springer-Verlag Berlin, Germany), 118-132 · Zbl 0407.68092
[23] Crochemore, M.; Galil, Z.; Gąsieniec, L.; Park, K.; Rytter, W., Constant-time randomized parallel string matching, SIAM J. Comput., 26, 4, 950-960 (1997) · Zbl 0885.68078
[24] Crochemore, M.; Perrin, D., Two-way string-matching, J. ACM, 38, 3, 651-675 (1991) · Zbl 0808.68063
[25] Crochemore, M.; Rytter, W., Text Algorithms (1994), Oxford University Press · Zbl 0844.68101
[26] Czumaj, A.; Galil, Z.; Gąsieniec, L.; Park, K.; Plandowski, W., Work-time-optimal parallel algorithms for string problems, (Leighton, F. T.; Borodin, A., STOC (1995), ACM), 713-722 · Zbl 0978.68531
[27] Daykin, J. W.; Iliopoulos, C. S.; Smyth, W. F., Parallel RAM algorithms for factorizing words, Theoret. Comput. Sci., 127, 1, 53-67 (1994) · Zbl 0805.68057
[28] Duval, J., Factorizing words over an ordered alphabet, J. Algorithms, 4, 363-381 (1983) · Zbl 0532.68061
[29] Faro, S.; Lecroq, T., Efficient pattern matching on binary strings, (Proceedings of the 35th Conference on Current Trends in Theory and Practice of Computer Science. Proceedings of the 35th Conference on Current Trends in Theory and Practice of Computer Science, SOFSEM (2009))
[30] Faro, S.; Lecroq, T., The exact string matching problem: a comprehensive experimental evaluation (2011), Cornell University Library, Technical report
[31] Faro, S.; Lecroq, T., The exact online string matching problem: A review of the most recent results, ACM Comput. Surv., 45, 2, 13 (2013) · Zbl 1293.68314
[32] Fich, F. E., Constant time operations for words of length w (1999), University of Toronto, Technical report
[33] Fine, N.; Wilf, H., Uniqueness theorems for periodic functions, Proc. Amer. Math. Soc., 16, 109-114 (1965) · Zbl 0131.30203
[34] Fischer, M.; Paterson, M., String matching and other products, (Complexity of Computation (1974), American Mathematical Society), 113-125 · Zbl 0301.68027
[35] Fredriksson, K., Faster string matching with super-alphabets, (Proceedings of the 9th International Symposium on String Processing and Information Retrieval. Proceedings of the 9th International Symposium on String Processing and Information Retrieval, SPIRE (2002)), 44-57
[36] Fredriksson, K., Shift-or string matching with super-alphabets, Inform. Process. Lett., 87, 4, 201-204 (2003) · Zbl 1161.68409
[37] Furst, M. L.; Saxe, J. B.; Sipser, M., Parity, circuits, and the polynomial-time hierarchy, Math. Syst. Theory, 17, 1, 13-27 (1984) · Zbl 0534.94008
[38] Galil, Z., Optimal parallel algorithms for string matching, Inform. Control, 67, 144-157 (1985) · Zbl 0588.68022
[39] Galil, Z., A constant-time optimal parallel string-matching algorithm, J. ACM, 42, 4, 908-918 (1995) · Zbl 0885.68082
[40] Gąsieniec, L.; Plandowski, W.; Rytter, W., Constant-space string matching with smaller number of comparisons: Sequential sampling, (Proc. 6th Symp. on Combinatorial Pattern Matching. Proc. 6th Symp. on Combinatorial Pattern Matching, Lecture Notes in Comput. Sci., vol. 937 (1995), Springer-Verlag: Springer-Verlag Berlin, Germany), 78-89
[41] Goldberg, T.; Zwick, U., Faster parallel string matching via larger deterministic samples, J. Algorithms, 16, 2, 295-308 (1994) · Zbl 0797.68083
[42] Gusfield, D., Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology (1997), Cambridge University Press · Zbl 0934.68103
[43] Iliopoulos, C. S.; Smyth, W. F., Optimal algorithms for computing the canonical form of a circular string, Theoret. Comput. Sci., 92, 1, 87-105 (1992) · Zbl 0747.68029
[44] Intel, Intel® SSE4 Programming Reference (2007), Intel Corporation
[45] Intel, Intel® 64 and IA-32 Architectures Optimization Reference Manual (2011), Intel Corporation
[46] Intel, Intel® Advanced Vector Extensions Programming Reference (2011), Intel Corporation
[47] Knuth, D.; Morris, J.; Pratt, V., Fast pattern matching in strings, SIAM J. Comput., 6, 322-350 (1977) · Zbl 0372.68005
[48] Knuth, D. E., Combinatorial Algorithms, The Art of Computer Programming, vol. 4A (Jan. 2011), Addison-Wesley Professional · Zbl 1354.68001
[49] Lothaire, M., Combinatorics on Words (1983), Addison-Wesley: Addison-Wesley Reading, MA, USA · Zbl 0514.20045
[50] Muthukrishnan, S., New results and open problems related to non-standard stringology, (CPM (1995)), 298-317
[51] Muthukrishnan, S.; Palem, K. V., Non-standard stringology: Algorithms and complexity, (Proceedings of the 26th Annual ACM Symposium on Theory of Computing. Proceedings of the 26th Annual ACM Symposium on Theory of Computing, STOC (1994)), 770-779 · Zbl 1345.68305
[52] Muthukrishnan, S.; Ramesh, H., String matching under a general matching relation, Inform. and Comput., 122, 1, 140-148 (1995) · Zbl 0834.68094
[53] Navarro, G.; Raffinot, M., A bit-parallel approach to suffix automata: Fast extended string matching, (Farach-Colton, M., CPM. CPM, Lecture Notes in Comput. Sci., vol. 1448 (1998), Springer), 14-33
[54] Rytter, W., On maximal suffixes, constant-space linear-time versions of KMP algorithm, Theoret. Comput. Sci., 299, 1-3, 763-774 (2003) · Zbl 1051.68051
[55] Tarhio, J.; Peltola, H., String matching in the DNA alphabet, Softw. Pract. Exp., 27, 851-861 (1997)
[56] Vishkin, U., Optimal parallel pattern matching in strings, Inform. Control, 67, 91-113 (1985) · Zbl 0588.68023
[57] Vishkin, U., Deterministic sampling - a new technique for fast pattern matching, SIAM J. Comput., 20, 1, 22-40 (1990) · Zbl 0716.68074
[58] Yao, A. C.-C., The complexity of pattern matching for a random string, SIAM J. Comput., 8, 3, 368-387 (1979) · Zbl 0421.68045
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.