×

Correctness of substring-preprocessing in Boyer-Moore’s pattern matching algorithm. (English) Zbl 1018.68050

Summary: One of the main reasons for the high efficiency of the fast pattern matching algorithm of Boyer and Moore is preprocessing. The Boyer-Moore pattern matching algorithm utilizes two preprocessing algorithms: one on single characters and the other one on substrings. It is the latter which makes the pattern matching algorithm extremely fast (especially on natural language text). In the current paper we present a formal correctness proof of the program describing the substring-preprocessing algorithm. The proof is carried out within linear time temporal logic. During the process of our verification we found that indices of auxiliary arrays, as used in published high-level descriptions of the preprocessing algorithm, may run out of bounds. We demonstrate that this is the case and correct that undesirable aspect in the current paper.

MSC:

68Q65 Abstract data types; algebraic specification
Full Text: DOI

References:

[1] Aho, A. V., Algorithms for finding patterns in strings, (van Leeuwen, J., Handbook of Theoretical Computer Science (Volume A): Algorithms and Complexity, Vol. A (1990), Elsevier: Elsevier Amsterdam), 257-295
[2] Baase, S., Computer Algorithms — Introduction to Design and Analysis (1988), Addisson Wesley: Addisson Wesley Reading, MA
[3] Boyer, R. S.; Moore, J. S., A fast string searching algorithm, Comm. ACM, 20, 10, 762-772 (1977) · Zbl 1219.68165
[4] Boyer, R. S.; Moore, J. S., A verification condition generator for FORTRAN, (Boyer, R. S.; Moore, J. S., The Correctness Problem in Computer Science (1981), Academic Press: Academic Press New York, USA), 9-101
[5] Cormen, T. H.; Leiserson, C. E.; Rivest, R. L., Introduction to Algorithms (1990), McGraw-Hill: McGraw-Hill New York · Zbl 1158.68538
[6] Knuth, D. E.; Morris, J. H.; Pratt, V. R., Fast pattern matching in strings, SIAM J. Comput., 6, 2, 323-350 (1977) · Zbl 0372.68005
[7] Lamport, L., The “Hoare Logic” of concurrent programs, Acta Inform., 14, 21-37 (1980) · Zbl 0416.68032
[8] Manna, Z.; Pnueli, A., The Temporal Logic of Reactive and Concurrent Systems (1991), Springer: Springer Berlin · Zbl 0753.68003
[9] Mehlhorn, K., Data Structures and Algorithms 1: Sorting and Searching (1984), Springer: Springer Berlin · Zbl 0556.68001
[10] J.S. Moore, 1999, personal communication.; J.S. Moore, 1999, personal communication.
[11] Partsch, H. A.; Stomp, F. A., A fast pattern matching algorithm derived by transformational and assertional reasoning, Formal Aspects Comput., 2, 2, 109-122 (1990) · Zbl 0697.68025
[12] Rytter, W., A Correct preprocessing algorithm derived for Boyer-Moore string-searching, SIAM J. Comput., 9, 509-512 (1990) · Zbl 0446.68049
[13] Watson, B. W.; Zwaan, G., A taxonamy of sublinear multiple keyword pattern matching algorithms, Sci. Comput. Programming, 26, 85-118 (1996) · Zbl 0858.68026
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.