×

Simon’s congruence pattern matching. (English) Zbl 1535.81075

Summary: The Simon’s congruence problem is to determine whether or not two strings have the same set of subsequences of length no greater than a given integer, and the problem can be answered in linear time. We consider the Simon’s congruence pattern matching problem that looks for all substrings of a text that are congruent to a pattern under the Simon’s congruence. We propose a linear time algorithm by reusing results from previous computations with the help of new data structures called X-trees and Y-trees. Moreover, we investigate several variants of the problem such as identifying the shortest substring or subsequence of the text that is congruent to the pattern under the Simon’s congruence, or finding frequent matchings. We design efficient algorithms for these problems. We conclude the paper with two open problems: finding the longest congruent subsequence and optimizing the pattern matching problem.

MSC:

81P68 Quantum computation
68T10 Pattern recognition, speech recognition
68W32 Algorithms on strings
68P05 Data structures
74K05 Strings
28A75 Length, area, volume, other geometric measure theory
26E70 Real analysis on time scales or measure chains
05C05 Trees
Full Text: DOI

References:

[1] Adamson, D.; Kosche, M.; Koß, T.; Manea, F.; Siemer, S., Longest common subsequence with gap constraints, (Proceedings of the \(14^{\text{th}}\) International Conference on Combinatorics on Words, Lecture Notes in Computer Science, vol. 13899, 2023, Springer), 60-76 · Zbl 07716981
[2] Agrawal, A.; Gawrychowski, P., A faster subquadratic algorithm for the longest common increasing subsequence problem, (Proceedings of the \(31^{\text{st}}\) International Symposium on Algorithms and Computation, LIPIcs, vol. 181, 2020), 4:1-4:12 · Zbl 07765362
[3] Barker, L.; Fleischmann, P.; Harwardt, K.; Manea, F.; Nowotka, D., Scattered factor-universality of words, (Proceedings of the \(24^{\text{th}}\) International Conference on Developments in Language Theory, Lecture Notes in Computer Science, vol. 12086, 2020), 14-28 · Zbl 07601058
[4] Beal, R.; Afrin, T.; Farheen, A.; Adjeroh, D., A new algorithm for “the LCS problem” with application in compressing genome resequencing data, BMC Genomics, 17, Supplement 4(544), 369-381, 2016
[5] Chan, W.-T.; Zhang, Y.; Fung, S. P.Y.; Ye, D.; Zhu, H., Efficient algorithms for finding a longest common increasing subsequence, J. Comb. Optim., 13, 3, 277-288, 2007 · Zbl 1123.68135
[6] Day, J. D.; Kosche, M.; Manea, F.; Schmid, M. L., Subsequences with gap constraints: complexity bounds for matching and analysis problems, (Proceedings of the \(33^{\text{rd}}\) International Symposium on Algorithms and Computation, LIPIcs, vol. 248, 2022, Schloss Dagstuhl - Leibniz-Zentrum für Informatik), 64:1-64:18 · Zbl 07911130
[7] Farach, M., Optimal suffix tree construction with large alphabets, (Proceedings of the \(38^{\text{th}}\) Annual Symposium on Foundations of Computer Science, 1997), 137-143
[8] Fleischer, L.; Kufleitner, M., Testing Simon’s congruence, (Proceedings of the \(43^{\text{rd}}\) International Symposium on Mathematical Foundations of Computer Science, 2018), 62:1-62:13 · Zbl 1512.68139
[9] Fleischmann, P.; Haschke, L.; Huch, A.; Mayrock, A.; Nowotka, D., Nearly k-universal words - investigating a part of Simon’s congruence, (Proceedings of the \(24^{\text{th}}\) International Conference on Descriptional Complexity of Formal Systems, Lecture Notes in Computer Science, vol. 13439, 2022, Springer), 57-71 · Zbl 07643463
[10] Fleischmann, P.; Höfer, J.; Huch, A.; Nowotka, D., α-β-factorization and the binary case of Simon’s congruence, (Proceedings of the \(24^{\text{th}}\) International Symposium on Fundamentals of Computation Theory, Lecture Notes in Computer Science, vol. 14292, 2023, Springer), 190-204 · Zbl 07856019
[11] Fleischmann, P.; Kim, S.; Koß, T.; Manea, F.; Nowotka, D.; Siemer, S.; Wiedenhöft, M., Matching patterns with variables under Simon’s congruence, (Proceedings of the \(17^{\text{th}}\) International Conference on Reachability Problems, Lecture Notes in Computer Science, vol. 14235, 2023, Springer), 155-170
[12] Garel, E., Minimal separators of two words, (Proceedings of the \(4^{\text{th}}\) Annual Symposium on Combinatorial Pattern Matching, 1993), 35-53
[13] Gawrychowski, P.; Kosche, M.; Koß, T.; Manea, F.; Siemer, S., Efficiently testing Simon’s congruence, (Proceedings of the \(38^{\text{th}}\) International Symposium on Theoretical Aspects of Computer Science, LIPIcs, vol. 187, 2021), 34:1-34:18
[14] Hébrard, J.-J., An algorithm for distinguishing efficiently bit-strings by their subsequences, Theor. Comput. Sci., 82, 1, 35-49, 1991 · Zbl 0724.68044
[15] Hunt, J. W.; McIlroy, M. D., An algorithm for differential file comparison, Comput. Sci. Techn. Rep., 41, 1975
[16] Kim, S.; Han, Y.-S.; Ko, S.-K.; Salomaa, K., On Simon’s congruence closure of a string, Theor. Comput. Sci., 972, Article 114078 pp., 2023 · Zbl 1520.68059
[17] Kim, S.; Han, Y.-S.; Ko, S.-K.; Salomaa, K., On the Simon’s congruence neighborhood of languages, (Proceedings of the \(27^{\text{th}}\) International Conference on Developments in Language Theory, Lecture Notes in Computer Science, vol. 13911, 2023, Springer), 168-181 · Zbl 07766942
[18] Kim, S.; Ko, S.-K.; Han, Y.-S., Simon’s congruence pattern matching, (Proceedings of the \(33^{\text{rd}}\) International Symposium on Algorithms and Computation, LIPIcs, vol. 248, 2022, Schloss Dagstuhl - Leibniz-Zentrum für Informatik), 60:1-60:17 · Zbl 07911126
[19] Schnoebelen, P.; Veron, J., On arch factorization and subword universality for words and compressed words, (Proceedings of the \(14^{\text{th}}\) International Conference on Combinatorics on Words, Lecture Notes in Computer Science, vol. 13899, 2023, Springer), 274-287 · Zbl 07716997
[20] Schwentick, T.; Thérien, D.; Vollmer, H., Partially-ordered two-way automata: a new characterization of DA, (Revised Papers from the \(5^{\text{th}}\) International Conference on Developments in Language Theory, 2001), 239-250 · Zbl 1073.68051
[21] Sedmidubský, J.; Zezula, P., A web application for subsequence matching in 3d human motion data, (Proceedings of the \(19^{\text{th}}\) IEEE International Symposium on Multimedia, 2017), 372-373
[22] Simon, I., Piecewise testable events, (Proceedings of the \(2^{\text{nd}}\) GI Conference on Automata Theory and Formal Languages, 1975), 214-222 · Zbl 0316.68034
[23] Surynková, P.; Surynek, P., Application of longest common subsequence algorithms to meshing of planar domains with quadrilaterals, (Proceedings of the \(9^{\text{th}}\) International Conference on Mathematical Methods for Curves and Surfaces, Lecture Notes in Computer Science, vol. 10521, 2016), 296-311 · Zbl 1422.65052
[24] Weis, P.; Immerman, N., Structure theorem and strict alternation hierarchy for FO^2 on words, Log. Methods Comput. Sci., 5, 3, 2009 · Zbl 1168.03019
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.