×

Structural pattern matching – succinctly. (English) Zbl 1457.68335

Okamoto, Yoshio (ed.) et al., 28th international symposium on algorithms and computation, ISAAC 2017, December 9–12, 2017, Phuket, Thailand. Wadern: Schloss Dagstuhl – Leibniz Zentrum für Informatik. LIPIcs – Leibniz Int. Proc. Inform. 92, Article 35, 13 p. (2017).
Summary: Let \(T\) be a text of length \(n\) containing characters from an alphabet \(\Sigma\), which is the union of two disjoint sets: \(\Sigma_s\) containing static characters \((s\)-characters) and \(\Sigma_p\) containing parameterized characters \((p\)-characters). Each character in \(\Sigma_p\) has an associated complementary character from \(\Sigma_p\). A pattern \(P\) (also over \(\Sigma)\) matches an equal-length substring \(S\) of \(T\) iff the \(s\)-characters match exactly, there exists a one-to-one function that renames the \(p\)-characters in \(S\) to the \(p\)-characters in \(P\), and if a \(p\)-character \(x\) is renamed to another \(p\)-character \(y\) then the complement of \(x\) is renamed to the complement of \(y\). The task is to find the starting positions (occurrences) of all such substrings \(S\). Previous indexing solution T. Shibuya [Lect. Notes Comput. Sci. 1851, 393–406 (2000; Zbl 0966.68508)], known as Structural Suffix Tree, requires \(\Theta(n\log n)\) bits of space, and can find all occ occurrences in time \(O(|P|\log\sigma+\mathrm{occ})\), where \(\sigma=|\Sigma|\). In this paper, we present the first succinct index for this problem, which occupies \(n\log\sigma+O(n)\) bits and offers \(O(|P|\log\sigma+\mathrm{occ}\cdot\log n\log\sigma)\) query time.
For the entire collection see [Zbl 1376.68013].

MSC:

68W32 Algorithms on strings
68P05 Data structures

Citations:

Zbl 0966.68508
Full Text: DOI

References:

[1] Brenda S. Baker. A theory of parameterized pattern matching: algorithms and applications. In S. Rao Kosaraju, David S. Johnson, and Alok Aggarwal, editors, {\it Proceedings of the} {\it Twenty-Fifth Annual ACM Symposium on Theory of Computing, May 16-18, 1993, San} {\it Diego, CA, USA}, pages 71-80. ACM, 1993. doi:10.1145/167088.167115. · Zbl 1310.68098
[2] Djamal Belazzougui and Gonzalo Navarro. Alphabet-independent compressed text index ing. {\it ACM Trans. Algorithms}, 10(4):23:1-23:19, 2014. doi:10.1145/2635816. · Zbl 1325.68307
[3] M. Burrows and D. J. Wheeler. A block-sorting lossless data compression algorithm. Tech nical report, Digital Equipment Corporation (now part of Hewlett-Packard, Palo Alto, CA), 1994.
[4] Richard Cole and Ramesh Hariharan. Faster suffix tree construction with missing suffix links. {\it SIAM J. Comput.}, 33(1):26-42, 2003. doi:10.1137/S0097539701424465. · Zbl 1069.68644
[5] Maxime Crochemore, Costas S. Iliopoulos, Tomasz Kociumaka, Marcin Kubica, Alessio Langiu, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, and Tomasz Walen. Order preserving incomplete suffix trees and order-preserving indexes. In Oren Kurland, Moshe Lewenstein, and Ely Porat, editors, {\it String Processing and Information Retrieval - 20th} {\it International Symposium, SPIRE 2013, Jerusalem, Israel, October 7-9, 2013, Proceedings}, volume 8214 of {\it Lecture Notes in Computer Science}, pages 84-95. Springer, 2013. doi: 10.1007/978-3-319-02432-5_13. · Zbl 1345.68300
[6] Paolo Ferragina and Giovanni Manzini. Indexing compressed text. {\it J. ACM}, 52(4):552-581, 2005. doi:10.1145/1082036.1082039. · Zbl 1323.68261
[7] Johannes Fischer. Wee LCP. {\it Inf. Process. Lett.}, 110(8-9):317-320, 2010. doi:10.1016/j. ipl.2010.02.010. · Zbl 1211.68129
[8] Johannes Fischer and Volker Heun. A new succinct representation of rmq-information and improvements in the enhanced suffix array. In Bo Chen, Mike Paterson, and Guochuan Zhang, editors, {\it Combinatorics, Algorithms, Probabilistic and Experimental Methodologies,} {\it First International Symposium, ESCAPE 2007, Hangzhou, China, April 7-9, 2007, Re-} {\it vised Selected Papers}, volume 4614 of {\it Lecture Notes in Computer Science}, pages 459-470. Springer, 2007. doi:10.1007/978-3-540-74450-4_41. · Zbl 1176.68058
[9] Arnab Ganguly, Rahul Shah, and Sharma V Thankachan. pBWT: Achieving succinct data structures for parameterized pattern matching and related problems. In {\it Proceedings of the} {\it Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms}, pages 397-407. Society for Industrial and Applied Mathematics, 2017. · Zbl 1410.68098
[10] Raffaele Giancarlo.The suffix of a square matrix, with applications.In Vijaya Ramachandran, editor, {\it Proceedings of the Fourth Annual ACM/SIGACT-SIAM Symposium} {\it on Discrete Algorithms, 25-27 January 1993, Austin, Texas.}, pages 402-411. ACM/SIAM, 1993. · Zbl 0801.68038
[11] Alexander Golynski, Roberto Grossi, Ankur Gupta, Rajeev Raman, and S. Srinivasa Rao. On the size of succinct indices. In Lars Arge, Michael Hoffmann, and Emo Welzl, editors, {\it Algorithms - ESA 2007, 15th Annual European Symposium, Eilat, Israel, October 8-10,} {\it 2007, Proceedings}, volume 4698 of {\it Lecture Notes in Computer Science}, pages 371-382. Springer, 2007. doi:10.1007/978-3-540-75520-3_34. · Zbl 1151.68385
[12] Roberto Grossi, Ankur Gupta, and Jeffrey Scott Vitter. High-order entropy-compressed text indexes. In {\it Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Dis-} {\it crete Algorithms, January 12-14, 2003, Baltimore, Maryland, USA.}, pages 841-850. ACM/SIAM, 2003. URL: http://dl.acm.org/citation.cfm?id=644108.644250. · Zbl 1092.68584
[13] Roberto Grossi and Jeffrey Scott Vitter. Compressed suffix arrays and suffix trees with applications to text indexing and string matching. {\it SIAM J. Comput.}, 35(2):378-407, 2005. doi:10.1137/S0097539702402354. · Zbl 1092.68115
[14] Dan Gusfield. {\it Algorithms on Strings, Trees, and Sequences - Computer Science and Com-} {\it putational Biology}. Cambridge University Press, 1997. · Zbl 0934.68103
[15] Dong Kyue Kim, Yoo Ah Kim, and Kunsoo Park.Generalizations of suffix arrays to multi-dimensional matrices. {\it Theor. Comput. Sci.}, 302(1-3):223-238, 2003. doi:10.1016/ S0304-3975(02)00828-9. · Zbl 1044.68029
[16] Gonzalo Navarro. Spaces, trees, and colors: The algorithmic landscape of document re trieval on sequences. {\it ACM Comput. Surv.}, 46(4):52:1-52:47, 2013. doi:10.1145/2535933. · Zbl 1305.68078
[17] Gonzalo Navarro.Wavelet trees for all.{\it J. Discrete Algorithms}, 25:2-20, 2014.doi: 10.1016/j.jda.2013.07.004. · Zbl 1284.68217
[18] Gonzalo Navarro and Veli Mäkinen. Compressed full-text indexes. {\it ACM Comput. Surv.}, 39(1), 2007. doi:10.1145/1216370.1216372. · Zbl 1321.68263
[19] Gonzalo Navarro and Kunihiko Sadakane. Fully functional static and dynamic succinct trees. {\it ACM Trans. Algorithms}, 10(3):16:1-16:39, 2014. doi:10.1145/2601073. · Zbl 1333.68084
[20] Luís M. S. Russo, Gonzalo Navarro, and Arlindo L. Oliveira. Fully compressed suffix trees. {\it ACM Transactions on Algorithms}, 7(4):53, 2011. doi:10.1145/2000807.2000821. · Zbl 1295.68103
[21] Tetsuo Shibuya.Generalization of a suffix tree for RNA structural pattern matching. In Magnús M. Halldórsson, editor, {\it Algorithm Theory - SWAT 2000, 7th Scandinavian} {\it Workshop on Algorithm Theory, Bergen, Norway, July 5-7, 2000, Proceedings}, volume 1851 of {\it Lecture Notes in Computer Science}, pages 393-406. Springer, 2000. doi:10.1007/ 3-540-44985-X_34. · Zbl 0966.68508
[22] Tetsuo Shibuya. Geometric suffix tree: Indexing protein 3-d structures. {\it J. ACM}, 57(3):15:1- 15:17, 2010. doi:10.1145/1706591.1706595. · Zbl 1327.68083
[23] Dekel Tsur. Top-k document retrieval in optimal space. {\it Inf. Process. Lett.}, 113(12):440-443, 2013. doi:10.1016/j.ipl.2013.03.012. · Zbl 1371.68071
[24] :12
[25] :13
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.