×

Internal shortest absent word queries in constant time and linear space. (English) Zbl 1535.68491

Summary: Given a string \(T\) of length \(n\) over an alphabet \(\Sigma \subset \{1, 2,\dots, n^{\mathcal{O} (1)} \}\) of size \(\sigma\), we are to preprocess \(T\) so that given a range \([i, j]\), we can return a representation of a shortest string over \(\Sigma\) that is absent in the fragment \(T [i] \cdots T [j]\) of \(T\). We present an \(\mathcal{O}(n)\)-space data structure that answers such queries in constant time and can be constructed in \(\mathcal{O}(n \log_\sigma n)\) time.

MSC:

68W32 Algorithms on strings
68P05 Data structures

References:

[1] Abedin, P.; Ganguly, A.; Hon, W.; Matsuda, K.; Nekrich, Y.; Sadakane, K.; Shah, R.; Thankachan, S. V., A linear-space data structure for range-lcp queries in poly-logarithmic time, Theor. Comput. Sci., 822, 15-22 (2020) · Zbl 1455.68045
[2] Abedin, P.; Ganguly, A.; Pissis, S. P.; Thankachan, S. V., Efficient data structures for range shortest unique substring queries, Algorithms, 13, 276 (2020)
[3] Amir, A.; Apostolico, A.; Landau, G. M.; Levy, A.; Lewenstein, M.; Porat, E., Range LCP, J. Comput. Syst. Sci., 80, 1245-1253 (2014) · Zbl 1410.68414
[4] Amir, A.; Charalampopoulos, P.; Pissis, S. P.; Radoszewski, J., Dynamic and internal longest common substring, Algorithmica, 82, 3707-3743 (2020) · Zbl 1494.68314
[5] Ayad, L. A.K.; Badkobeh, G.; Fici, G.; Héliou, A.; Pissis, S. P., Constructing antidictionaries of long texts in output-sensitive space, Theory Comput. Syst., 65, 777-797 (2021) · Zbl 1517.68428
[6] Babenko, M. A.; Gawrychowski, P.; Kociumaka, T.; Kolesnichenko, I. I.; Starikovskaya, T., Computing minimal and maximal suffixes of a substring, Theor. Comput. Sci., 638, 112-121 (2016) · Zbl 1344.68308
[7] Babenko, M. A.; Gawrychowski, P.; Kociumaka, T.; Starikovskaya, T., Wavelet trees meet suffix trees, (Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms. Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015 (2015), SIAM), 572-591 · Zbl 1372.68070
[8] Badkobeh, G.; Charalampopoulos, P.; Pissis, S. P., Internal shortest absent word queries, (32nd Annual Symposium on Combinatorial Pattern Matching. 32nd Annual Symposium on Combinatorial Pattern Matching, CPM 2021 (2021)), 24:1-24:18
[9] Barton, C.; Héliou, A.; Mouchard, L.; Pissis, S. P., Linear-time computation of minimal absent words using suffix array, BMC Bioinform., 15, 388 (2014)
[10] Barton, C.; Héliou, A.; Mouchard, L.; Pissis, S. P., Parallelising the computation of minimal absent words, (Parallel Processing and Applied Mathematics - 11th International Conference, PPAM 2015. Revised Selected Papers, Part II (2015), Springer), 243-253
[11] Bender, M. A.; Farach-Colton, M., The LCA problem revisited, (LATIN 2000: Theoretical Informatics, 4th Latin American Symposium, Proceedings (2000), Springer), 88-94 · Zbl 0959.68133
[12] Berkman, O.; Vishkin, U., Recursive star-tree parallel data structure, SIAM J. Comput., 22, 221-242 (1993) · Zbl 0770.68044
[13] Bille, P.; Gørtz, I. L.; Knudsen, M. B.T.; Lewenstein, M.; Vildhøj, H. W., Longest common extensions in sublinear space, (Combinatorial Pattern Matching - 26th Annual Symposium. Combinatorial Pattern Matching - 26th Annual Symposium, CPM 2015 (2015)), 65-76 · Zbl 1380.68139
[14] Bille, P.; Gørtz, I. L.; Sach, B.; Vildhøj, H. W., Time-space trade-offs for longest common extensions, J. Discret. Algorithms, 25, 42-50 (2014) · Zbl 1284.68208
[15] Birenzwige, O.; Golan, S.; Porat, E., Locally consistent parsing for text indexing in small space, (Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms. Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms, SODA 2020 (2020)), 607-626 · Zbl 07304060
[16] Chairungsee, S.; Crochemore, M., Using minimal absent words to build phylogeny, Theor. Comput. Sci., 450, 109-116 (2012) · Zbl 1243.68332
[17] Chan, T. M.; Larsen, K. G.; Pătraşcu, M., Orthogonal range searching on the ram, revisited, (Proceedings of the 27th ACM Symposium on Computational Geometry. Proceedings of the 27th ACM Symposium on Computational Geometry, SCG 2011 (2011), ACM), 1-10 · Zbl 1283.68139
[18] Charalampopoulos, P.; Crochemore, M.; Fici, G.; Mercaş, R.; Pissis, S. P., Alignment-free sequence comparison using absent words, Inf. Comput., 262, 57-68 (2018) · Zbl 1400.68264
[19] Charalampopoulos, P.; Crochemore, M.; Pissis, S. P., On extended special factors of a word, (String Processing and Information Retrieval - 25th International Symposium. String Processing and Information Retrieval - 25th International Symposium, SPIRE 2018 (2018), Springer), 131-138 · Zbl 1518.68280
[20] Charalampopoulos, P.; Gawrychowski, P.; Mozes, S.; Weimann, O., An almost optimal edit distance oracle, (48th International Colloquium on Automata, Languages, and Programming. 48th International Colloquium on Automata, Languages, and Programming, ICALP 2021 (2021)), 48:1-48:20
[21] Charalampopoulos, P.; Kociumaka, T.; Mohamed, M.; Radoszewski, J.; Rytter, W.; Straszyński, J.; Waleń, T.; Zuba, W., Counting distinct patterns in internal dictionary matching, (31st Annual Symposium on Combinatorial Pattern Matching. 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020 (2020)), 8:1-8:15 · Zbl 1534.68275
[22] Charalampopoulos, P.; Kociumaka, T.; Mohamed, M.; Radoszewski, J.; Rytter, W.; Waleń, T., Internal dictionary matching, Algorithmica, 83, 2142-2169 (2021) · Zbl 1515.68099
[23] Charalampopoulos, P.; Kociumaka, T.; Wellnitz, P., Faster approximate pattern matching: a unified approach, (61st IEEE Annual Symposium on Foundations of Computer Science. 61st IEEE Annual Symposium on Foundations of Computer Science, FOCS 2020 (2020), IEEE), 978-989
[24] Crochemore, M.; Héliou, A.; Kucherov, G.; Mouchard, L.; Pissis, S. P.; Ramusat, Y., Absent words in a sliding window with applications, Inf. Comput., 270 (2020) · Zbl 1436.68406
[25] Crochemore, M.; Mignosi, F.; Restivo, A., Automata and forbidden words, Inf. Process. Lett., 67, 111-117 (1998) · Zbl 1339.68145
[26] Crochemore, M.; Mignosi, F.; Restivo, A.; Salemi, S., Data compression using antidictionaries, Proc. IEEE, 88, 1756-1768 (2000)
[27] Dinklage, P.; Fischer, J.; Herlez, A.; Kociumaka, T.; Kurpicz, F., Practical performance of space efficient data structures for longest common extensions, (28th Annual European Symposium on Algorithms. 28th Annual European Symposium on Algorithms, ESA 2020 (2020)), 39:1-39:20 · Zbl 07651178
[28] Farach, M., Optimal suffix tree construction with large alphabets, (38th Annual Symposium on Foundations of Computer Science. 38th Annual Symposium on Foundations of Computer Science, FOCS 1997 (1997), IEEE Computer Society), 137-143
[29] Ferenczi, S., Complexity of sequences and dynamical systems, Discrete Math., 206, 145-154 (1999) · Zbl 0936.37008
[30] Fici, G.; Gawrychowski, P., Minimal absent words in rooted and unrooted trees, (String Processing and Information Retrieval - 26th International Symposium. String Processing and Information Retrieval - 26th International Symposium, SPIRE 2019 (2019), Springer), 152-161 · Zbl 1539.68378
[31] Fici, G.; Mignosi, F.; Restivo, A.; Sciortino, M., Word assembly through minimal forbidden words, Theor. Comput. Sci., 359, 214-230 (2006) · Zbl 1097.68108
[32] Fici, G.; Restivo, A.; Rizzo, L., Minimal forbidden factors of circular words, Theor. Comput. Sci., 792, 144-153 (2019) · Zbl 1439.68018
[33] Fine, N. J.; Wilf, H. S., Uniqueness theorems for periodic functions, Proc. Am. Math. Soc., 16, 109-114 (1965) · Zbl 0131.30203
[34] Fischer, J.; Heun, V., Space-efficient preprocessing schemes for range minimum queries on static arrays, SIAM J. Comput., 40, 465-492 (2011) · Zbl 1222.05024
[35] Fredman, M. L.; Willard, D. E., Surpassing the information theoretic bound with fusion trees, J. Comput. Syst. Sci., 47, 424-436 (1993) · Zbl 0795.68049
[36] Fujishige, Y.; Tsujimaru, Y.; Inenaga, S.; Bannai, H.; Takeda, M., Computing DAWGs and minimal absent words in linear time for integer alphabets, (41st International Symposium on Mathematical Foundations of Computer Science. 41st International Symposium on Mathematical Foundations of Computer Science, MFCS 2016 (2016), Schloss Dagstuhl - Leibniz-Zentrum für Informatik), 38:1-38:14 · Zbl 1398.68703
[37] Ganguly, A.; Patil, M.; Shah, R.; Thankachan, S. V., A linear space data structure for range LCP queries, Fundam. Inform., 163, 245-251 (2018) · Zbl 1405.68463
[38] Garcia, S. P.; Pinho, A. J.; Rodrigues, J. M.O. S.; Bastos, C. A.C.; Ferreira, P. J.S. G., Minimal absent words in prokaryotic and eukaryotic genomes, PLoS ONE, 6 (2011)
[39] Grossi, R.; Orlandi, A.; Raman, R.; Rao, S. S., More haste, less waste: lowering the redundancy in fully indexable dictionaries, (26th International Symposium on Theoretical Aspects of Computer Science. 26th International Symposium on Theoretical Aspects of Computer Science, STACS 2009 (2009)), 517-528 · Zbl 1236.68064
[40] Harel, D.; Tarjan, R. E., Fast algorithms for finding nearest common ancestors, SIAM J. Comput., 13, 338-355 (1984) · Zbl 0535.68022
[41] Jacobson, G., Space-efficient static trees and graphs, (30th Annual Symposium on Foundations of Computer Science. 30th Annual Symposium on Foundations of Computer Science, FOCS 1989 (1989), IEEE Computer Society), 549-554
[42] Keller, O.; Kopelowitz, T.; Feibish, S. L.; Lewenstein, M., Generalized substring compression, Theor. Comput. Sci., 525, 42-54 (2014) · Zbl 1295.68112
[43] Kempa, D.; Kociumaka, T., String synchronizing sets: sublinear-time BWT construction and optimal LCE data structure, (Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing. Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019 (2019), ACM), 756-767 · Zbl 1433.68132
[44] Kociumaka, T., Minimal suffix and rotation of a substring in optimal time, (27th Annual Symposium on Combinatorial Pattern Matching. 27th Annual Symposium on Combinatorial Pattern Matching, CPM 2016 (2016)), 28:1-28:12 · Zbl 1380.68151
[45] Kociumaka, T., Efficient Data Structures for Internal Queries in Texts (2018), University of Warsaw, Ph.D. thesis
[46] Kociumaka, T.; Radoszewski, J.; Rytter, W.; Waleń, T., Efficient data structures for the factor periodicity problem, (String Processing and Information Retrieval - 19th International Symposium. String Processing and Information Retrieval - 19th International Symposium, SPIRE 2012 (2012)), 284-294
[47] Kociumaka, T.; Radoszewski, J.; Rytter, W.; Waleń, T., Internal pattern matching queries in a text and applications, (Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms. Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015 (2015), SIAM), 532-551 · Zbl 1371.68340
[48] Kosolobov, D., Tight lower bounds for the longest common extension problem, Inf. Process. Lett., 125, 26-29 (2017) · Zbl 1409.68355
[49] Landau, G. M.; Vishkin, U., Fast string matching with k differences, J. Comput. Syst. Sci., 37, 63-78 (1988) · Zbl 0655.68075
[50] Matsuda, K.; Sadakane, K.; Starikovskaya, T.; Tateshita, M., Compressed orthogonal search on suffix arrays with applications to range LCP, (31st Annual Symposium on Combinatorial Pattern Matching. 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020 (2020)), 23:1-23:13 · Zbl 07651114
[51] Mieno, T.; Kuhara, Y.; Akagi, T.; Fujishige, Y.; Nakashima, Y.; Inenaga, S.; Bannai, H.; Takeda, M., Minimal unique substrings and minimal absent words in a sliding window, (46th SOFSEM (2020), Springer), 148-160 · Zbl 1440.68343
[52] Mignosi, F.; Restivo, A.; Sciortino, M., Words and forbidden factors, Theor. Comput. Sci., 273, 99-117 (2002) · Zbl 0997.68093
[53] Navarro, G., Compact Data Structures - a Practical Approach (2016), Cambridge University Press
[54] Ota, T.; Morita, H., On the adaptive antidictionary code using minimal forbidden words with constant lengths, (Proceedings of the International Symposium on Information Theory and Its Applications. Proceedings of the International Symposium on Information Theory and Its Applications, ISITA 2010 (2010), IEEE), 72-77
[55] Pǎtraşcu, M.; Thorup, M., Dynamic integer sets with optimal rank, select, and predecessor search, (55th IEEE Annual Symposium on Foundations of Computer Science. 55th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2014 (2014), IEEE Computer Society), 166-175
[56] Pratas, D.; Silva, J. M., Persistent minimal sequences of SARS-CoV-2, Bioinformatics (2020)
[57] Prezza, N., In-place sparse suffix sorting, (Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms. Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2018 (2018)), 1496-1508 · Zbl 1403.68376
[58] Raskhodnikova, S.; Ron, D.; Rubinfeld, R.; Smith, A. D., Sublinear algorithms for approximating string compressibility, Algorithmica, 65, 685-709 (2013) · Zbl 1270.68109
[59] Rubinchik, M.; Shur, A. M., Counting palindromes in substrings, (String Processing and Information Retrieval - 24th International Symposium. String Processing and Information Retrieval - 24th International Symposium, SPIRE 2017 (2017), Springer), 290-303 · Zbl 1454.68209
[60] Sakai, Y., A substring-substring LCS data structure, Theor. Comput. Sci., 753, 16-34 (2019) · Zbl 1407.68579
[61] Sakai, Y., A data structure for substring-substring lcs length queries, Theor. Comput. Sci., 911, 41-54 (2022) · Zbl 1535.68064
[62] Silva, R. M.; Pratas, D.; Castro, L.; Pinho, A. J.; Ferreira, P. J.S. G., Three minimal sequences found in Ebola virus genomes and absent from human DNA, Bioinformatics, 31, 2421-2425 (2015)
[63] Tanimura, Y.; I, T.; Bannai, H.; Inenaga, S.; Puglisi, S. J.; Takeda, M., Deterministic sub-linear space LCE data structures with efficient construction, (27th Annual Symposium on Combinatorial Pattern Matching. 27th Annual Symposium on Combinatorial Pattern Matching, CPM 2016 (2016)), 1:1-1:10 · Zbl 1380.68161
[64] Tanimura, Y.; Nishimoto, T.; Bannai, H.; Inenaga, S.; Takeda, M., Small-space LCE data structure with constant-time queries, (42nd International Symposium on Mathematical Foundations of Computer Science. 42nd International Symposium on Mathematical Foundations of Computer Science, MFCS 2017 (2017)), 10:1-10:15 · Zbl 1441.68026
[65] Tiskin, A., Semi-local string comparison: algorithmic techniques and applications, Math. Comput. Sci., 1, 571-603 (2008) · Zbl 1158.68054
[66] Yao, A. C., Space-time tradeoff for answering range queries (extended abstract), (Proceedings of the Fourteenth Annual ACM Symposium on Theory of Computing. Proceedings of the Fourteenth Annual ACM Symposium on Theory of Computing, STOC 1982 (1982), ACM), 128-136
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.