×

The alternating BWT: an algorithmic perspective. (English) Zbl 1435.68087

Summary: The Burrows-Wheeler Transform (BWT) is a word transformation introduced in 1994 for Data Compression. It has become a fundamental tool for designing self-indexing data structures, with important applications in several areas in science and engineering. The Alternating Burrows-Wheeler Transform (ABWT) is another transformation recently introduced in [I. M. Gessel et al., Eur. J. Comb. 33, No. 7, 1537–1546 (2012; Zbl 1244.05016)] and studied in the field of Combinatorics on Words. It is analogous to the BWT, except that it uses an alternating lexicographical order instead of the usual one. Building on results in [the authors, Lect. Notes Comput. Sci. 11088, 1–17 (2018; Zbl 1436.68102)], where we have shown that BWT and ABWT are part of a larger class of reversible transformations, here we provide a combinatorial and algorithmic study of the novel transform ABWT. We establish a deep analogy between BWT and ABWT by proving they are the only ones in the above mentioned class to be rank-invertible, a novel notion guaranteeing efficient invertibility. In addition, we show that the backward-search procedure can be efficiently generalized to the ABWT; this result implies that also the ABWT can be used as a basis for efficient compressed full text indices. Finally, we prove that the ABWT can be efficiently computed by using a combination of the Difference Cover suffix sorting algorithm [J. Kärkkäinen et al., J. ACM 53, No. 6, 918–936 (2006; Zbl 1326.68111)] with a linear time algorithm for finding the minimal cyclic rotation of a word with respect to the alternating lexicographical order.

MSC:

68P30 Coding and information theory (compaction, compression, models of communication, encoding schemes, etc.) (aspects in computer science)
68R15 Combinatorics on words
68W32 Algorithms on strings

Software:

BWA

References:

[1] Belazzougui, D.; Navarro, G., Optimal lower and upper bounds for representing sequences, ACM Trans. Algorithms, 11, 4, Article 31 pp. (2015) · Zbl 1398.68103
[2] Bonomo, S.; Mantaci, S.; Restivo, A.; Rosone, G.; Sciortino, M., Sorting conjugates and suffixes of words in a multiset, Int. J. Found. Comput. Sci., 25, 08, 1161-1175 (2014) · Zbl 1310.68172
[3] Booth, K. S., Lexicographically least circular substrings, Inf. Process. Lett., 10, 4/5, 240-242 (1980) · Zbl 0444.68064
[4] Burrows, M.; Wheeler, D. J., A Block Sorting Data Compression Algorithm (1994), DIGITAL System Research Center, Technical Report
[5] Chapin, B.; Tate, S., Higher compression from the Burrows-Wheeler transform by modified sorting, (DCC (1998), IEEE Computer Society), 532, Full version available from
[6] Colbourn, C. J.; Ling, A. C.H., Quorums from difference covers, Inf. Process. Lett., 75, 1-2, 9-12 (2000) · Zbl 1339.68089
[7] Cox, A.; Bauer, M.; Jakobi, T.; Rosone, G., Large-scale compression of genomic sequence databases with the Burrows-Wheeler transform, Bioinformatics, 28, 11, 1415-1419 (2012)
[8] Cox, A. J.; Garofalo, F.; Rosone, G.; Sciortino, M., Lightweight LCP construction for very large collections of strings, J. Discret. Algorithms, 37, 17-33 (2016) · Zbl 1362.68303
[9] Crochemore, M.; Désarménien, J.; Perrin, D., A note on the Burrows-Wheeler transformation, Theor. Comput. Sci., 332, 567-572 (2005) · Zbl 1070.68126
[10] Daykin, J.; Groult, R.; Guesnet, Y.; Lecroq, T.; Lefebvre, A.; Léonard, M.; Prieur-Gaston, É., A survey of string orderings and their application to the Burrows-Wheeler transform, Theor. Comput. Sci. (2017)
[11] Dolce, F.; Restivo, A.; Reutenauer, C., On generalized Lyndon words, Theor. Comput. Sci., 777, 232-242 (2019) · Zbl 1426.68229
[12] Duval, J.-P., Factorizing words over an ordered alphabet, J. Algorithms, 4, 4, 363-381 (1983) · Zbl 0532.68061
[13] Egidi, L.; Louza, F. A.; Manzini, G.; Telles, G. P., External memory BWT and LCP computation for sequence collections with applications, Algorithms Mol. Biol., 14, 1, Article 6 pp. (2019) · Zbl 1494.92085
[14] Fenwick, P., The Burrows-Wheeler transform for block sorting text compression: principles and improvements, Comput. J., 39, 9, 731-740 (1996)
[15] Ferenczi, S.; Zamboni, L. Q., Clustering words and interval exchanges, J. Integer Seq., 16, 2, Article 13.2.1 pp. (2013) · Zbl 1291.68305
[16] Ferragina, P.; Giancarlo, R.; Manzini, G.; Sciortino, M., Boosting textual compression in optimal linear time, J. ACM, 52, 4, 688-713 (2005) · Zbl 1323.68260
[17] Ferragina, P.; Manzini, G., Opportunistic data structures with applications, (FOCS 2000 (2000), IEEE Computer Society), 390-398
[18] Ferragina, P.; Manzini, G., Indexing compressed text, J. ACM, 52, 552-581 (2005) · Zbl 1323.68261
[19] Ferragina, P.; Nitto, I.; Venturini, R., On optimally partitioning a text to improve its compression, Algorithmica, 61, 1, 51-74 (2011) · Zbl 1221.68302
[20] Gagie, T.; Manzini, G.; Sirén, J., Wheeler graphs: a framework for BWT-based data structures, Theor. Comput. Sci., 698, 67-78 (2017) · Zbl 1380.68145
[21] Gessel, I. M.; Restivo, A.; Reutenauer, C., A bijection between words and multisets of necklaces, Eur. J. Comb., 33, 7, 1537-1546 (2012) · Zbl 1244.05016
[22] Gessel, I. M.; Reutenauer, C., Counting permutations with given cycle structure and descent set, J. Comb. Theory, Ser. A, 64, 2, 189-215 (1993) · Zbl 0793.05004
[23] Giancarlo, R.; Manzini, G.; Restivo, A.; Rosone, G.; Sciortino, M., Block sorting-based transformations on words: beyond the magic BWT, (DLT. DLT, LNCS, vol. 11088 (2018), Springer International Publishing), 1-17 · Zbl 1436.68102
[24] Giancarlo, R.; Manzini, G.; Rosone, G.; Sciortino, M., A new class of searchable and provably highly compressible string transformations, (CPM. CPM, LIPIcs, vol. 128 (2019), Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik), Article 12 pp. · Zbl 1529.68095
[25] Giancarlo, R.; Restivo, A.; Sciortino, M., From first principles to the Burrows and Wheeler transform and beyond, via combinatorial optimization, Theor. Comput. Sci., 387, 236-248 (2007) · Zbl 1144.68019
[26] Gusfield, D., Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology (1997), Cambridge University Press · Zbl 0934.68103
[27] Kärkkäinen, J.; Sanders, P., Simple linear work suffix array construction, (Automata, Languages and Programming. Automata, Languages and Programming, LNCS, vol. 2719 (2003), Springer Berlin Heidelberg), 943-955 · Zbl 1039.68042
[28] Kärkkäinen, J.; Sanders, P.; Burkhardt, S., Linear work suffix array construction, J. ACM, 53, 918-936 (2006) · Zbl 1326.68111
[29] Kimura, K.; Koike, A., Ultrafast SNP analysis using the Burrows-Wheeler transform of short-read data, Bioinformatics, 31, 10, 1577-1583 (2015)
[30] Li, H.; Durbin, R., Fast and accurate long-read alignment with Burrows-Wheeler transform, Bioinformatics, 26, 5, 589-595 (2010)
[31] Lothaire, M., Applied Combinatorics on Words, Encyclopedia of Mathematics and Its Applications (2005), Cambridge University Press: Cambridge University Press New York, NY, USA · Zbl 1133.68067
[32] Mäkinen, V.; Belazzougui, D.; Cunial, F.; Tomescu, A. I., Genome-Scale Algorithm Design: Biological Sequence Analysis in the Era of High-Throughput Sequencing (2015), Cambridge University Press
[33] Mantaci, S.; Restivo, A.; Rosone, G.; Sciortino, M., An extension of the Burrows-Wheeler transform, Theor. Comput. Sci., 387, 3, 298-312 (2007) · Zbl 1144.68024
[34] Mantaci, S.; Restivo, A.; Rosone, G.; Sciortino, M., A new combinatorial approach to sequence comparison, Theory Comput. Syst., 42, 3, 411-429 (2008) · Zbl 1136.68047
[35] Mantaci, S.; Restivo, A.; Rosone, G.; Sciortino, M., Burrows-Wheeler transform and run-length enconding, (Combinatorics on Words, Proceedings of the 11th International Conference. Combinatorics on Words, Proceedings of the 11th International Conference, WORDS 2017. Combinatorics on Words, Proceedings of the 11th International Conference. Combinatorics on Words, Proceedings of the 11th International Conference, WORDS 2017, LNCS, vol. 10432 (2017), Springer), 228-239 · Zbl 1405.68466
[36] Mantaci, S.; Restivo, A.; Rosone, G.; Sciortino, M.; Versari, L., Measuring the clustering effect of BWT via RLE, Theor. Comput. Sci., 698, 79-87 (2017) · Zbl 1380.68174
[37] Mantaci, S.; Restivo, A.; Sciortino, M., Burrows-Wheeler transform and Sturmian words, Inf. Process. Lett., 86, 241-246 (2003) · Zbl 1162.68511
[38] Mantaci, S.; Restivo, A.; Sciortino, M., Distance measures for biological sequences: some recent approaches, Int. J. Approx. Reason., 47, 1, 109-124 (2008) · Zbl 1183.92035
[39] Manzini, G., An analysis of the Burrows-Wheeler transform, J. ACM, 48, 3, 407-430 (2001) · Zbl 1323.68262
[40] Manzini, G.; Ferragina, P., Engineering a lightweight suffix array construction algorithm, Algorithmica, 40, 33-50 (2004) · Zbl 1082.68867
[41] Navarro, G., Compact Data Structures - A Practical Approach (2016), Cambridge University Press
[42] Pak, I.; Redlich, A., Long cycles in abc-permutations, Funct. Anal. Other Math., 2, 87-92 (2008) · Zbl 1175.05141
[43] Prezza, N.; Pisanti, N.; Sciortino, M.; Rosone, G., SNPs detection by eBWT positional clustering, Algorithms Mol. Biol., 14, 1, 3 (2019)
[44] Restivo, A.; Rosone, G., Burrows-Wheeler transform and palindromic richness, Theor. Comput. Sci., 410, 30-32, 3018-3026 (2009) · Zbl 1173.68055
[45] Restivo, A.; Rosone, G., Balancing and clustering of words in the Burrows-Wheeler transform, Theor. Comput. Sci., 412, 27, 3019-3032 (2011) · Zbl 1220.68081
[46] Reutenauer, C., Mots de Lyndon généralisés, Sémin. Lothar. Comb., 54, Article B54h pp. (2006) · Zbl 1183.68445
[47] Rosone, G.; Sciortino, M., The Burrows-Wheeler transform between data compression and combinatorics on words, (The Nature of Computation. Logic, Algorithms, Applications, Proceedings of the 9th Conference on Computability in Europe. The Nature of Computation. Logic, Algorithms, Applications, Proceedings of the 9th Conference on Computability in Europe, CiE 2013. The Nature of Computation. Logic, Algorithms, Applications, Proceedings of the 9th Conference on Computability in Europe. The Nature of Computation. Logic, Algorithms, Applications, Proceedings of the 9th Conference on Computability in Europe, CiE 2013, LNCS, vol. 7921 (2013), Springer), 353-364 · Zbl 1370.68088
[48] Schindler, M., A fast block-sorting algorithm for lossless data compression, (DCC (1997), IEEE Computer Society), 469
[49] Shiloach, Y., Fast canonization of circular strings, J. Algorithms, 2, 2, 107-121 (1981) · Zbl 0459.68035
[50] Simpson, J.; Puglisi, S. J., Words with simple Burrows-Wheeler transforms, Electron. J. Comb., 15, Article R83 pp. (2008) · Zbl 1183.68446
[51] Yang, L.; Zhang, X.; Wang, T., The Burrows-Wheeler similarity distribution between biological sequences based on Burrows-Wheeler transform, J. Theor. Biol., 262, 4, 742-749 (2010) · Zbl 1403.92192
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.