×

A simple storage scheme for strings achieving entropy bounds. (English) Zbl 1110.68029

Summary: We propose a storage scheme for a string \(S[1,n]\), drawn from an alphabet \(\Sigma\), that requires space close to the \(k\)-th order empirical entropy of \(S\), and allows one to retrieve any substring of \(S\) of length \(\ell\) in optimal \(O(1+ \frac{\ell}{\log_{|\Sigma|}n})\) time. This matches the best known bounds via the use of binary encodings and tables only. We also apply our storage scheme to the Burrows-Wheeler Transform and achieve a space bound which depends on both the \(k\)-th order entropy of \(S\) and the \(k\)-th order entropy of its BW-transformed string \(\text{bwt}(S)\).

MSC:

68P05 Data structures
68P30 Coding and information theory (compaction, compression, models of communication, encoding schemes, etc.) (aspects in computer science)

References:

[1] J. Barbay, J.I. Munro, M. He, S.S. Rao, Succinct indexes for strings, binary relations and multi-labeled trees, in: Procs ACM-SIAM SODA, 2007; J. Barbay, J.I. Munro, M. He, S.S. Rao, Succinct indexes for strings, binary relations and multi-labeled trees, in: Procs ACM-SIAM SODA, 2007 · Zbl 1302.68097
[2] M. Burrows, D. Wheeler, A block sorting lossless data compression algorithm, Technical Report 124, Digital Equipment Corporation, 1994; M. Burrows, D. Wheeler, A block sorting lossless data compression algorithm, Technical Report 124, Digital Equipment Corporation, 1994
[3] Ferragina, P.; Giancarlo, R.; Manzini, G.; Sciortino, M., Boosting textual compression in optimal linear time, Journal of the ACM, 52, 4, 688-713 (2005) · Zbl 1323.68260
[4] Ferragina, P.; Manzini, G., Indexing compressed text, Journal of the ACM, 52, 4, 552-581 (2005) · Zbl 1323.68261
[5] Golynski, A., Optimal lower bounds for rank and select indexes, (Procs ICALP. Procs ICALP, LNCS, vol. 4051 (2006)), 370-381 · Zbl 1223.68030
[6] González, R.; Navarro, G., Statistical encoding of succinct data structures, (Procs CPM. Procs CPM, LNCS, vol. 4009 (2006)), 295-306
[7] R. Grossi, A. Gupta, J. Vitter, High-order entropy-compressed text indexes, in: Procs ACM-SIAM SODA, 2003, pp. 841-850; R. Grossi, A. Gupta, J. Vitter, High-order entropy-compressed text indexes, in: Procs ACM-SIAM SODA, 2003, pp. 841-850 · Zbl 1092.68584
[8] J. Jansson, K. Sadakane, K.K. Sung, Ultra-succinct representation of ordered trees, in: Procs ACM-SIAM SODA, 2007; J. Jansson, K. Sadakane, K.K. Sung, Ultra-succinct representation of ordered trees, in: Procs ACM-SIAM SODA, 2007 · Zbl 1302.68100
[9] Manzini, G., An analysis of the Burrows-Wheeler transform, Journal of the ACM, 48, 3, 407-430 (2001) · Zbl 1323.68262
[10] Munro, I., Tables, (Procs FST-TCS. Procs FST-TCS, LNCS, vol. 1180 (1996)), 37-42 · Zbl 1541.68118
[11] G. Navarro, V. Mäkinen, Compressed full-text indexes, ACM Computing Surveys, 2007 (in press); G. Navarro, V. Mäkinen, Compressed full-text indexes, ACM Computing Surveys, 2007 (in press)
[12] R. Raman, V. Raman, S. Srinivasa Rao, Succinct indexable dictionaries with applications to encoding \(k\)-ary trees and multisets, in: Procs ACM-SIAM SODA, 2002, pp. 233-242; R. Raman, V. Raman, S. Srinivasa Rao, Succinct indexable dictionaries with applications to encoding \(k\)-ary trees and multisets, in: Procs ACM-SIAM SODA, 2002, pp. 233-242 · Zbl 1093.68582
[13] Sadakane, K., New text indexing functionalities of the compressed suffix arrays, Journal of Algorithms, 48, 2, 294-313 (2003) · Zbl 1100.68563
[14] K. Sadakane, R. Grossi, Squeezing succinct data structures into entropy bounds, in: Procs ACM-SIAM SODA, 2006, pp. 1230-1239; K. Sadakane, R. Grossi, Squeezing succinct data structures into entropy bounds, in: Procs ACM-SIAM SODA, 2006, pp. 1230-1239 · Zbl 1192.68188
[15] Witten, I. H.; Moffat, A.; Bell, T. C., Managing Gigabytes: Compressing and Indexing Documents and Images (1999), Morgan Kaufmann Publishers: Morgan Kaufmann Publishers Los Altos, CA 94022, USA · Zbl 0821.68051
[16] Ziv, J.; Lempel, A., Compression of individual sequences via variable length coding, IEEE Transactions on Information Theory, 24, 530-536 (1978) · Zbl 0392.94004
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.