×

Alphabet-independent linear-time construction of compressed suffix arrays using \(o(n \log n)\)-bit working space. (English) Zbl 1125.68041

Summary: The suffix array is a fundamental index data structure in string algorithms and bioinformatics, and the compressed suffix array (CSA) and the FM-index are its compressed versions. Many algorithms for constructing these index data structures have been developed. Recently, Hon et al. [W. K. Hon, K. Sadakane and W. K. Sung, “Breaking a time-and-space barrier in constructing full-text indices”, in: Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science, 251–260 (2003)] proposed a construction algorithm using \(O(n\log\log|\Sigma|)\) time and \(O(n\log |\Sigma|)\)-bit working space, which is the fastest algorithm using \(O(n\log |\Sigma|)\)-bit working space.
In this paper we give an efficient algorithm to construct the index data structures. Our algorithm constructs the suffix array, the CSA, the FM-index, and Burrows-Wheeler transform using alphabet-independent \(O(n)\) time and \(O(n \log |\Sigma| \log^{\alpha}_{|\Sigma|} n)\)-bit working space, where \(\alpha =\log_3 2\). Our algorithm takes less time and more space than Hon et al.’s algorithm. Our algorithm uses least working space among alphabet-independent linear-time algorithms.

MSC:

68P05 Data structures
Full Text: DOI

References:

[1] M. Burrows, D.J. Wheeler, A block-sorting lossless data compression algorithm, Technical Report 124, Digital Equipment Corporation, Paolo Alto, California, 1994; M. Burrows, D.J. Wheeler, A block-sorting lossless data compression algorithm, Technical Report 124, Digital Equipment Corporation, Paolo Alto, California, 1994
[2] D.R. Clark, Compact pat trees, Ph.D. Thesis, University of Waterloo, Waterloo, 1996; D.R. Clark, Compact pat trees, Ph.D. Thesis, University of Waterloo, Waterloo, 1996
[3] M. Farach, Optimal suffix tree construction with large alphabets, in: Proceedings of the 38th Annual IEEE Symposium on Foundations of Computer Science, 1997, pp. 137-143; M. Farach, Optimal suffix tree construction with large alphabets, in: Proceedings of the 38th Annual IEEE Symposium on Foundations of Computer Science, 1997, pp. 137-143
[4] Farach-Colton, M.; Ferragina, P.; Muthukrishnan, S., On the sorting-complexity of suffix tree construction, Journal of the ACM, 47, 6, 987-1011 (2000) · Zbl 1094.68694
[5] Ferragina, P.; Manzini, G., Indexing compressed text, Journal of the ACM, 52, 4, 552-581 (2005) · Zbl 1323.68261
[6] Gonnet, G. H.; Baeza-Yates, R.; Snider, T., New indices for text: Pat trees and pat arrays, (Frakes, W. B.; Baeza-Yates, R., Information Retrieval: Data Structures & Algorithms (1992), Prentice Hall), 66-82
[7] Grossi, R.; Vitter, J. S., Compressed suffix arrays and suffix trees with applications to text indexing and string matching, SIAM Journal on Computing, 35, 2, 378-407 (2005) · Zbl 1092.68115
[8] D. Gusfield, An “Increment-by-one” approach to suffix arrays and trees, 1990, Manuscript; D. Gusfield, An “Increment-by-one” approach to suffix arrays and trees, 1990, Manuscript
[9] W.K. Hon, T.W. Lam, K. Sadakane, W.K. Sung, Constructing compressed suffix arrays with large alphabets, in: Proceedings of the 14th International Symposium on Algorithms and Computation, 2003, pp. 240-249; W.K. Hon, T.W. Lam, K. Sadakane, W.K. Sung, Constructing compressed suffix arrays with large alphabets, in: Proceedings of the 14th International Symposium on Algorithms and Computation, 2003, pp. 240-249 · Zbl 1205.68128
[10] Hon, W. K.; Lam, T. W.; Sadakane, K.; Sung, W. K.; Yiu, S. M., A space and time efficient algorithm for constructing compressed suffix arrays, Algorithmica (2007) · Zbl 1123.68137
[11] W.K. Hon, K. Sadakane, W.K. Sung, Breaking a time-and-space barrier in constructing full-text indices, in: Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science, 2003, pp. 251-260; W.K. Hon, K. Sadakane, W.K. Sung, Breaking a time-and-space barrier in constructing full-text indices, in: Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science, 2003, pp. 251-260
[12] Kärkkäinen, J.; Sanders, P.; Burkhardt, S., Linear work suffix array construction, Journal of the ACM, 53, 6, 918-936 (2006) · Zbl 1326.68111
[13] Kim, D. K.; Sim, J. S.; Park, H.; Park, K., Constructing suffix arrays in linear time, Journal of Discrete Algorithms, 3, 2-4, 126-142 (2005) · Zbl 1101.68505
[14] Ko, P.; Aluru, S., Space efficient linear time construction of suffix arrays, Journal of Discrete Algorithms, 3, 2-4, 143-156 (2005) · Zbl 1101.68506
[15] T.W. Lam, K. Sadakane, W.K. Sung, S.M. Yiu, A space and time efficient algorithm for constructing compressed suffix arrays, in: Proceedings of the 9th International Computing and Combinatorics Conference, 2002, pp. 401-410; T.W. Lam, K. Sadakane, W.K. Sung, S.M. Yiu, A space and time efficient algorithm for constructing compressed suffix arrays, in: Proceedings of the 9th International Computing and Combinatorics Conference, 2002, pp. 401-410 · Zbl 1077.68947
[16] Manber, U.; Myers, G., Suffix arrays: A new method for on-line string searches, SIAM Journal on Computing, 22, 5, 935-948 (1993) · Zbl 0784.68027
[17] J.I. Munro, Tables, in: Proceedings of Conference on Foundations of Software Technology and Theoretical Computer Science, 1996, pp. 37-42; J.I. Munro, Tables, in: Proceedings of Conference on Foundations of Software Technology and Theoretical Computer Science, 1996, pp. 37-42
[18] Sadakane, K., New text indexing functionalities of the compressed suffix arrays, Journal of Algorithms, 48, 2, 294-313 (2003) · Zbl 1100.68563
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.