×

A linearly computable measure of string complexity. (English) Zbl 1255.68117

In this very interesting paper, the authors introduce the notion of the \(I\)-complexity of a given string, which counts the number of distinct substrings in it, and investigate its properties. They prove that this measure of string complexity is computable in linear time and space from the well-known longest common prefix array (LCP array), the companion data structure of the suffix array. They also introduce the repetitions array, which is a permutation of the LCP array, and develop its properties. It is shown that the number of strings with \(I\)-complexity up to a given value is bounded and a large family of strings with almost maximal complexity is presented. The authors prove that most strings have high \(I\)-complexity. Their measure of complexity is compared with the Lempel-Ziv complexity, their main differences are discussed, and it is shown that in spite of their very distinct behavior on infinitely many strings, their values are close. Finally, the authors discuss a number of open problems and ongoing research on \(I\)-complexity.

MSC:

68R15 Combinatorics on words
68W32 Algorithms on strings
Full Text: DOI

References:

[1] Adjeroh, Donald; Bell, Tim; Mukherjee, Amar, The Burrows-Wheeler Transform: Data Compression, Suffix Arrays, and Pattern Matching (2008), Springer
[2] Asarin, E.; Degorre, A., Volume and entropy of regular timed languages: Analytic approach, (FORMATS’09. FORMATS’09, LNCS, vol. 5813 (2009), Springer-Verlag), 13-27 · Zbl 1262.68083
[3] Asarin, E.; Degorre, A., Volume and entropy of regular timed languages: Discretization approach, (CONCUR’09. CONCUR’09, LNCS, vol. 5710 (2009), Springer-Verlag), 69-83 · Zbl 1254.68128
[4] Becher, Verónica; Heiber, Pablo A., On extending de Bruijn sequences, Information Processing Letters, 111, 930-932 (2011) · Zbl 1260.68311
[5] Bruijn, N. G., A combinatorial problem, Koninklijke Nederlandse Akademie v. Wetenschappen, 49, 758-764 (1946) · Zbl 0060.02701
[6] Buhrman, Harry; Fortnow, Lance; Laplante, Sophie, Resource-bounded Kolmogorov complexity revisited, (Proceedings of the 14th Symposium on Theoretical Aspects of Computer Science (1997), Springer), 105-116 · Zbl 1017.68061
[7] Calude, Cristian; Salomaa, Kai; Roblot, Tania, Finite-state complexity and the size of transducers, Proceedings of Descriptional Complexity of Formal Systems (DCFS), 38-47 (2010) · Zbl 1455.68084
[8] Chaitin, Gregory J., A theory of program size formally identical to information theory, Journal of the ACM, 22, 329-340 (1975) · Zbl 0309.68045
[9] Cilibrasi, Rudi; Vitanyi, Paul, Clustering by compression, IEEE Transactions on Information Theory, 51, 4, 1523-1545 (2005) · Zbl 1297.68097
[10] Ehrenfeucht, Andrzej; Mycielski, Jan, A pseudorandom sequence: how random is it? in Richard K. Guy, Unsolved problems, American Mathematical Monthly, 373-375 (1992) · Zbl 1279.94047
[11] Kasai, Toru; Lee, Gunho; Arimura, Hiroki; Arikawa, Setsuo; Park, Kunsoo, Linear-time longest-common-prefix computation in suffix arrays and its applications, (Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching (2001), Springer-Verlag), 181-192 · Zbl 0990.68639
[12] Kolmogorov, A. N., Three approaches to the quantitative definition of information, Problems of Information Transmission, 1, 1, 1-7 (1965) · Zbl 0271.94018
[13] Lempel, A.; Ziv, J., On the complexity of finite sequences, IEEE Transactions on Information Theory, 22, 1, 75-81 (1976) · Zbl 0337.94013
[14] María López-Valdés, Lempel-Ziv dimension for Lempel-Ziv compression, in: MFCS, 2006, pp. 693-703.; María López-Valdés, Lempel-Ziv dimension for Lempel-Ziv compression, in: MFCS, 2006, pp. 693-703. · Zbl 1132.68380
[15] Manber, Udi; Myers, Gene, Suffix arrays: a new method for on-line string searches, SIAM Journal on Computing, 22, 5, 935-948 (1993) · Zbl 0784.68027
[16] Schnorr, C. P., Process complexity and effective random tests, Journal of Computer Systems Science, 7, 376-388 (1973) · Zbl 0273.68036
[17] Shallit, Jeffrey; Wang, Ming-Wei, Automatic complexity of strings, Journal of Automata, Languages and Combinatorics, 6, 4, 537-554 (2001) · Zbl 1004.68077
[18] Shannon, C. E., A mathematical theory of communication, Bell System Technical Journal, 27, 379-423 (1948), 623-656 · Zbl 1154.94303
[19] Uspensky, V. A.; Shen, A. Kh., Relations between varieties of Kolmogorov complexities, Mathematics Systems Theory, 29, 271-292 (1996) · Zbl 0849.68059
[20] Vitányi, Paul; Li, Ming, An Introduction to Kolmogorov Complexity and Its Applications (1997), Springer · Zbl 0866.68051
[21] Ziv, Jacob; Lempel, Abraham, Compression of individual sequences via variable-rate coding, IEEE Transactions on Information Theory, 24, 5, 530-536 (1978) · Zbl 0392.94004
[22] Zvonkin, A. K.; Levin, L. A., The complexity of finite objects and the development of the concepts of information and randomness by means of the theory of algorithms, Russian Mathematical Surveys, 25, 83-124 (1970) · Zbl 0222.02027
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.