×

Rényi entropy and pattern matching for run-length encoded sequences. (English) Zbl 1469.60094

Summary: In this note, we studied the asymptotic behaviour of the length of the longest common substring for run-length encoded sequences. When the original sequences are generated by an \(\alpha \)-mixing process with exponential decay (or \(\psi\) mixing with polynomial decay), we proved that this length grows logarithmically with a coefficient depending on the Rényi entropy of the pushforward measure. For Bernoulli processes and Markov chains, this coefficient is computed explicitly.

MSC:

60F15 Strong limit theorems
60G10 Stationary stochastic processes
94A17 Measures of information, entropy
68P30 Coding and information theory (compaction, compression, models of communication, encoding schemes, etc.) (aspects in computer science)
37A25 Ergodicity, mixing, rates of mixing
60J10 Markov chains (discrete-time Markov processes on discrete state spaces)

References:

[1] Abadi, M., Cardeño, L., and Gallo, S. Potential well spectrum and hitting time in renewal processes.J. Stat. Phys.,159(5), 1087-1106 (2015) · Zbl 1329.82015
[2] Abadi, M., Gallo, S., and Rada-Mora, E. A. The shortest possible return time ofβ-mixing processes.IEEE Trans. Inform. Theory,64(7), 4895-4906 (2018) · Zbl 1401.60054
[3] Abadi, M. N. and Cardeño, L.Rényi entropies and large deviations for the first match function.IEEE Trans. Inform. Theory,61(4), 1629-1639 (2015) · Zbl 1359.94289
[4] Ahsan, S. B., Aziz, S. P., and Rahman, M. S. Longest common subsequence problem for run-length-encoded strings. In2012 15th International Conference on Computer and Information Technology (ICCIT), pp. 36-41 (2012) · doi:10.1109/IC-
[5] Apostolico, A., Landau, G. M., and Skiena, S. Matching for run-length encoded strings.J. Complexity,15(1), 4-16 (1999) · Zbl 0921.68041
[6] Barros, V., Liao, L., and Rousseau, J. On the shortest distance between orbits and the longest common substring problem.Adv. Math.,344, 311-339 (2019) · Zbl 1442.37060
[7] Barros, V. and Rousseau, J. Shortest Distance Between Multiple Orbits and Generalized Fractal Dimensions.Ann. Henri Poincaré(2021) · Zbl 1478.37035 · doi:10.1007/s00023-
[8] Bowen, R.Equilibrium states and the ergodic theory of Anosov diffeomorphisms. Lecture Notes in Mathematics, Vol. 470. Springer-Verlag, Berlin-New York (1975) · Zbl 0308.28010
[9] Bradley, R. C.Introduction to strong mixing conditions. Vol. 2. Kendrick Press, Heber City, UT (2007). ISBN 0-9740427-7-3 · Zbl 1134.60004
[10] Chen, K.-Y., Hsu, P.-H., and Chao, K.-M. Hardness of comparing two run-length encoded strings.J. Complexity,26(4), 364-374 (2010) · Zbl 1193.94077
[11] Ciuperca, G., Girardin, V., and Lhote, L. Computation and estimation of generalized entropy rates for denumerable Markov chains.IEEE Trans. Inform. Theory, 57(7), 4026-4034 (2011) · Zbl 1365.94139
[12] Collet, P., Giardina, C., and Redig, F. Matching with shift for one-dimensional Gibbs measures.Ann. Appl. Probab.,19(4), 1581-1602 (2009) · Zbl 1171.60390
[13] Coutinho, A., Lambert, R., and Rousseau, J. Matching strings in encoded sequences.Bernoulli,26(3), 2021-2050 (2020) · Zbl 1442.94028
[14] D¸ebowski, Ł. On the vocabulary of grammar-based codes and the logical consistency of texts.IEEE Trans. Inform. Theory,57(7), 4589-4599 (2011) · Zbl 1365.68430
[15] D¸ebowski, Ł. Maximal repetition and zero entropy rate.IEEE Trans. Inform. Theory,64(4, part 1), 2212-2219 (2018) · Zbl 1390.94620
[16] Doukhan, P.Mixing. Properties and examples, volume 85 ofLecture Notes in Statistics. Springer-Verlag, New York (1994). ISBN 0-387-94214-9 · Zbl 0801.60027
[17] Fan, S., Liao, L., and Qiu, Y. Stationary determinantal processes:ψ-mixing property andLq-dimensions.ArXiv Mathematics e-prints(2019)
[18] Freschi, V. and Bogliolo, A. Longest common subsequence between run-lengthencoded strings: a new algorithm with improved parallelism.Inform. Process. Lett.,90(4), 167-173 (2004) · Zbl 1177.68248
[19] Galves, A. and Schmitt, B. Inequalities for hitting times in mixing dynamical systems.Random Comput. Dynam.,5(4), 337-347 (1997) · Zbl 1005.37501
[20] Haydn, N. and Vaienti, S. The Rényi entropy function and the large deviation of short return times.Ergodic Theory Dynam. Systems,30(1), 159-179 (2010) · Zbl 1202.37009
[21] Hinds, S. C., Fisher, J. L., and D’Amato, D. P. A document skew detection method using run-length encoding and the Hough transform. In[1990] Proceedings. 10th International Conference on Pattern Recognition, volume i, pp. 464-468 vol.1 (1990) · doi:10.1109/ICPR.1990.118147
[22] Hooshmand, S., Tavakoli, N., Abedin, P., and Thankachan, S. V. On computing average common substring over run length encoded sequences.Fund. Inform., 163(3), 267-273 (2018) · Zbl 1403.68373
[23] Hunter,R. and Robinson,A. H.International digital facsimile coding standards.Proceedings of the IEEE,68(7),854-867 (1980)
[24] Kontoyiannis, I., Algoet, P. H., Suhov, Y. M., and Wyner, A. J. Nonparametric entropy estimation for stationary processes and random fields, with applications to English text.IEEE Trans. Inform. Theory,44(3), 1319-1327 (1998) · Zbl 1026.94516
[25] Kontoyiannis, I. and Suhov, Y. Prefixes and the entropy rate for long-range sources. InProbability, statistics and optimisation, Wiley Ser. Probab. Math. Statist. Probab. Math. Statist., pp. 89-98. Wiley, Chichester (1994) · Zbl 0855.60034
[26] Łuczak, T. and Szpankowski, W. A suboptimal lossy data compression based on approximate pattern matching.IEEE Trans. Inform. Theory,43(5), 1439-1451 (1997) · Zbl 0953.94012
[27] Miall, A.The geology of stratigraphic sequences. Second edition (2010). ISBN 978-3-642-05026-8 · doi:10.1007/978-3-642-05027-5
[28] Mokkadem, A. Mixing properties of ARMA processes.Stochastic Process. Appl., 29(2), 309-315 (1988) · Zbl 0647.60042
[29] Neuhauser, C. A phase transition for the distribution of matching blocks.Combin. Probab. Comput.,5(2), 139-159 (1996) · Zbl 0865.60027
[30] Rached, Z. Rényi’s Entropy Rate For Discrete Markov Sources (1998). Master of Science Project
[31] Robinson, A. H. and Cherry, C.Results of a prototype television bandwidth compression scheme.Proceedings of the IEEE,55(3), 356-364 (1967)
[32] Rousseau, J. Longest common substring for random subshifts of finite type (2021+). To appear inAnn. Inst. H. Poincaré Probab. Stat
[33] Sayood, K.Introduction to Data Compression. fifth edition (2017). ISBN 978-012-415796-5 · Zbl 0858.94009 · doi:10.1016/C2010-0-69630-1
[34] Walters, P. Ruelle’s operator theorem andg-measures.Trans. Amer. Math. Soc., 214, 375-387 (1975) · Zbl 0331.28013
[35] Waterman, M.Introduction to Computational Biology: Maps, Sequences and Genomes. Chapman and Hall, London (1995). ISBN 9780412993916 · Zbl 0831.92011
[36] Xu, D., Kurani, A. S., Furst, J. D., and Raicu, D. S. Run-length Encoding for Volumetric Texture. InThe 4th IASTED International Conference on Visualization, Imaging, and Image Processing(2004)
[37] Ziv, J. and Lempel, A. A universal algorithm for sequential data compression. IEEE Trans. Inform. Theory,IT-23(3), 337-343 (1977) · Zbl 0379.94010
[38] Ziv, J. and Lempel, A. Compression of individual sequences via variable-rate coding. IEEE Trans. Inform. Theory,24(5), 530-536 (1978). 1. Introduction2. Longest common substring for RLE sequences3. Examples3.1. Bernoulli process3.2. Markov chain with two states3.3. Markov chain with more than 2 states4. Proof of the main resultsAcknowledgementsReferences · Zbl 0392.94004
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.