×

Tsallis conditional mutual information in investigating long range correlation in symbol sequences. (English) Zbl 1527.62002

Summary: The presence of long range correlation (LRC) as opposed to Markov chain of a finite order is of primary interest in the study of symbol sequences, such as DNA. Among others, methods using information theory have been developed for Markov chain order estimation. In this work, we consider the Tsallis entropy and define the Tsallis conditional mutual information (TCMI) for Markov chain order estimation, which is more suitable to identify large orders suggesting effectively LRC. The TCMI of order \(m\) for a Tsallis parameter \(q\) is computed for increasing \(m\) and a significance test is performed for each \(m\) until TCMI is found non-significant. Randomization and parametric significance tests for TCMI are developed. For the latter, the null distribution of TCMI is approximated with a gamma distribution deriving analytic expressions for its parameters. We assess the accuracy of order estimation with the two tests for TCMI and compare them to the respective tests using the Shannon entropy. Extended simulations on Markov chains of different orders and structures of the transition probability matrix show that Shannon and Tsallis tests with parameter \(q = 2\) have similar performance for small orders. When the problem becomes more demanding for higher Markov chain orders and LRCs, Tsallis tests for \(q = 2\) have better approximation to the correct order with the sequence length N. Finally, the Tsallis tests are favorably compared to Shannon tests to real DNA sequences.

MSC:

62B10 Statistical aspects of information-theoretic topics
60J20 Applications of Markov chains and discrete-time Markov processes on general state spaces (social mobility, learning theory, industrial processes, etc.)
62P10 Applications of statistics to biology and medical sciences; meta analysis

Software:

longmemo
Full Text: DOI

References:

[1] Raftery, A. E., A model for high-order Markov chains, J. R. Stat. Soc. Ser. B Stat. Methodol., 47, 3, 528-539 (1985) · Zbl 0593.62091
[2] Peng, C.-K.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Sciortino, F.; Simons, M.; Stanley, H. E., Long range correlations in nucleotide sequences, Lett. Nat., 356, 168 (1992)
[3] Shlesinger, M. F.; Zaslavsky, G. M.; Klafter, J., Strange kinetics, Nature, 363, 31 (1993)
[4] Usatenko, O.; Yampolskii, V.; Kechedzhy, K. E.; Melnyk, S. S., Symbolic stochastic dynamical systems viewed as binary n-step markov chains, Phys. Rev. E, 68, Article 061107 pp. (2003)
[5] Melnik, S.; Usatenko, O., Entropy and long-range memory in random symbolic additive Markov chains, Phys. Rev. E, 93, Article 062144 pp. (2016)
[6] Melnik, S.; Usatenko, O., Decomposition of conditional probability for high-order symbolic Markov chains, Phys. Rev. E, 96, Article 012158 pp. (2017)
[7] Tong, H., Determination of the order of a Markov chain by Akaike’s Information Criterion, J. Appl. Probab., 12, 3, 488-497 (1975) · Zbl 0314.62038
[8] Katz, R., On some criteria for estimating the order of a Markov chain, Technometrics, 23, 3, 243-249 (1981) · Zbl 0485.62086
[9] Dalevi, D.; Dubhashi, D., The Peres-Shields order estimator for fixed and variable length Markov Models with applicators to DNA sequence similarity, Lecture Notes in Comput. Sci., 3692, 291-302 (2005)
[10] Menéndez, M.; Pardo, L.; Pardo, M.; Zografos, K., Testing the order of Markov dependence in DNA sequences, Methodol. Comput. Appl. Probab., 13, 59-74 (2011) · Zbl 1207.92012
[11] Pardo, L., Statistical Inference Based on Divergence Measures (2006), Chapman and Hall · Zbl 1118.62008
[12] Baigorri, A.; Gonçalves, C.; Resende, P., Markov Chain order estimation based on \(\chi^2\)-divergence measure, Canad. J. Statist., 56, 3-578 (2014)
[13] Zhao, L.; Dorea, C.; Gonçalves, C., On determination of the order of a Markov chain, Stat. Inference Stoch. Process., 4, 273-282 (2001) · Zbl 1008.62078
[14] Pethel, D.; Hahs, W., Exact test for Markov order, Physica D, 269, 42-47 (2014) · Zbl 1285.62097
[15] Gupta, A.; Hidalgo, J., Order selection and inference with long memory dependent data, J. Time Series Anal., 40, 425-446 (2019) · Zbl 1422.62282
[16] Papapetrou, M.; Kugiumtzis, D., Markov Chain order estimation with conditional mutual information, Physica A, 392, 1593-1601 (2013)
[17] Papapetrou, M.; Kugiumtzis, D., Markov Chain order estimation with parametric significance tests of conditional mutual information, Simul. Model. Practice Theory, 61, 1-13 (2016)
[18] Jund, P.; Kim, S.; Tsallis, C., Crossover from extensive to nonextensive behavior driven by long-range interactions, Phys. Rev. B, 52, 1, 50-53 (1995)
[19] Rohlf, T.; Tsallis, C., Long-range memory elementary 1d cellular automata: Dynamics and nonextensivity, Physica A, 379, 2, 465-470 (2007)
[20] Tsallis, C., Introduction to Nonextensive Statistical Mechanics (2009), Springer: Springer New York · Zbl 1172.82004
[21] Angulo, J.; Esquivel, F., Multifractal dimensional dependence assessment based on tsallis mutual information, Entropy, 17, 8, 5382-5401 (2015)
[22] Ré, M.; Azad, R., Generalization of entropy based divergence measures for symbolic sequence analysis, PLoS One, 9, 4 (2014)
[23] Karakatsanis, L.; Pavlos, G.; Iliopoulos, A.; Pavlos, E.; Clark, P.; Duke, J.; Monos, D., Assessing information content and interactive relationships of subgenomic dna sequences of the mhc using complexity theory approaches based on the non-extensive statistical mechanics, Physica A, 505, 77-93 (2018)
[24] Vila, M.; Bardera, A.; Feixas, M.; Sbert, M., Tsallis mutual information for document classification, Entropy, 13, 9, 1694-1707 (2011) · Zbl 1300.62014
[25] Khader, M.; Hamza, A. B., An information-theoretic method for multimodality medical image registration, Expert Syst. Appl., 39, 5, 5548-5556 (2012)
[26] Cover, T. M.; Thomas, J. A., Elements of Information Theory (1991), Wiley: Wiley London · Zbl 0762.94001
[27] Miller, G. A., Note on the bias of information estimates, (Information Theory in Psychology: Problems and Methods (1955), The Free Press: The Free Press Monticello, IL), 95-100
[28] Roulston, M., Estimating the errors on measured entropy and mutual information, Physica D, 125, 285-294 (1999) · Zbl 0935.94013
[29] Pöschel, T.; Ebeling, W.; Rosé, H., Guessing probability distributions from small samples, J. Stat. Phys., 80, 1443-1452 (1995) · Zbl 1106.62303
[30] Pardo, J. A., Some aplications of the useful mutual information, Appl. Math. Comput., 72, 33-50 (1995) · Zbl 0830.62007
[31] Wolpert, D. H.; Wolf, D. R., Estimating functions of probability distributions from a finite set of samples, Phys. Rev. E, 52, 6841 (1995)
[32] Antos, A.; Kontoyiannis, I., Convergence properties of functional estimates for discrete distributions, Rand. Struct. Alg., 19, 163-193 (2001) · Zbl 0985.62006
[33] Goebel, B.; Dawy, Z.; Hagenauer, J.; Mueller, J., An approximation to the distribution of finite sample size mutual information estimates, IEEE, 2, 1102-1106 (2005)
[34] Hutter, M.; Zaffalon, M., Distribution of mutual information from complete and incomplete data, Comput. Statist. Data Anal., 48, 633-657 (2005) · Zbl 1429.62054
[35] Tsallis, C., Possible generalization of Boltzmann-gibbs statistics, J. Stat. Phys., 52, 479 (1988) · Zbl 1082.82501
[36] Ebeling, W., On the Relation Between Various Entropy Concepts and the Valoric Interpretation (1), 108-120 (1992)
[37] Tsallis, C., Non extensive physics: a possible connection between generalized statistical mechanics and quantum groups, Phys. Lett. A, 195, 229-334 (1994) · Zbl 0941.81565
[38] Tsallis, C., Generalized entropy-based criterion for consistent testing, Phys. Rev. E, 58, 1442-1445 (1998)
[39] Yu, G.-H.; Huang, C.-C., A distribution free plotting position, Stoch. Environ. Res. Risk Assess., 15, 6, 462-476 (2001) · Zbl 0988.62001
[40] Pearson, K., On the Theory of Contingency and Its Relation To Association and Normal Correlation (1904), Dulau and Co.: Dulau and Co. London, Ch. 2 · JFM 36.0313.12
[41] Larsen, R.; Marx, M., An Introduction To Mathematical Statistics and Its Applications (2012), Prentice Hall: Prentice Hall United States, Ch. 46
[42] Gan, X.; Han, Z., Two general models that generate long range correlation, Physica A, 391, 12, 3477-3483 (2012)
[43] Papapetrou, M.; Kugiumtzis, D., Investigating long range correlation in DNA sequences using significance tests of conditional mutual information, Comput. Biol. Chem., 53, 32-42 (2014)
[44] Buldyrev, S. V.; Dokholyan, N. V.; Goldberger, A. L.; Havlin, S.; Peng, C.-K.; Stanley, H. E.; Viswanathan, G. M., Analysis of DNA sequences using methods of statistical physics, Physica A, 249, 430-438 (1998)
[45] Almirantis, Y.; Provata, A., Long and short range correlations in genome organization, J. Stat. Phys., 97, 233-262 (1999) · Zbl 0940.92019
[46] Beran, J.; Feng, Y.; Ghosh, S.; Kulik, R., Long-Memory Processes: Probabilistic Properties and Statistical Methods (2013), Springer · Zbl 1282.62187
[47] Makse, H. A.; Havlin, S.; Schwartz, M.; Stanley, H. E., Method for generating long-range correlations for large systems, Phys. Rev. E, 53, 5445-5449 (1996)
[48] Kugiumtzis, D.; Provata, A., Statistical analysis of gene and intergenic DNA sequences, Physica A, 342, 3-4, 623-638 (2004)
[49] Rosso, O. A.; Martin, M. T.; Plastino, A., Brain electrical activity analysis using wavelet-based informational tools, Physica A, 313, 497-511 (2002) · Zbl 1013.92008
[50] Silva, R., Negative heat capacity and non-extensive kinetic theory, Phys. Lett. A, 313, 393-396 (2003) · Zbl 1023.82506
[51] Sotolongo, C.; Posadas, A., Fragment-asperity interaction model for earthquakes, Phys. Rev. Lett., 92, Article 048501 pp. (2004)
[52] Sneddon, R., The tsallis entropy of natural information, Physica A, 386, 101-118 (2007)
[53] Vilar, C. S.; Franca, G. S.; Silva, R.; Alcaniz, J. S., Nonextensivity in geological faults?, Physica A, 377, 285-290 (2007)
[54] Papadimitriou, C.; Kalimeri, M.; Eftaxias, K., Nonextensivity and universality in the earthquake preparation process, Phys. Rev. E, 77, Article 036101 pp. (2008)
[55] Contoyiannis, Y. F.; Eftaxias, K., Tsallis and levy statistics in the preparation of an earthquake, Nonlinear Process. Geophys., 15, 379-388 (2008)
[56] Kalimeri, M.; Papadimitriou, C.; Balasis, G.; Eftaxias, K., Dynamical complexity detection in pre-seismic emissions using nonadditive tsallis entropy, Physica A, 387, 1161-1172 (2008)
[57] Niven, R.; Suyari, H., Combinatorial basis and non-asymptotic form of the tsallis entropy function, Phys. Condens. Matter, 61, 75-82 (2008) · Zbl 1189.82008
[58] Esquivel, A.; Lazarian, A., Tsallis statistics as a tool for studying interstellar turbulence, Astrophys. J., 710, 125-132 (2010)
[59] Zhang, A.; Bi, J.; Sun, S., A method for drowsiness detection based on tsallis entropy of eeg, IFMBE Proc., 39, 505-508 (2013)
[60] Gamero, L.; Plastino, A.; Torres, M., Wavelet analysis and nonlinear dynamics in a nonextensive setting, Physica A, 246, 487-509 (1997)
[61] Capurro, A.; Suyari, H.; Diambra, L.; Lorenzo, D.; Macadar, O.; Martin, M.; Mostaccio, C.; Plastino, A.; Perez, J.; Rofman, E.; Torres, M.; Velluti, J., Human brain dynamics: the analysis of eeg signals with tsallis information measure, Physica A, 265, 235-254 (1999)
[62] Martin, M. T.; Plastino, A. R.; Plastino, A., Tsallis-like information measures and the analysis of complex signals, Physica A, 275, 262-271 (2000)
[63] Tong, S.; Bezerianos, A.; Paul, J.; Zhu, Y.; Thakor, N., Nonextensive entropy measure of eeg following brain injury from cardiac arrest, Physica A, 305, 619-628 (2002) · Zbl 1039.92025
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.