×

Context tree selection: a unifying view. (English) Zbl 1397.60130

Summary: Context tree models have been introduced by J. Rissanen [IEEE Trans. Inf. Theory 29, 656–664 (1983; Zbl 0521.94010)] as a parsimonious generalization of Markov models. Since then, they have been widely used in applied probability and statistics. The present paper investigates non-asymptotic properties of two popular procedures of context tree estimation: Rissanen’s algorithm Context and penalized maximum likelihood. First showing how they are related, we prove finite horizon bounds for the probability of over- and under-estimation. Concerning over-estimation, no boundedness or loss-of-memory conditions are required: the proof relies on new deviation inequalities for empirical probabilities of independent interest. The under-estimation properties rely on classical hypotheses for processes of infinite memory. These results improve on and generalize the bounds obtained in [D. Duarte et al., Bull. Braz. Math. Soc. (N.S.) 37, No. 4, 581–592 (2006; Zbl 1110.60092); A. Galves et al., ESAIM, Probab. Stat. 12, 219–229 (2008; Zbl 1182.62165); A. Galves and the second author, Prog. Probab. 60, 257–269 (2008; Zbl 1151.62343); the second author, Braz. J. Probab. Stat. 24, No. 2, 321–336 (2010; Zbl 1192.62193)], refining asymptotic results of P. Bühlmann and A. J. Wyner [Ann. Stat. 27, No. 2, 480–513 (1999; Zbl 0983.62048)] and I. Csiszár and Z. Talata [IEEE Trans. Inf. Theory 52, No. 3, 1007–1016 (2006; Zbl 1284.94027)].

MSC:

60K99 Special processes
60G10 Stationary stochastic processes
60J10 Markov chains (discrete-time Markov processes on discrete state spaces)
62M09 Non-Markovian processes: estimation
94A55 Shift register sequences and sequences over finite alphabets in information and communication theory

References:

[1] Barron, A.; Rissanen, J.; Yu, B., The minimum description length principle in coding and modeling, IEEE Trans. Inform. Theory, 44, 6, 2743-2760 (1998), Information theory: 1948-1998 · Zbl 0933.94013
[2] Bejerano, G.; Yona, G., Variations on probabilistic suffix trees: statistical modeling and prediction of protein families, Bioinformatics, 17, 1, 23-43 (2001)
[3] Breiman, L.; Friedman, J. H.; Olshen, R. A.; Stone, C. J., (Classification and Regression Trees. Classification and Regression Trees, Wadsworth Statistics/Probability Series (1984), Wadsworth Advanced Books and Software: Wadsworth Advanced Books and Software Belmont, CA) · Zbl 0541.62042
[4] Bühlmann, P.; Wyner, A. J., Variable length Markov chains, Ann. Statist., 27, 480-513 (1999) · Zbl 0983.62048
[5] Busch, J. R.; Ferrari, P. A.; Flesia, A. G.; Fraiman, R.; Grynberg, S. P.; Leonardi, F., Testing statistical hypothesis on random trees and applications to the protein classification problem, Ann. Appl. Stat., 3, 2 (2009) · Zbl 1166.62079
[6] Cesa-Bianchi, N.; Lugosi, G., Prediction, Learning, and Games (2006), Cambridge University Press: Cambridge University Press Cambridge · Zbl 1114.91001
[7] Comets, F.; Fernández, R.; Ferrari, P., Processes with long memory: regenerative construction and perfect simulation, Ann. Appl. Probab., 12, 3, 921-943 (2002) · Zbl 1016.60061
[8] Csiszár, I., Large-scale typicality of Markov sample paths and consistency of MDL order estimators, IEEE Trans. Inform. Theory, 48, 6, 1616-1628 (2002), Special issue on Shannon theory: perspective, trends, and applications · Zbl 1060.62092
[9] Csiszár, I.; Talata, Z., Context tree estimation for not necessarily finite memory processes, via BIC and MDL, IEEE Trans. Inform. Theory, 52, 3, 1007-1016 (2006) · Zbl 1284.94027
[10] Csiszár, I.; Talata, Z., On rate of convergence of statistical estimation of stationary ergodic processes, IEEE Trans. Inform. Theory, 56, 8, 3637-3641 (2010) · Zbl 1366.62168
[11] Dedecker, J.; Prieur, C., New dependence coefficients. Examples and applications to statistics, Probab. Theory Related Fields, 132, 203-236 (2005) · Zbl 1061.62058
[12] Duarte, D.; Galves, A.; Garcia, N. L., Markov approximation and consistent estimation of unbounded probabilistic suffix trees, Bull. Braz. Math. Soc., 37, 4, 581-592 (2006) · Zbl 1110.60092
[13] Fernández, R.; Galves, A., Markov approximations of chains of infinite order, Bull. Braz. Math. Soc., 33, 3, 295-306 (2002) · Zbl 1035.60033
[14] S. Filippi, O. Cappé, A. Garivier, Optimism in reinforcement learning based on Kullback-Leibler divergence, in: 48th Annual Allerton Conference on Communication, Control, and Computing, 2010.; S. Filippi, O. Cappé, A. Garivier, Optimism in reinforcement learning based on Kullback-Leibler divergence, in: 48th Annual Allerton Conference on Communication, Control, and Computing, 2010.
[15] A. Galves, C. Galves, J. Garcia, N.L. Garcia, F. Leonardi, Context tree selection and linguistic rhythm retrieval from written texts, 2010, pp. 1-25. ArXiv:0902.3619; A. Galves, C. Galves, J. Garcia, N.L. Garcia, F. Leonardi, Context tree selection and linguistic rhythm retrieval from written texts, 2010, pp. 1-25. ArXiv:0902.3619 · Zbl 1235.68305
[16] A. Galves, A. Garivier, E. Gassiat, Data selection of context trees and classification, Technical Report, 2010.; A. Galves, A. Garivier, E. Gassiat, Data selection of context trees and classification, Technical Report, 2010. · Zbl 1328.62504
[17] Galves, A.; Leonardi, F., (Exponential Inequalities for Empirical Unbounded Context Trees. Exponential Inequalities for Empirical Unbounded Context Trees, Progress in Probability, vol. 60 (2008), Birkhauser), 257-270 · Zbl 1151.62343
[18] Galves, A.; Maume-Deschamps, V.; Schmitt, B., Exponential inequalities for VLMC empirical trees, ESAIM Probab. Stat., 12, 43-45 (2008) · Zbl 1182.62165
[19] Garivier, A., Consistency of the unlimited BIC context tree estimator, IEEE Trans. Inform. Theory, 52, 10, 4630-4635 (2006) · Zbl 1320.62014
[20] A. Garivier, O. Cappé, The KL-UCB algorithm for bounded stochastic bandits and beyond, in: 23rd Conf. Learning Theory, COLT, Budapest, Hungary, 2011.; A. Garivier, O. Cappé, The KL-UCB algorithm for bounded stochastic bandits and beyond, in: 23rd Conf. Learning Theory, COLT, Budapest, Hungary, 2011.
[21] A. Garivier, E. Moulines, On upper-confidence bound policies for non-stationary bandit problems, 2008. arxiv.org:0805.3415; A. Garivier, E. Moulines, On upper-confidence bound policies for non-stationary bandit problems, 2008. arxiv.org:0805.3415 · Zbl 1349.60070
[22] Leonardi, F., Some upper bounds for the rate of convergence of penalized likelihood context tree estimators, Braz. J. Probab. Stat., 24, 2, 321-336 (2010) · Zbl 1192.62193
[23] Massart, P., (Concentration Inequalities and Model Selection. Concentration Inequalities and Model Selection, Lecture Notes in Mathematics, vol. 1896 (2007), Springer: Springer Berlin), Lectures from the 33rd Summer School on Probability Theory held in Saint-Flour, July 6-23, 2003, With a foreword by Jean Picard · Zbl 1170.60006
[24] Neveu, J., Martingales à Temps Discret (1972), Masson
[25] Rissanen, J., A universal data compression system, IEEE Trans. Inform. Theory, 29, 5, 656-664 (1983) · Zbl 0521.94010
[26] Z. Talata, T. Duncan, Unrestricted BIC context tree estimation for not necessarily finite memory processes, in: Information Theory, 2009. ISIT 2009. IEEE International Symposium on, 28 2009-July 3 2009, pp. 724-728.; Z. Talata, T. Duncan, Unrestricted BIC context tree estimation for not necessarily finite memory processes, in: Information Theory, 2009. ISIT 2009. IEEE International Symposium on, 28 2009-July 3 2009, pp. 724-728.
[27] Willems, F. M.J.; Shtarkov, Y. M.; Tjalkens, T. J., The context-tree weighting method: basic properties, IEEE Trans. Inf. Theory, 41, 3, 653-664 (1995) · Zbl 0837.94011
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.