An empirical study of minimum description length model selection with infinite parametric complexity. (English) Zbl 1098.62008

Summary: Parametric complexity is a central concept in Minimum Description Length (MDL) model selection. In practice it often turns out to be infinite, even for quite simple models such as the Poisson and geometric families. In such cases, neither MDL model selection based on normalized maximum likelihood (NML) nor Bayesian inference based on Jeffreys’ prior can be used. Several ways to resolve this problem have been proposed. We conduct experiments to compare and evaluate their behaviour on small sample sizes. We find interestingly poor behaviour for the plug-in predictive code; a restricted NML model performs quite well, but it is questionable whether the results validate its theoretical motivation. A Bayesian marginal distribution with Jeffreys’ prior can still be used if one sacrifices the first observation to make a proper posterior; this approach turns out to be the most dependable.
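
As an illustration of the infinite parametric complexity mentioned in the summary, the following is a minimal sketch (not taken from the paper) for the Poisson family with a single observation. The NML normalizer sums the maximized likelihood over all outcomes x, and for the Poisson model the maximum is attained at λ = x. The function names are hypothetical. By Stirling's approximation each term behaves like 1/sqrt(2πx), so the partial sums grow roughly like the square root of the cutoff and do not converge.

```python
import math


def poisson_max_likelihood(x: int) -> float:
    """Maximized Poisson likelihood of one observation x, i.e. P(x | lambda = x).

    Computed in log space as exp(-x + x*log(x) - log(x!)) to avoid overflow.
    For x = 0 the maximizing rate is lambda = 0, which gives probability 1.
    """
    if x == 0:
        return 1.0
    return math.exp(-x + x * math.log(x) - math.lgamma(x + 1))


def partial_normalizer(x_max: int) -> float:
    """Partial sum of the NML normalizer over outcomes 0..x_max (sample size 1)."""
    return sum(poisson_max_likelihood(x) for x in range(x_max + 1))


if __name__ == "__main__":
    # The partial sums keep increasing as the cutoff grows, so the log of the
    # full sum (the parametric complexity) is infinite.
    for cutoff in (10, 100, 1_000, 10_000):
        print(cutoff, partial_normalizer(cutoff))
```

Restricting the range of outcomes or parameters, as in the restricted NML approach the summary mentions, amounts to truncating such a sum so that it becomes finite; the cutoff used here is purely illustrative.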

MSC:

62B10 Statistical aspects of information-theoretic topics
62A01 Foundations and philosophical topics in statistics
62F15 Bayesian inference
62C10 Bayesian problems; characterization of Bayes procedures
