Model selection based on minimum description length. (English) Zbl 0968.62008

Summary: We introduce the minimum description length (MDL) principle, a general principle for inductive inference based on the idea that any regularity (law) underlying data can be used to compress that data. We then introduce MDL's fundamental concept, stochastic complexity, and show how it can be used for model selection. We briefly compare MDL-based model selection with other approaches and informally explain why MDL can be expected to give good results in practical applications.
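The compression idea behind MDL can be illustrated with a minimal sketch: pick the model that minimizes the total code length for model plus data. The example below is a crude two-part code for polynomial degree selection, not the paper's refined stochastic complexity; the fixed per-coefficient precision (`precision_bits`) and the Gaussian residual model are simplifying assumptions made only for this illustration.

```python
import numpy as np

def two_part_mdl(x, y, max_degree=6, precision_bits=10):
    """Select a polynomial degree by a crude two-part MDL code.

    Total code length = model bits + data bits, where:
    - model bits: each of the (d + 1) coefficients is encoded at a
      fixed precision (a simplification; refined MDL replaces this
      with stochastic complexity),
    - data bits: negative log-likelihood of the residuals, in bits,
      under a Gaussian with maximum-likelihood variance.
    """
    n = len(x)
    best_degree, best_length = None, np.inf
    for d in range(max_degree + 1):
        coeffs = np.polyfit(x, y, d)
        residuals = y - np.polyval(coeffs, x)
        sigma2 = max(np.mean(residuals ** 2), 1e-12)
        # Gaussian code length for the data given the fitted model:
        data_bits = 0.5 * n * np.log2(2 * np.pi * np.e * sigma2)
        model_bits = (d + 1) * precision_bits
        if data_bits + model_bits < best_length:
            best_degree, best_length = d, data_bits + model_bits
    return best_degree
```

On data generated from a quadratic with moderate noise, the extra coefficients of higher-degree fits cost more bits than they save in residual coding, so the quadratic is selected — the compression-based analogue of penalizing overfitting.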

MSC:

62B10 Statistical aspects of information-theoretic topics

References:

[1] Barron, A., Complexity regularization with application to artificial neural networks, (Roussas, G., Nonparametric functional estimation and related topics (1990), Kluwer Academic: Kluwer Academic Dordrecht), 561-576 · Zbl 0739.62001
[2] Barron, A.; Cover, T., Minimum complexity density estimation, IEEE Transactions on Information Theory, 37, 1034-1054 (1991) · Zbl 0743.62003
[3] Berger, J., Statistical decision theory and Bayesian analysis (1985), Springer-Verlag: Springer-Verlag New York · Zbl 0572.62008
[4] Bozdogan, H., Akaike’s information criterion and recent developments in informational complexity, Journal of Mathematical Psychology, 44, 62-91 (2000) · Zbl 1047.62501
[5] Browne, M., Cross-validation methods, Journal of Mathematical Psychology, 44, 108-132 (2000) · Zbl 0946.62045
[6] Chaitin, G., On the length of programs for computing finite binary sequences: statistical considerations, Journal of the Association for Computing Machinery, 16, 145 (1969) · Zbl 0187.28303
[7] Chater, N., Reconciling simplicity and likelihood principles in perceptual organization, Psychological Review, 103, 566-581 (1996)
[8] Cover, T.; Thomas, J., Elements of information theory (1991), Wiley Interscience: Wiley Interscience New York · Zbl 0762.94001
[9] Dawid, A., Prequential analysis, stochastic complexity and Bayesian inference, (Bernardo, J.; Berger, J.; Dawid, A.; Smith, A., Bayesian statistics (1992), Oxford University Press), 109-125
[10] Dowe, D.; Allison, L.; Dix, T.; Hunter, L.; Wallace, C.; Edgoose, T., Circular clustering of protein dihedral angles by minimum message length, Proceedings Pacific symposium on biocomputing ’96 (1996), World Scientific
[11] Geman, S.; Bienenstock, E.; Doversat, R., Neural networks and the bias/variance dilemma, Neural Computation, 4, 1-58 (1992)
[12] Grünwald, P., A minimum description length approach to grammar inference, (Wermter, S.; Riloff, E.; Scheler, G., Connectionist, statistical and symbolic approaches to learning for natural language processing (1996), Springer-Verlag: Springer-Verlag Berlin), 203-216
[13] Grünwald, P., The MDL principle and reasoning under uncertainty (1998), University of Amsterdam
[14] Kearns, M.; Mansour, Y.; Ng, A.; Ron, D., An experimental and theoretical comparison of model selection methods, Machine Learning, 27, 7-50 (1997)
[15] Kolmogorov, A., Three approaches to the quantitative definition of information, Problems in Information Transmission, 1, 1-7 (1965)
[16] Kontkanen, P.; Myllymäki, P.; Tirri, H., Comparing Bayesian model class selection criteria by discrete finite mixtures, (Dowe, D.; Korb, K.; Oliver, J., Proceedings of the information, statistics and induction in science (ISIS) conference (1996), World Scientific: World Scientific Singapore), 364-374
[17] Li, M.; Vitányi, P., An introduction to Kolmogorov complexity and its applications (1997), Springer-Verlag: Springer-Verlag New York · Zbl 0866.68051
[18] Myung, I. J.; Pitt, M. A., Issues in selecting mathematical models of cognition, (Grainger, J.; Jacobs, A. M., Localist connectionist approaches to human cognition (in press), Erlbaum: Erlbaum Hillsdale, NJ)
[19] Rissanen, J., Modeling by the shortest data description, Automatica, 14, 465-471 (1978) · Zbl 0418.93079
[20] Rissanen, J., Stochastic complexity, Journal of the Royal Statistical Society, B, 49, 223-239 (1987) · Zbl 0654.62008
[21] Rissanen, J., Stochastic complexity in statistical inquiry (1989), World Scientific: World Scientific Singapore · Zbl 0800.68508
[22] Rissanen, J., Fisher information and stochastic complexity, IEEE Transactions on Information Theory, 42, 40-47 (1996) · Zbl 0856.94006
[23] Solomonoff, R., A formal theory of inductive inference, part 1 and part 2, Information and Control, 7, 1-22 (1964) · Zbl 0258.68045
[24] Speed, T.; Yu, B., Model selection and prediction: normal regression, Annals of the Institute of Statistical Mathematics, 45, 35-54 (1993) · Zbl 0774.62093
[25] Vitányi, P.; Li, M., Minimum description length induction, Bayesianism, and Kolmogorov complexity (1997)
[26] Wallace, C., False oracles and SMML estimators, (Dowe, D.; Korb, K.; Oliver, J., Proceedings of the information, statistics and induction in science (ISIS) conference (1996), World Scientific: World Scientific Melbourne), 304-316
[27] Wallace, C.; Boulton, D., An information measure for classification, Computer Journal, 11, 185-195 (1968) · Zbl 0164.46208
[28] Wallace, C.; Freeman, P., Estimation and inference by compact coding, Journal of the Royal Statistical Society B, 49, 240-251 (1987) · Zbl 0653.62005
[29] Wasserman, L., Bayesian model selection and model averaging, Journal of Mathematical Psychology, 44, 92-107 (2000) · Zbl 0946.62032
[30] Wolpert, D. H., The lack of a priori distinctions between learning algorithms, Neural Computation, 8, 1341-1390 (1996)