The importance of complexity in model selection. (English) Zbl 0946.62094

Summary: Model selection should be based not solely on goodness-of-fit, but must also consider model complexity. While the goal of mathematical modeling in cognitive psychology is to select one model from a set of competing models that best captures the underlying mental process, choosing the model that best fits a particular set of data will not achieve this goal. This is because a highly complex model can provide a good fit without necessarily bearing any interpretable relationship with the underlying process.
It is shown that model selection based solely on the fit to observed data will result in the choice of an unnecessarily complex model that overfits the data, and thus generalizes poorly. The effect of over-fitting must be properly offset by model selection methods. An application example of selection methods using artificial data is also presented.
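The overfitting argument in the summary can be illustrated with a small sketch on artificial data. Everything in it is an assumption made for illustration and is not taken from the paper: the quadratic generating model, the noise level, the polynomial candidate models, and the use of AIC and BIC (criteria of the kind introduced in references [1] and [23]) as stand-ins for "model selection methods". The summary does not say which methods or data the paper itself uses.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_poly(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations; returns (coeffs, RSS)."""
    p = degree + 1
    xtx = [[sum(x ** (i + j) for x in xs) for j in range(p)] for i in range(p)]
    xty = [sum((x ** i) * y for x, y in zip(xs, ys)) for i in range(p)]
    coeffs = solve(xtx, xty)
    rss = sum(
        (y - sum(c * x ** i for i, c in enumerate(coeffs))) ** 2
        for x, y in zip(xs, ys)
    )
    return coeffs, rss


# Artificial data: an assumed quadratic "true" process plus Gaussian noise.
rng = random.Random(0)
n = 30
xs = [-1 + 2 * i / (n - 1) for i in range(n)]
ys = [1 + 2 * x - 3 * x * x + rng.gauss(0, 0.3) for x in xs]

results = []
for degree in range(6):
    _, rss = fit_poly(xs, ys, degree)
    k = degree + 2  # polynomial coefficients plus the noise variance
    # Maximized Gaussian log-likelihood, a pure goodness-of-fit measure.
    loglik = -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)
    aic = -2 * loglik + 2 * k            # fit plus a fixed per-parameter penalty
    bic = -2 * loglik + k * math.log(n)  # fit plus a sample-size-dependent penalty
    results.append((degree, rss, aic, bic))

best_fit = min(results, key=lambda r: r[1])[0]  # lowest residual sum of squares
best_aic = min(results, key=lambda r: r[2])[0]
best_bic = min(results, key=lambda r: r[3])[0]

for degree, rss, aic, bic in results:
    print(f"degree={degree}  RSS={rss:.4f}  AIC={aic:.2f}  BIC={bic:.2f}")
print("chosen by fit alone:", best_fit, "| by AIC:", best_aic, "| by BIC:", best_bic)
```

Because the candidate models are nested, the residual sum of squares cannot increase as the degree grows, so fit alone never favors a simpler model over a more complex one. The penalized criteria trade that fit against the parameter count and therefore select a degree no higher than the one chosen by fit alone.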

MSC:

62P15 Applications of statistics to psychology
91E10 Cognitive psychology
62B10 Statistical aspects of information-theoretic topics

References:

[1] Akaike, H., Information theory and an extension of the maximum likelihood principle, (Petrov, B. N.; Csaki, F., Second international symposium on information theory (1973), Akademiai Kiado: Akademiai Kiado Budapest), 267 · Zbl 0283.62006
[2] Bozdogan, H., On the information-based measure of covariance complexity and its application to the evaluation of multivariate linear models, Communications in Statistics Theory Methods, 19, 221-278 (1990) · Zbl 0900.62041
[3] Bozdogan, H., Akaike information criterion and recent developments in information complexity, Journal of Mathematical Psychology, 44, 62-91 (2000) · Zbl 1047.62501
[4] Bozdogan, H.; Bearse, P. M., Model selection using informational complexity with applications to vector autoregressive (VAR) models, (Dowe, D., Information, statistics and induction in sciences (ISIS) anthology (1997), Springer-Verlag: Springer-Verlag Berlin/New York)
[5] Bozdogan, H.; Haughton, D. M. A., Information complexity criteria for regression models, Unpublished manuscript (1997) · Zbl 1042.62504
[6] Browne, M. W., Cross-validation methods, Journal of Mathematical Psychology, 44, 108-132 (2000) · Zbl 0946.62045
[7] Browne, M. W.; Cudeck, R. C., Alternative ways of assessing model fit, Sociological Methods & Research, 21, 230-258 (1992)
[8] Collyer, C. E., Comparing strong and weak models by fitting them to computer-generated data, Perception & Psychophysics, 38, 476-481 (1985)
[9] Efron, B.; Hinkley, D. V., Assessing the accuracy of the maximum likelihood estimator: Observed versus expected Fisher information, Biometrika, 65, 457-487 (1978) · Zbl 0401.62002
[10] Friedman, E.; Massaro, D. W.; Kitzis, S. N.; Cohen, M. M., A comparison of learning models, Journal of Mathematical Psychology, 39, 164-178 (1995) · Zbl 0840.92030
[11] Grunwald, P., Model selection based on minimum description length, Journal of Mathematical Psychology, 44, 133-152 (2000) · Zbl 0968.62008
[12] Hanna, J. F., Some information measures for testing stochastic models, Journal of Mathematical Psychology, 6, 294-311 (1968) · Zbl 0176.50302
[13] Kass, R. E.; Raftery, A. E., Bayes factors, Journal of the American Statistical Association, 90, 773-795 (1995) · Zbl 0846.62028
[14] Kass, R. E.; Wasserman, L., The selection of prior distributions by formal rules, Journal of the American Statistical Association, 91, 1343-1370 (1996) · Zbl 0884.62007
[15] Kolmogorov, A. N., Logical basis for information theory and probability theory, IEEE Transactions on Information Theory, 14, 662-664 (1968) · Zbl 0167.47601
[16] Myung, I. J.; Pitt, M. A., Applying Occam’s razor in modeling cognition: A Bayesian approach, Psychonomic Bulletin & Review, 4, 79-95 (1997)
[17] Myung, I. J.; Pitt, M. A., Issues in selecting mathematical models of cognition, (Grainger, J.; Jacobs, A. M., Localist Connectionist Approaches to Human Cognition (1998), Erlbaum: Erlbaum Hillsdale), 327-355
[18] Nishii, R., Asymptotic properties of criteria for selection of variables in multiple regression, Annals of Statistics, 12, 758-765 (1984) · Zbl 0544.62063
[19] Raftery, A. E., Bayesian model selection in structural equation models, (Bollen, K. A.; Long, J. S., Testing structural equation models (1993), Sage: Sage Thousand Oaks), 163-180
[20] Rissanen, J., A universal prior for integers and estimation by minimum description length, Annals of Statistics, 11, 416-431 (1983) · Zbl 0513.62005
[21] Rissanen, J., Stochastic complexity and modeling, Annals of Statistics, 14, 1080-1100 (1986) · Zbl 0602.62008
[22] Rissanen, J., Stochastic complexity and the MDL principle, Econometric Reviews, 6, 85-102 (1987) · Zbl 0718.62008
[23] Schwarz, G., Estimating the dimension of a model, The Annals of Statistics, 6, 461-464 (1978) · Zbl 0379.62005
[24] Steiger, J. H., Structural model evaluation and modification: An interval estimation approach, Multivariate Behavioral Research, 25, 173-180 (1990)
[25] Stone, M., Cross-validatory choice and assessment of statistical predictions (with Discussion), Journal of the Royal Statistical Society, Series B, 36, 111-147 (1974) · Zbl 0308.62063
[26] Wasserman, L., Bayesian model selection and model averaging, Journal of Mathematical Psychology, 44, 92-107 (2000) · Zbl 0946.62032
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.