
Bayesian comparison of latent variable models: conditional versus marginal likelihoods. (English) Zbl 1431.62551

Summary: Typical Bayesian methods for models with latent variables (or random effects) involve directly sampling the latent variables along with the model parameters. In high-level software code for model definitions (using, e.g., BUGS, JAGS, Stan), the likelihood is therefore specified as conditional on the latent variables. This can lead researchers to perform model comparisons via conditional likelihoods, where the latent variables are considered model parameters. In other settings, however, typical model comparisons involve marginal likelihoods where the latent variables are integrated out. This distinction is often overlooked despite the fact that it can have a large impact on the comparisons of interest. In this paper, we clarify and illustrate these issues, focusing on the comparison of conditional and marginal Deviance Information Criteria (DICs) and Watanabe-Akaike Information Criteria (WAICs) in psychometric modeling. The conditional/marginal distinction corresponds to whether the model should be predictive for the clusters that are in the data or for new clusters (where “clusters” typically correspond to higher-level units like people or schools). Correspondingly, we show that marginal WAIC corresponds to leave-one-cluster out cross-validation, whereas conditional WAIC corresponds to leave-one-unit out. These results lead to recommendations on the general application of the criteria to models with latent variables.
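
To make the distinction concrete, here is a minimal sketch (not the authors' code; the Rasch-type model, variable names, and quadrature settings are illustrative assumptions) of how conditional and marginal WAIC could be computed from posterior draws. Both use the standard WAIC formula (Watanabe, 2010; Vehtari, Gelman, & Gabry, 2017) and differ only in the pointwise log-likelihood matrix: the conditional version treats individual responses as prediction units and conditions on the sampled person abilities, whereas the marginal version treats persons (clusters) as units and integrates each ability out by Gauss-Hermite quadrature.

import numpy as np
from numpy.polynomial.hermite import hermgauss
from scipy.special import expit, logsumexp

def waic(loglik):
    # loglik: S x K array of pointwise log-likelihoods (S posterior draws, K prediction units).
    S = loglik.shape[0]
    lppd = logsumexp(loglik, axis=0) - np.log(S)   # log pointwise predictive density
    p_waic = loglik.var(axis=0, ddof=1)            # pointwise posterior-variance penalty
    return -2.0 * np.sum(lppd - p_waic)            # WAIC on the deviance scale

def conditional_loglik(y, theta, b):
    # Units = individual responses y_ij, conditioning on the sampled abilities theta_i.
    # y: N x J binary matrix; theta: S x N ability draws; b: S x J item difficulty draws.
    eta = theta[:, :, None] - b[:, None, :]        # S x N x J logits
    p = expit(eta)
    ll = y * np.log(p) + (1.0 - y) * np.log1p(-p)
    return ll.reshape(ll.shape[0], -1)             # S x (N*J)

def marginal_loglik(y, b, sigma, n_quad=21):
    # Units = persons (clusters); each ability is integrated out under
    # theta_i ~ N(0, sigma^2) by Gauss-Hermite quadrature.
    nodes, weights = hermgauss(n_quad)             # weight function exp(-x^2)
    log_w = np.log(weights) - 0.5 * np.log(np.pi)  # log of w_q / sqrt(pi)
    out = np.empty((b.shape[0], y.shape[0]))
    for s in range(b.shape[0]):
        t = np.sqrt(2.0) * sigma[s] * nodes        # quadrature points on the ability scale
        p = expit(t[:, None] - b[s][None, :])      # Q x J response probabilities
        ll_iq = y @ np.log(p).T + (1.0 - y) @ np.log1p(-p).T  # N x Q
        out[s] = logsumexp(ll_iq + log_w, axis=1)  # log marginal likelihood per person
    return out                                     # S x N

# Usage with hypothetical posterior draws theta_draws, b_draws, sigma_draws:
# cond_waic = waic(conditional_loglik(y, theta_draws, b_draws))
# marg_waic = waic(marginal_loglik(y, b_draws, sigma_draws))

In line with the summary above, the marginal quantity targets prediction for new clusters (here, new persons), while the conditional quantity targets prediction of new responses from the clusters already in the data.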

MSC:

62P15 Applications of statistics to psychology
62B10 Statistical aspects of information-theoretic topics

References:

[1] Celeux, G., Forbes, F., Robert, C. P., & Titterington, D. M. (2006). Deviance information criteria for missing data models. Bayesian Analysis, 1(4), 651-673. · Zbl 1331.62329
[2] da Silva, M. A., Bazán, J. L., & Huggins-Manley, A. C. (2019). Sensitivity analysis and choosing between alternative polytomous IRT models using Bayesian model comparison criteria. Communications in Statistics-Simulation and Computation, 48(2), 601-620. · Zbl 07551455
[3] De Boeck, P. (2008). Random item IRT models. Psychometrika, 73, 533-559. · Zbl 1284.62699
[4] Denwood, M. J. (2016). runjags: An R package providing interface utilities, model templates, parallel computing methods and additional distributions for MCMC models in JAGS. Journal of Statistical Software, 71(9), 1-25. 10.18637/jss.v071.i09.
[5] Efron, B. (1986). How biased is the apparent error rate of a prediction rule? Journal of the American Statistical Association, 81, 461-470. · Zbl 0621.62073
[6] Fox, J. P. (2010). Bayesian item response modeling: Theory and applications. New York, NY: Springer. · Zbl 1271.62012
[7] Furr, D. C. (2017). Bayesian and frequentist cross-validation methods for explanatory item response models (Unpublished doctoral dissertation). University of California, Berkeley, CA.
[8] Gelfand, A. E., Sahu, S. K., & Carlin, B. P. (1995). Efficient parametrisations for normal linear mixed models. Biometrika, 82, 479-488. · Zbl 0832.62064
[9] Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian data analysis (3rd ed.). New York: Chapman & Hall/CRC.
[10] Gelman, A., Hwang, J., & Vehtari, A. (2014). Understanding predictive information criteria for Bayesian models. Statistics and Computing, 24, 997-1016. · Zbl 1332.62090
[11] Gelman, A., Jakulin, A., Pittau, M. G., & Su, Y. S. (2008). A weakly informative default prior distribution for logistic and other regression models. The Annals of Applied Statistics, 2, 1360-1383. · Zbl 1156.62017
[12] Gelman, A., Meng, X. L., & Stern, H. (1996). Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica, 6, 733-807. · Zbl 0859.62028
[13] Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences (with discussion). Statistical Science, 7, 457-511. · Zbl 1386.65060
[14] Gronau, Q. F., & Wagenmakers, E. J. (2018). Limitations of Bayesian leave-one-out cross-validation for model selection. Computational Brain & Behavior, 2(1), 1-11.
[15] Hoeting, J. A., Madigan, D., Raftery, A. E., & Volinsky, C. T. (1999). Bayesian model averaging: A tutorial. Statistical Science, 14, 382-417. · Zbl 1059.62525
[16] Kang, T., Cohen, A. S., & Sung, H. J. (2009). Model selection indices for polytomous items. Applied Psychological Measurement, 33, 499-518.
[17] Kaplan, D. (2014). Bayesian statistics for the social sciences. New York, NY: The Guilford Press.
[18] Lancaster, T. (2000). The incidental parameter problem since 1948. Journal of Econometrics, 95, 391-413. · Zbl 0967.62099
[19] Levy, R., & Mislevy, R. J. (2016). Bayesian psychometric modeling. Boca Raton, FL: Chapman & Hall. · Zbl 1337.62003
[20] Li, F., Cohen, A. S., Kim, S. H., & Cho, S. J. (2009). Model selection methods for mixture dichotomous IRT models. Applied Psychological Measurement, 33, 353-373.
[21] Li, L., Qiu, S., & Feng, C. X. (2016). Approximating cross-validatory predictive evaluation in Bayesian latent variable models with integrated IS and WAIC. Statistics and Computing, 26, 881-897. · Zbl 1505.62248
[22] Lu, Z. H., Chow, S. M., & Loken, E. (2017). A comparison of Bayesian and frequentist model selection methods for factor analysis models. Psychological Methods, 22(2), 361-381.
[23] Lunn, D., Jackson, C., Best, N., Thomas, A., & Spiegelhalter, D. (2012). The BUGS book: A practical introduction to Bayesian analysis. New York, NY: Chapman & Hall/CRC.
[24] Lunn, D., Thomas, A., Best, N., & Spiegelhalter, D. (2000). WinBUGS—a Bayesian modelling framework: Concepts, structure, and extensibility. Statistics and Computing, 10, 325-337.
[25] Luo, Y., & Al-Harbi, K. (2017). Performances of LOO and WAIC as IRT model selection methods. Psychological Test and Assessment Modeling, 59, 183-205.
[26] Marshall, E. C., & Spiegelhalter, D. J. (2007). Identifying outliers in Bayesian hierarchical models: A simulation-based approach. Bayesian Analysis, 2(2), 409-444. · Zbl 1331.62032
[27] McElreath, R. (2015). Statistical rethinking: A Bayesian course with examples in R and Stan. New York, NY: Chapman & Hall/CRC.
[28] Merkle, E. C., & Rosseel, Y. (2018). blavaan: Bayesian structural equation models via parameter expansion. Journal of Statistical Software, 85(4), 1-30.
[29] Millar, R. B. (2009). Comparison of hierarchical Bayesian models for overdispersed count data using DIC and Bayes’ factors. Biometrics, 65, 962-969. · Zbl 1172.62054
[30] Millar, R. B. (2018). Conditional vs. marginal estimation of predictive loss of hierarchical models using WAIC and cross-validation. Statistics and Computing, 28, 375-385. · Zbl 1384.62093
[31] Mislevy, R. J. (1986). Bayes modal estimation in item response models. Psychometrika, 51, 177-195. · Zbl 0596.62114
[32] Muthén, B., & Asparouhov, T. (2012). Bayesian structural equation modeling: A more flexible representation of substantive theory. Psychological Methods, 17, 313-335.
[33] Navarro, D. (2018). Between the devil and the deep blue sea: Tensions between scientific judgement and statistical model selection. Computational Brain & Behavior, 2(1), 28-34.
[34] Naylor, J. C., & Smith, A. F. (1982). Applications of a method for the efficient computation of posterior distributions. Journal of the Royal Statistical Society C (Applied Statistics), 31, 214-225. · Zbl 0521.65017
[35] Neyman, J., & Scott, E. L. (1948). Consistent estimates based on partially consistent observations. Econometrica, 16, 1-32. · Zbl 0034.07602
[36] O’Hagan, A. (1976). On posterior joint and marginal modes. Biometrika, 63, 329-333. · Zbl 0332.62030
[37] Piironen, J., & Vehtari, A. (2017). Comparison of Bayesian predictive methods for model selection. Statistics and Computing, 27, 711-735. · Zbl 1505.62321
[38] Pinheiro, J. C., & Bates, D. M. (1995). Approximations to the log-likelihood function in the nonlinear mixed-effects model. Journal of Computational and Graphical Statistics, 4, 12-35.
[39] Plummer, M. (2003). JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. In K. Hornik, F. Leisch, & A. Zeileis (Eds.), Proceedings of the 3rd International Workshop on Distributed Statistical Computing.
[40] Plummer, M. (2008). Penalized loss functions for Bayesian model comparison. Biostatistics, 9(3), 523-539. · Zbl 1143.62003
[41] Rabe-Hesketh, S., Skrondal, A., & Pickles, A. (2005). Maximum likelihood estimation of limited and discrete dependent variable models with nested random effects. Journal of Econometrics, 128(2), 301-323. · Zbl 1336.62079
[42] Raftery, A. E., & Lewis, S. M. (1995). The number of iterations, convergence diagnostics, and generic Metropolis algorithms. London: Chapman and Hall.
[43] Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Thousand Oaks, CA: Sage. · Zbl 1001.62004
[44] Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1-36.
[45] Song, X. Y., & Lee, S. Y. (2012). Basic and advanced Bayesian structural equation modeling: With applications in the medical and behavioral sciences. Chichester, UK: Wiley. · Zbl 1282.62056
[46] Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & van der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B, 64, 583-639. · Zbl 1067.62010
[47] Spielberger, C. (1988). State-trait anger expression inventory research edition [Computer software manual]. Odessa, FL: Psychological Assessment Resources.
[48] Stan Development Team. (2014). Stan modeling language users guide and reference manual, version 2.5.0 [Computer software manual]. http://mc-stan.org/.
[49] Trevisani, M., & Gelfand, A. E. (2003). Inequalities between expected marginal log-likelihoods, with implications for likelihood-based model complexity and comparison measures. The Canadian Journal of Statistics, 31, 239-250. · Zbl 1042.62027
[50] Vansteelandt, K. (2000). Formal models for contextualized personality psychology (Unpublished doctoral dissertation). University of Leuven, Leuven, Belgium.
[51] Vehtari, A., Gelman, A., & Gabry, J. (2016). loo: Efficient leave-one-out cross-validation and WAIC for Bayesian models. R package version 0.1.6. https://github.com/stan-dev/loo. · Zbl 1505.62408
[52] Vehtari, A., Gelman, A., & Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27, 1413-1432. · Zbl 1505.62408
[53] Vehtari, A., Mononen, T., Tolvanen, V., Sivula, T., & Winther, O. (2016). Bayesian leave-one-out cross-validation approximations for Gaussian latent variable models. Journal of Machine Learning Research, 17, 1-38. · Zbl 1370.62020
[54] Vehtari, A., Simpson, D. P., Yao, Y., & Gelman, A. (2018). Limitations of “Limitations of Bayesian leave-one-out cross-validation for model selection”. Computational Brain & Behavior, 2(1), 22-27.
[55] Watanabe, S. (2010). Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. Journal of Machine Learning Research, 11, 3571-3594. · Zbl 1242.62024
[56] White, I. R. (2010). simsum: Analyses of simulation studies including Monte Carlo error. The Stata Journal, 10, 369-385.
[57] Wicherts, J. M., Dolan, C. V., & Hessen, D. J. (2005). Stereotype threat and group differences in test performance: A question of measurement invariance. Journal of Personality and Social Psychology, 89(5), 696-716.
[58] Yao, Y., Vehtari, A., Simpson, D., & Gelman, A. (2018). Using stacking to average Bayesian predictive distributions (with discussion). Bayesian Analysis, 13, 917-1007. https://doi.org/10.1214/17-BA1091. · Zbl 1407.62090 · doi:10.1214/17-BA1091
[59] Zhang, X., Tao, J., Wang, C., & Shi, N. Z. (2019). Bayesian model selection methods for multilevel IRT models: A comparison of five DIC-based indices. Journal of Educational Measurement, 56, 3-27.
[60] Zhao, Z., & Severini, T. A. (2017). Integrated likelihood computation methods. Computational Statistics, 32, 281-313. · Zbl 1417.65059
[61] Zhu, X., & Stone, C. A. (2012). Bayesian comparison of alternative graded response models for performance assessment applications. Educational and Psychological Measurement, 72(5), 774-799.