Bayesian model selection for longitudinal count data. (English) Zbl 07610355

Summary: We explore the performance of three popular model-selection criteria for generalised linear mixed-effects models (GLMMs) for longitudinal count data (LCD). We focus on comparing the conditional criteria (given the random effects) with the marginal criteria (averaging over the random effects) in selecting the appropriate data-generating model. We advocate the use of the marginal criteria because Bayesian statisticians often use the conditional criteria despite previous warnings. We discuss how to compute the marginal criteria for LCD using a replication method and an importance sampling algorithm. We also show via simulations to what extent we err when using the conditional criteria instead of the marginal criteria. To promote the use of the marginal criteria, we developed an R function that computes the marginal criteria for longitudinal models based on samples from the posterior distribution. Finally, we illustrate the advantages of the marginal criteria on a well-known data set of patients who have epilepsy.
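
Illustration (not from the paper): the distinction between a conditional and a marginal likelihood can be sketched for a Poisson random-intercept model, assuming y_ij | b_i ~ Poisson(exp(eta_ij + b_i)) with b_i ~ N(0, sigma^2). The conditional version plugs in a value of the random effect, while the marginal version averages the likelihood over freshly drawn random effects by plain Monte Carlo. The function names, the model and the integration scheme are assumptions made for this sketch; they are not the authors' R function or their exact replication and importance sampling algorithms.

```python
import math
import numpy as np

def poisson_logpmf(y, mu):
    """Elementwise log Poisson pmf; y is 1-D, mu may broadcast against it."""
    y = np.asarray(y, dtype=float)
    log_fact = np.array([math.lgamma(v + 1.0) for v in y])  # log(y!)
    return y * np.log(mu) - mu - log_fact

def conditional_loglik(y, eta, b):
    """Log-likelihood of one subject's counts GIVEN the random intercept b."""
    return float(np.sum(poisson_logpmf(y, np.exp(eta + b))))

def marginal_loglik(y, eta, sigma, n_draws=5000, rng=None):
    """Monte Carlo estimate of the log-likelihood with b integrated out.

    Draws b ~ N(0, sigma^2) and averages the conditional likelihood,
    using a log-sum-exp shift for numerical stability.
    """
    rng = np.random.default_rng() if rng is None else rng
    b = rng.normal(0.0, sigma, size=n_draws)
    mu = np.exp(eta[None, :] + b[:, None])      # shape (n_draws, n_obs)
    ll = poisson_logpmf(y, mu).sum(axis=1)      # one value per draw of b
    m = ll.max()
    return float(m + np.log(np.mean(np.exp(ll - m))))

# Hypothetical counts and fixed-effect linear predictor for a single subject.
y = np.array([3, 5, 2, 4])
eta = np.log(np.array([3.0, 4.0, 3.0, 4.0]))
print(conditional_loglik(y, eta, b=0.0))
print(marginal_loglik(y, eta, sigma=0.5, rng=np.random.default_rng(1)))
```

In a Bayesian analysis these quantities would be evaluated at each posterior draw of the fixed effects and sigma (for example, draws produced by JAGS) before being combined into a selection criterion; that step is omitted here.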

MSC:

62C10 Bayesian problems; characterization of Bayes procedures
62P10 Applications of statistics to biology and medical sciences; meta analysis

Software:

JAGS
