
Comparison of hierarchical Bayesian models for overdispersed count data using DIC and Bayes’ factors. (English) Zbl 1172.62054

Summary: When replicate count data are overdispersed, it is common practice to incorporate this extra-Poisson variability by including latent parameters at the observation level. For example, the negative binomial and Poisson-lognormal (PLN) models are obtained by using gamma and lognormal latent parameters, respectively. Several recent publications have employed the deviance information criterion (DIC) to choose between these two models, with the deviance defined using the Poisson likelihood that is obtained from conditioning on these latent parameters. The results herein show that this use of DIC is inappropriate. Instead, DIC was seen to perform well if calculated using the likelihood that was marginalized at the group level by integrating out the observation-level latent parameters.
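In symbols, the two candidate likelihoods contrasted above can be written as follows, together with the standard definition of DIC. The notation is an assumption made here for illustration and is not taken from the paper: $y_{ij}$ is the count for replicate $j$ in group $i$, $\lambda_{ij}$ is the observation-level latent rate with density $g$, $\theta_i$ collects the group-level parameters, and $\vartheta$ stands for whichever parameters the chosen likelihood conditions on.

```latex
\begin{align*}
\text{conditional:}\quad
  & p(y_{ij}\mid\lambda_{ij}) = \frac{\lambda_{ij}^{\,y_{ij}}\, e^{-\lambda_{ij}}}{y_{ij}!},\\[4pt]
\text{group-marginalized:}\quad
  & p(y_{ij}\mid\theta_i) = \int_0^\infty \frac{\lambda^{\,y_{ij}}\, e^{-\lambda}}{y_{ij}!}\; g(\lambda\mid\theta_i)\,\mathrm{d}\lambda,\\[4pt]
\text{DIC:}\quad
  & D(\vartheta) = -2\log p(y\mid\vartheta),\qquad
    p_D = \overline{D} - D(\bar{\vartheta}),\qquad
    \mathrm{DIC} = \overline{D} + p_D .
\end{align*}
```

Here $\overline{D}$ is the posterior mean of the deviance and $\bar{\vartheta}$ is the posterior mean of the parameters, so the choice between the first and second line determines which quantities count as parameters in $p_D$.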
This group-level marginalization is explicit in the case of the negative binomial, but requires numerical integration for the PLN model. Similarly, DIC performed well in judging whether zero inflation was required when it was calculated using the group-marginalized form of the zero-inflated likelihood. In the context of comparing multilevel hierarchical models, the top-level DIC was obtained using a likelihood that was further marginalized by additional integration over the group-level latent parameters, and the marginal densities of the models were calculated for the purpose of providing Bayes’ factors. The computational viability and interpretability of these different measures are considered.
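The difference between the explicit and the numerical marginalization can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the mean/shape parameterization of the negative binomial, the trapezoid rule over the latent log-rate, and the generic `dic` helper are all assumptions chosen for illustration, and the posterior draws would have to come from an MCMC sampler that is not shown.

```python
import math


def poisson_logpmf(y, lam):
    """Log Poisson pmf: the likelihood conditional on the latent rate."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)


def nb_logpmf(y, mu, size):
    """Negative binomial log pmf with mean mu and shape `size`.

    This is the closed-form result of integrating a gamma-distributed
    latent rate out of the Poisson likelihood.
    """
    return (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
            + size * math.log(size / (size + mu))
            + y * math.log(mu / (size + mu)))


def pln_logpmf(y, mu_log, sigma, n_grid=401, width=8.0):
    """Poisson-lognormal log pmf by numerical integration.

    The latent log-rate is mu_log + sigma * z with z ~ N(0, 1); z is
    integrated over [-width, width] with a composite trapezoid rule
    (an assumed, simple choice of quadrature).
    """
    h = 2.0 * width / (n_grid - 1)
    log_terms = []
    for k in range(n_grid):
        z = -width + k * h
        log_normal = -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)
        eta = mu_log + sigma * z
        log_pois = y * eta - math.exp(eta) - math.lgamma(y + 1)
        end_weight = 0.5 if k in (0, n_grid - 1) else 1.0
        log_terms.append(math.log(end_weight * h) + log_normal + log_pois)
    # log-sum-exp keeps the sum stable for large counts
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))


def dic(log_lik, draws, point_estimate):
    """Return (DIC, pD) for a log-likelihood function and posterior draws.

    D = -2 * log_lik; pD = mean(D over draws) - D(point_estimate);
    DIC = mean(D) + pD. `point_estimate` is typically the posterior mean.
    """
    d_bar = sum(-2.0 * log_lik(t) for t in draws) / len(draws)
    d_hat = -2.0 * log_lik(point_estimate)
    p_d = d_bar - d_hat
    return d_bar + p_d, p_d
```

To obtain a group-marginalized DIC under these assumptions, `log_lik` would sum `nb_logpmf` or `pln_logpmf` over all counts for a given draw of the group-level parameters, rather than summing `poisson_logpmf` over draws of the latent rates.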

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
62F15 Bayesian inference
