Revisiting the analysis pipeline for overdispersed Poisson and binomial data. (English) Zbl 07716687

Summary: Overdispersion is a common feature of categorical data, and several methods have been developed for detecting and handling it in generalized linear models. The first aim of this study is to clarify the relationships among the various score statistics for testing overdispersion and to compare their performance. In addition, we investigate a principled way to correct finite-sample bias in the score statistic caused by estimating regression parameters with restricted likelihood. The second aim is to reconsider current practice for handling overdispersed categorical data. Although the conventional models are based on substantially different mechanisms for generating overdispersion, model selection in practice has not been well studied. We perform an intensive numerical study to determine which method is more robust to various overdispersion mechanisms. In addition, we provide some graphical tools for identifying the better model. The last aim is to reconsider the key assumption used in deriving the score statistics. We study the meaning of testing for overdispersion when this assumption is violated, and we show analytically the conditions under which it is not appropriate to employ current statistical practice for analyzing overdispersed data.
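
As a rough illustration of what a score statistic for overdispersion looks like, the sketch below computes one widely used form for Poisson counts, T = sum((y_i - mu_i)^2 - y_i) / sqrt(2 * sum(mu_i^2)), which is approximately standard normal under the Poisson null and is tested one-sidedly. It is not taken from the paper: it assumes an intercept-only model (so every fitted mean equals the sample mean), it omits the finite-sample bias correction discussed in the summary, and the function names and example counts are hypothetical.

```python
import math


def overdispersion_score(y, mu):
    """Score-type statistic sum((y - mu)^2 - y) / sqrt(2 * sum(mu^2)).

    y  : observed counts
    mu : fitted Poisson means (same length as y)
    """
    num = sum((yi - mi) ** 2 - yi for yi, mi in zip(y, mu))
    den = math.sqrt(2.0 * sum(mi ** 2 for mi in mu))
    if den == 0.0:
        raise ValueError("all fitted means are zero; statistic undefined")
    return num / den


def intercept_only_test(y):
    """Assumption: intercept-only Poisson model, so mu_hat = mean(y)."""
    mu_hat = sum(y) / len(y)
    t = overdispersion_score(y, [mu_hat] * len(y))
    # One-sided p-value P(Z >= t) from the standard normal approximation.
    p = 0.5 * math.erfc(t / math.sqrt(2.0))
    return t, p


# Hypothetical counts, used only to exercise the functions.
counts = [0, 2, 4, 6]
t, p = intercept_only_test(counts)
print(f"T = {t:.4f}, one-sided p = {p:.4f}")
```

With covariates, the fitted means would come from a Poisson regression fit instead of the sample mean; the normal approximation can be poor in small samples, which is the setting the bias correction in the summary is meant to address.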

MSC:

62-XX Statistics

References:

[1] Agresti, A., An Introduction to Categorical Data Analysis (2007), John Wiley & Sons, Hoboken, NJ · Zbl 1266.62008
[2] Azzalini, A.; Bowman, A. W.; Härdle, W., On the use of nonparametric regression for model checking, Biometrika, 76, 1-11 (1989) · Zbl 0663.62096
[3] Breslow, N. E., Extra-Poisson variation in log-linear models, Appl. Stat., 33, 38-44 (1984)
[4] Breslow, N. E., Tests of hypotheses in overdispersed Poisson regression and other quasi-likelihood models, J. Am. Stat. Assoc., 85, 565-571 (1990)
[5] Breslow, N. E.; Clayton, D. G., Approximate inference in generalized linear mixed models, J. Am. Stat. Assoc., 88, 9-25 (1993) · Zbl 0775.62195
[6] Cameron, A. C.; Trivedi, P. K., Econometric models based on count data: Comparisons and applications of some estimators and tests, J. Appl. Econom., 1, 29-53 (1986)
[7] Cameron, A. C.; Trivedi, P. K., Regression Analysis of Count Data (2013), Cambridge University Press, New York, NY · Zbl 1301.62003
[8] Commenges, D.; Letenneur, L.; Jacqmin, H.; Moreau, T.; Dartigues, J. F., Test of homogeneity of binary data with explanatory variables, Biometrics, 50, 613-620 (1994) · Zbl 0825.62782
[9] Cox, D. R., Some remarks on overdispersion, Biometrika, 70, 269-274 (1983) · Zbl 0511.62007
[10] Dean, C. B.; Lawless, J. F., Tests for detecting overdispersion in Poisson regression models, J. Am. Stat. Assoc., 84, 467-472 (1989)
[11] Dean, C. B., Testing for overdispersion in Poisson and binomial regression models, J. Am. Stat. Assoc., 87, 451-457 (1992)
[12] Hall, D. B.; Praestgaard, J. T., Order-restricted score tests for homogeneity in generalised linear and nonlinear mixed models, Biometrika, 88, 739-751 (2001) · Zbl 1009.62057
[13] Harville, D. A., Maximum likelihood approaches to variance component estimation and related problems, J. Am. Stat. Assoc., 72, 320-338 (1977) · Zbl 0373.62040
[14] Hinde, J.; Demétrio, C. G. B., Overdispersion: Models and estimation, Comput. Stat. Data Anal., 27, 151-170 (1998) · Zbl 1042.62578
[15] Hinde, J. and Demétrio, C.G.B., Overdispersion: Models and Estimation, A short course for SINAPE, 1998.
[16] Lawless, J. F., Negative binomial and mixed Poisson regression, Can. J. Stat., 15, 209-225 (1987) · Zbl 0632.62060
[17] Le Cessie, S.; van Houwelingen, H. C., Testing the fit of a regression model via score tests in random effect models, Biometrics, 51, 600-614 (1995) · Zbl 0825.62608
[18] Lesnoff, M. and Lancelot, R., aod: Analysis of Overdispersed Data, R package version 1.3, 2012. Available at http://cran.r-project.org/package=aod.
[19] Liang, K., A locally most powerful test for homogeneity with many strata, Biometrika, 74, 259-264 (1987) · Zbl 0621.62025
[20] Liang, K.; McCullagh, P., Case studies in binary dispersion, Biometrics, 49, 623-630 (1993)
[21] Lin, X., Variance component testing in generalised linear models with random effects, Biometrika, 84, 309-326 (1997) · Zbl 0881.62074
[22] McCullagh, P., The conditional distribution of goodness-of-fit statistics for discrete data, J. Am. Stat. Assoc., 81, 104-107 (1986)
[23] O'Hara Hines, R. J., A comparison of tests for overdispersion in generalized linear models, J. Stat. Comput. Simul., 58, 323-342 (1997) · Zbl 1065.62535
[24] Oztig, L. I.; Askin, O. E., Human mobility and coronavirus disease 2019 (COVID-19): A negative binomial regression analysis, Public Health, 185, 364-367 (2020)
[25] Patterson, H. D.; Thompson, R., Recovery of inter-block information when block sizes are unequal, Biometrika, 58, 545-554 (1971) · Zbl 0228.62046
[26] Silvapulle, M. J.; Silvapulle, P., A score test against one-sided alternatives, J. Am. Stat. Assoc., 90, 342-349 (1995) · Zbl 0818.62022
[27] Sinha, S. K., Bootstrap tests for variance components in generalized linear mixed models, Can. J. Stat., 37, 219-234 (2009) · Zbl 1176.62013
[28] Skrondal, A.; Rabe-Hesketh, S., Redundant overdispersion parameters in multilevel models for categorical responses, J. Educ. Behav. Stat., 32, 419-430 (2007)
[29] Swihart, B. and Lindsey, J.K., rmutil: Utilities for Nonlinear Regression and Repeated Measurements Models, R package version 1.1.0, 2017. Available at https://CRAN.R-project.org/package=rmutil.
[30] Venables, W. N.; Ripley, B. D., Modern Applied Statistics with S (2002), Springer, New York · Zbl 1006.62003
[31] Verbeke, G.; Molenberghs, G., The use of score tests for inference on variance components, Biometrics, 59, 254-262 (2003) · Zbl 1210.62013
[32] Ver Hoef, J. M.; Boveng, P. L., Quasi-Poisson vs. negative binomial regression: How should we model overdispersed count data?, Ecology, 88, 2766-2772 (2007)
[33] Wang, Z.; Louis, T., Matching conditional and marginal shapes in binary random intercept models using a bridge distribution function, Biometrika, 90, 765-775 (2003) · Zbl 1436.62294
[34] Wood, S., Generalized Additive Models: An Introduction with R (2017), Chapman & Hall/CRC, New York · Zbl 1368.62004
[35] Yang, Z.; Hardin, J. W.; Addy, C. L.; Vuong, Q. H., Testing approaches for overdispersion in Poisson regression versus the generalized Poisson model, Biom. J., 49, 565-584 (2007) · Zbl 1442.62709
[36] Young-Xu, Y.; Chan, K. A., Pooling overdispersed binomial data to estimate event rate, BMC Med. Res. Methodol., 8, 58 (2008)
[37] Zhang, D. and Lin, X., Variance component testing in generalized linear mixed models for longitudinal/clustered data and other related topics, in Random Effect and Latent Variable Model Selection, D.B. Dunson, Ed., Lecture Notes in Statistics, Vol. 192, Springer, New York, NY, 2008. · Zbl 1145.62003
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.