Finite mixtures of mean-parameterized Conway-Maxwell-Poisson models. (English) Zbl 07887490

Summary: For modeling count data, the Conway-Maxwell-Poisson (CMP) distribution is a popular generalization of the Poisson distribution due to its ability to characterize data over- or under-dispersion. While the classic parameterization of the CMP has been well studied, its main drawback is that it does not directly model the mean of the counts. This is mitigated by using a mean-parameterized version of the CMP distribution. In this work, we are concerned with the setting where count data may comprise subpopulations, each possibly having a different degree of data dispersion. Thus, we propose a finite mixture of mean-parameterized CMP distributions. An EM algorithm is constructed to perform maximum likelihood estimation of the model, while bootstrapping is employed to obtain estimated standard errors. A simulation study is used to demonstrate the flexibility of the proposed mixture model relative to mixtures of Poissons and mixtures of negative binomials. An analysis of dog mortality data is presented.
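
The summary gives no formulas, so the following is only a minimal sketch of the ingredients it names, not the authors' implementation. It assumes the usual mean parameterization, in which the pmf is proportional to \(\lambda^y/(y!)^\nu\) and \(\lambda\) is chosen so that the distribution has mean \(\mu\) (cf. reference [13] below), and it truncates the infinite sums at a hypothetical cutoff `y_max`. All function names are illustrative. The last function computes the posterior component probabilities used in the E-step of an EM algorithm for a finite mixture.

```python
import math


def _log_weights(log_lam, nu, y_max):
    # log of the unnormalized CMP weights lam^y / (y!)^nu for y = 0..y_max
    return [y * log_lam - nu * math.lgamma(y + 1) for y in range(y_max + 1)]


def cmp_mean(log_lam, nu, y_max=200):
    # Mean of the CMP distribution truncated at y_max (assumed cutoff).
    logw = _log_weights(log_lam, nu, y_max)
    m = max(logw)  # subtract the maximum to avoid overflow
    w = [math.exp(v - m) for v in logw]
    return sum(y * wy for y, wy in enumerate(w)) / sum(w)


def solve_log_lambda(mu, nu, y_max=200, tol=1e-10):
    # Bisection on log(lambda): the mean is increasing in lambda,
    # so there is a single root for 0 < mu < y_max.
    lo, hi = -50.0, 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cmp_mean(mid, nu, y_max) < mu:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def cmp_mu_pmf(y, mu, nu, y_max=200):
    # P(Y = y) under the mean-parameterized CMP with mean mu, dispersion nu.
    log_lam = solve_log_lambda(mu, nu, y_max)
    logw = _log_weights(log_lam, nu, y_max)
    m = max(logw)
    log_z = m + math.log(sum(math.exp(v - m) for v in logw))
    return math.exp(y * log_lam - nu * math.lgamma(y + 1) - log_z)


def responsibilities(y, weights, mus, nus):
    # E-step: posterior probability that count y came from each component.
    dens = [p * cmp_mu_pmf(y, mu, nu) for p, mu, nu in zip(weights, mus, nus)]
    total = sum(dens)
    return [d / total for d in dens]
```

With \(\nu = 1\) the sketch reduces to the Poisson pmf with mean \(\mu\), which gives a simple sanity check. Under this parameterization, \(\nu < 1\) corresponds to over-dispersion and \(\nu > 1\) to under-dispersion. A full EM fit would alternate `responsibilities` with a weighted maximization over each component's \((\mu, \nu)\), which is omitted here.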

MSC:

62-XX Statistics

Software:

MASS (R); flexmix

References:

[1] Abdel-Aty, MA; Essam Radwan, A., Modeling traffic accident occurrence and involvement, Accid Anal Prev, 32, 5, 633-642, 2000
[2] Akaike, H., Information theory and an extension of the maximum likelihood principle, in: Petrov, BN; Csaki, F. (eds.), Second international symposium on information theory, 267-281, 1973, Budapest: Akademiai Kiado · Zbl 0283.62006
[3] Arora, M.; Chaganty, NR; Sellers, KF, A flexible regression model for zero- and \(k\)-inflated count data, J Stat Comput Simul, 91, 9, 1815-1845, 2021 · Zbl 07493370
[4] Celeux, G.; Soromenho, G., An entropy criterion for assessing the number of clusters in a mixture model, J Classif, 13, 2, 195-212, 1996 · Zbl 0861.62051
[5] Conway, RW; Maxwell, WL, A queuing model with state dependent service rates, J Ind Eng, 12, 132-136, 1962
[6] Cunningham, RB; Lindenmayer, DB, Modeling count data of rare species: some statistical issues, Ecology, 86, 5, 1135-1142, 2005
[7] Dempster, AP; Laird, NM; Rubin, DB, Maximum likelihood from incomplete data via the EM algorithm, J R Stat Soc Ser B Stat Methodol, 39, 1, 1-38, 1977 · Zbl 0364.62022
[8] Dénes, FV; Fábio Silveira, L.; Beissinger, SR, Estimating abundance of unmarked animal populations: accounting for imperfect detection and other sources of zero inflation, Methods Ecol Evol, 6, 5, 543-556, 2015
[9] Feng, W.; Liu, Y.; Wu, J.; Nephew, KP; Huang, THM; Li, L., A Poisson mixture model to identify changes in RNA polymerase II binding quantity using high-throughput sequencing technology, BMC Genomics, 9, Suppl 2, S23, 2008
[10] Fraley, C.; Raftery, AE, How many clusters? Which clustering method to use? Answers via model-based cluster analysis, Comput J, 41, 8, 578-588, 1998 · Zbl 0920.68038
[11] Guikema, SD; Coffelt, JP, A flexible count data regression model for risk analysis, Risk Anal, 28, 1, 213-223, 2008
[12] Hilbe, JM, Negative binomial regression, 2011, Cambridge: Cambridge University Press · Zbl 1269.62063
[13] Huang, A., Mean-parametrized Conway-Maxwell-Poisson regression models for dispersed counts, Stat Model, 17, 6, 359-380, 2017 · Zbl 07289488
[14] Huang, A.; Rathouz, PJ, Orthogonality of the mean and error distribution in generalized linear models, Commun Stat Simul Comput, 46, 7, 3290-3296, 2016 · Zbl 1368.62215
[15] Huang, C.; Liu, X.; Yao, T.; Wang, X., An efficient EM algorithm for the mixture of negative binomial models, J Phys Conf Ser, 1324, 1, 012093, 2019
[16] Ismail, N.; Ali, KMM; Chiew, AC, A model for insurance claim count with single and finite mixture distribution, Sains Malays, 33, 173-194, 2004
[17] Konşuk Ünlü, H.; Young, DS; Yiğiter, A.; Özcebe, LH, A mixture model with Poisson and zero-truncated Poisson components to analyze road traffic accidents in Turkey, J Appl Stat, 49, 4, 1003-1017, 2022 · Zbl 07549101
[18] Leisch, F., FlexMix: a general framework for finite mixture models and latent class regression in R, J Stat Softw, 11, 8, 1-18, 2004
[19] Leroux, BG, Consistent estimation of a mixing distribution, Ann Stat, 20, 3, 1350-1360, 1992 · Zbl 0763.62015
[20] Lewis, TW; Wiles, BM; Llewellyn-Zaidi, AM; Evans, KM; O’Neill, DG, Longevity and mortality in Kennel Club registered dog breeds in the UK in 2014, Canine Genet Epidemiol, 5, 1, 10, 2018
[21] Li, X.; Dey, DK, Estimation of COVID-19 mortality in the United States using spatio-temporal Conway Maxwell Poisson model, Spat Stat, 49, 100542, 2022
[22] Li, J.; Zha, H., Two-way Poisson mixture models for simultaneous document classification and word clustering, Comput Stat Data Anal, 50, 1, 163-180, 2006 · Zbl 1429.62253
[23] Li, Q.; Noel-MacDonnell, JR; Koestler, DC; Goode, EL; Fridley, BL, Subject level clustering using a negative binomial model for small transcriptomic studies, BMC Bioinform, 19, 1, 474, 2018
[24] Lord, D.; Guikema, SD; Geedipally, SR, Application of the Conway-Maxwell-Poisson generalized linear model for analyzing motor vehicle crashes, Accid Anal Prev, 40, 3, 1123-1134, 2008
[25] McLachlan, GJ; Peel, D., Finite mixture models. Wiley series in probability and statistics, 2000, New York: Wiley · Zbl 0963.62061
[26] Muenz, DG; Braun, TM; Taylor, JMG, Modeling adverse event counts in phase I clinical trials of a cytotoxic agent, Clin Trials, 15, 4, 386-397, 2018
[27] Park, BJ; Lord, D., Application of finite mixture models for vehicle crash data analysis, Accid Anal Prev, 41, 4, 683-691, 2009
[28] Piancastelli, LSC; Friel, N.; Barretto-Souza, W.; Ombao, H., Multivariate Conway-Maxwell-Poisson distribution: Sarmanov method and doubly intractable Bayesian inference, J Comput Graph Stat, 2022 · Zbl 07747452 · doi:10.1080/10618600.2022.2116443
[29] Redner, RA; Walker, HF, Mixture densities, maximum likelihood and the EM algorithm, SIAM Rev, 26, 2, 195-239, 1984 · Zbl 0536.62021
[30] Ribeiro, EE; Zeviani, WM; Bonat, WH; Demetrio, CG; Hinde, J., Reparametrization of COM-Poisson regression models with applications in the analysis of experimental data, Stat Model, 20, 5, 443-466, 2020 · Zbl 1482.62043
[31] Schwarz, G., Estimating the dimension of a model, Ann Stat, 6, 2, 461-464, 1978 · Zbl 0379.62005
[32] Sellers, KF, The Conway-Maxwell-Poisson distribution, 2023, Cambridge: Cambridge University Press · Zbl 1514.62003
[33] Sellers, KF; Raim, A., A flexible zero-inflated model to address data dispersion, Comput Stat Data Anal, 99, 68-80, 2016 · Zbl 1468.62176
[34] Sellers, KF; Shmueli, G., A flexible regression model for count data, Ann Appl Stat, 4, 2, 943-961, 2010 · Zbl 1194.62091
[35] Sellers, KF; Shmueli, G., Data dispersion: now you see it... Now you don’t, Commun Stat Theory Methods, 42, 17, 3134-3147, 2013 · Zbl 1277.62170
[36] Shmueli, G.; Minka, TP; Kadane, JB; Borle, S.; Boatwright, P., A useful distribution for fitting discrete data: revival of the Conway-Maxwell-Poisson distribution, J R Stat Soc Ser C Appl Stat, 54, 1, 127-142, 2005 · Zbl 1490.62058
[37] Smyth, GK; Jørgensen, B., Fitting Tweedie’s compound Poisson model to insurance claims data: dispersion modelling, ASTIN Bull, 32, 1, 143-157, 2002 · Zbl 1094.91514
[38] Sur, P.; Shmueli, G.; Bose, S.; Dubey, P., Modeling bimodal discrete data using Conway-Maxwell-Poisson mixture models, J Bus Econ Stat, 33, 3, 352-365, 2015
[39] Venables, WN; Ripley, BD, Modern applied statistics with S, 2002, New York: Springer · Zbl 1006.62003
[40] Wu, CFJ, On the convergence properties of the EM algorithm, Ann Stat, 11, 1, 95-103, 1983 · Zbl 0517.62035
[41] Yip, KCH; Yau, KKW, On modeling claim frequency data in general insurance with extra zeros, Insur Math Econ, 36, 2, 153-163, 2005 · Zbl 1070.62098
[42] Zhang, P.; Wu, HY; Chiang, CW; Wang, L.; Binkheder, S.; Wang, X.; Zeng, D.; Quinney, SK; Li, L., Translational biomedical informatics and pharmacometrics approaches in the drug interactions research, CPT Pharmacomet Syst Pharmacol, 7, 2, 90-102, 2018
[43] Zou, Y.; Zhang, Y.; Lord, D., Application of finite mixture of negative binomial regression models with varying weight parameters for vehicle crash data analysis, Accid Anal Prev, 50, 1042-1051, 2013