
A weighted partial likelihood approach for zero-truncated models. (English) Zbl 1429.62548

Summary: Zero-truncated data arise in various disciplines where counts are observed but the zero-count category cannot be observed during sampling. Maximum likelihood estimation can be used to model these data; however, because the likelihood has a nonstandard form, it cannot be easily implemented using well-known software packages, and additional programming is often required. Motivated by the Rao-Blackwell theorem, we develop a weighted partial likelihood approach to estimate model parameters for zero-truncated binomial and Poisson data. The resulting estimating function is equivalent to a weighted score function for standard count data models, which allows readily available software to be applied. We evaluate the efficiency of this new approach and show that it performs almost as well as maximum likelihood estimation. The weighted partial likelihood approach is then extended to regression modelling and variable selection. We examine the performance of the proposed methods through simulation and present two case studies using real data.
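
To illustrate the "additional programming" that the summary attributes to maximum likelihood, the following is a minimal sketch of the standard maximum likelihood estimator for an intercept-only zero-truncated Poisson model. It is not the weighted partial likelihood estimator proposed in the paper, whose weights are not given in this summary. The sketch assumes independent counts y_i >= 1 with P(Y = y) = e^{-λ} λ^y / (y! (1 - e^{-λ})), so the score equation reduces to ȳ = λ / (1 - e^{-λ}), which has no closed-form solution. The function name `ztp_mle` and its arguments are hypothetical, chosen only for this example.

```python
import math


def ztp_mle(counts, tol=1e-10, max_iter=200):
    """Sketch: MLE of lambda for zero-truncated Poisson counts.

    Solves the score equation  mean(counts) = lam / (1 - exp(-lam))
    by bisection. Illustrative only; `ztp_mle` is a hypothetical name.
    """
    if not counts or any(y < 1 for y in counts):
        raise ValueError("counts must be a non-empty sequence of integers >= 1")
    ybar = sum(counts) / len(counts)
    if ybar <= 1.0:
        # All observations equal 1: the likelihood is maximised as lam -> 0,
        # so no interior solution exists.
        raise ValueError("sample mean must exceed 1 for an interior MLE")

    def g(lam):
        # Truncated mean minus sample mean; -expm1(-lam) equals 1 - exp(-lam)
        # and is numerically stable for small lam.
        return lam / -math.expm1(-lam) - ybar

    # The truncated mean always exceeds lam, so the root lies in (0, ybar).
    lo, hi = 1e-12, ybar
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    lam_hat = ztp_mle([1, 2, 3, 4])
    print(lam_hat)
```

Bisection is used here because the truncated mean is strictly increasing in λ, so the root is unique and bracketed by (0, ȳ); a Newton iteration would also work but needs a safeguarded starting value.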

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis

Software:

R; Rcapture; sandwich
