
Comparing two samples by penalized logistic regression. (English) Zbl 1320.62070

Summary: Inference based on the penalized density ratio model is proposed and studied. The model under consideration is specified by assuming that the log-ratio of two unknown densities has some parametric form. The model has been extended to cover multiple-sample problems, and its theoretical properties have been investigated using large-sample theory. A main application of the density ratio model is testing whether two or more distributions are equal. We extend these results by arguing that the penalized maximum empirical likelihood estimator has smaller mean squared error than the ordinary maximum likelihood estimator, especially for small samples. In addition, penalization resolves problems concerning the existence of the estimators, and a modified Wald-type test statistic can be employed for testing equality of the two distributions. A limited simulation study further supports the theory.

MSC:

62G05 Nonparametric estimation
62G20 Asymptotic properties of nonparametric inference
62J07 Ridge regression; shrinkage estimators (Lasso)
