
Additive partially linear models for ultra-high-dimensional regression. (English) Zbl 07851110

Summary: We consider a semiparametric additive partially linear regression model (APLM) for analysing ultra-high-dimensional data in which both the number of linear components and the number of non-linear components can be much larger than the sample size. We propose a two-step approach for estimation, selection, and simultaneous inference on the components of the APLM. In the first step, the non-linear additive components are approximated by polynomial spline basis functions, and a doubly penalized procedure based on the adaptive lasso is proposed to select the nonzero linear and non-linear components. In the second step, local linear smoothing is applied to the data with the selected variables to obtain the asymptotic distribution of the estimators of the nonparametric functions of interest. The proposed method selects the correct model with probability approaching one under regularity conditions. The estimators of both the linear part and the non-linear part are consistent and asymptotically normal, which enables us to construct confidence intervals and make inferences about the regression coefficients and the component functions. The performance of the method is evaluated by simulation studies. The proposed method is also applied to a dataset on the shoot apical meristem of maize genotypes.
{© 2019 John Wiley & Sons, Ltd.}
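For orientation, the APLM studied here and the first-step doubly penalized criterion can be sketched as follows; the notation and the exact form of the penalties and weights are assumptions for illustration, not taken verbatim from the paper. The model is
\[ Y_i = \mathbf{X}_i^{\top}\boldsymbol{\beta} + \sum_{j=1}^{q} g_j(Z_{ij}) + \varepsilon_i, \qquad i=1,\dots,n, \]
where \(\boldsymbol{\beta}\in\mathbb{R}^{p}\) collects the linear coefficients and the \(g_j\) are unknown smooth component functions, with both \(p\) and \(q\) allowed to exceed \(n\). With each \(g_j\) approximated by a spline expansion \(g_j(z)\approx \mathbf{B}_j(z)^{\top}\boldsymbol{\gamma}_j\), a doubly penalized least-squares criterion of adaptive-lasso type would take the form
\[ \sum_{i=1}^{n}\Big(Y_i-\mathbf{X}_i^{\top}\boldsymbol{\beta}-\sum_{j=1}^{q}\mathbf{B}_j(Z_{ij})^{\top}\boldsymbol{\gamma}_j\Big)^{2} + \lambda_1\sum_{k=1}^{p} w_k\,|\beta_k| + \lambda_2\sum_{j=1}^{q} \tilde w_j\,\|\boldsymbol{\gamma}_j\|_2, \]
with data-driven weights \(w_k\) and \(\tilde w_j\) built from initial estimates; components whose coefficients are shrunk to zero are dropped before the second-step local linear smoothing.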

MSC:

62-XX Statistics

Software:

grpreg
Full Text: DOI

References:

[1] Breheny, P. (2016). grpreg: Regularization paths for regression models with grouped covariates. R package version 3.0‐2. Available at “https://cran.r-project.org/web/packages/grpreg/index.html”.
[2] Chen, J., & Chen, Z. (2008). Extended Bayesian information criteria for model selection with large model spaces. Biometrika, 95(3), 759-771. · Zbl 1437.62415
[3] Ding, H., Claeskens, G., & Jansen, M. (2011). Variable selection in partially linear wavelet models. Statistical Modelling, 11(5), 409-427. · Zbl 1420.62306
[4] Fan, J., Feng, Y., & Song, R. (2011). Nonparametric independence screening in sparse ultra‐high‐dimensional additive models. Journal of the American Statistical Association, 106(494), 544-557. · Zbl 1232.62064
[5] Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456), 1348-1360. · Zbl 1073.62547
[6] Fan, J., & Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(5), 849-911. · Zbl 1411.62187
[7] Hastie, T. J., & Tibshirani, R. J. (1990). Generalized additive models, Vol. 43. New York: Chapman and Hall. · Zbl 0747.62061
[8] Huang, J., Breheny, P., & Ma, S. (2012). A selective review of group selection in high‐dimensional models. Statistical Science, 27(4), 481-499. · Zbl 1331.62347
[9] Huang, J., Horowitz, J. L., & Wei, F. (2010). Variable selection in nonparametric additive models. The Annals of Statistics, 38(4), 2282-2313. · Zbl 1202.62051
[10] Kai, B., Li, R., & Zou, H. (2011). New efficient estimation and variable selection methods for semiparametric varying‐coefficient partially linear models. The Annals of Statistics, 39(1), 305-332. · Zbl 1209.62074
[11] Lee, E. R., Noh, H., & Park, B. U. (2014). Model selection via Bayesian information criterion for quantile regression models. Journal of the American Statistical Association, 109(505), 216-229. · Zbl 1367.62122
[12] Leiboff, S., Li, X., Hu, H.‐C., Todt, N., Yang, J., Li, X., ..., & Scanlon, M. J. (2015). Genetic control of morphometric diversity in the maize shoot apical meristem. Nature Communications, 6, 8974.
[13] Lian, H., Liang, H., & Ruppert, D. (2015). Separation of covariates into nonparametric and parametric parts in high‐dimensional partially linear additive models. Statistica Sinica, 25, 591-607. · Zbl 1534.62053
[14] Liang, H., & Li, R. (2009). Variable selection for partially linear models with measurement errors. Journal of the American Statistical Association, 104(485), 234-248. · Zbl 1388.62208
[15] Lin, H.‐y., Liu, Q., Li, X., Yang, J., Liu, S., Huang, Y., ..., & Schnable, P. S. (2017). Substantial contribution of genetic variation in the expression of transcription factors to phenotypic variation revealed by eRD‐GWAS. Genome Biology, 18(1), 192.
[16] Linton, O. B. (1997). Efficient estimation of additive nonparametric regression models. Biometrika, 84(2), 469-473. · Zbl 0882.62038
[17] Linton, O. B., & Nielsen, J. P. (1995). A kernel method of estimating structured nonparametric regression based on marginal integration. Biometrika, 82(1), 93-100. · Zbl 0823.62036
[18] Liu, X., Wang, L., & Liang, H. (2011). Estimation and variable selection for semiparametric additive partial linear models. Statistica Sinica, 21(3), 1225-1248. · Zbl 1223.62020
[19] Mammen, E., Linton, O., & Nielsen, J. (1999). The existence and asymptotic properties of a backfitting projection algorithm under weak conditions. The Annals of Statistics, 27(5), 1443-1490. · Zbl 0986.62028
[20] Marx, B. D., & Eilers, P. H. (1998). Direct generalized additive modeling with penalized likelihood. Computational Statistics and Data Analysis, 28(2), 193-209. · Zbl 1042.62580
[21] Opsomer, J. D., & Ruppert, D. (1997). Fitting a bivariate additive model by local polynomial regression. The Annals of Statistics, 25(1), 186-211. · Zbl 0869.62026
[22] Sherwood, B., & Wang, L. (2016). Partially linear additive quantile regression in ultra‐high dimension. The Annals of Statistics, 44(1), 288-317. · Zbl 1331.62264
[23] Sperlich, S., Tjøstheim, D., & Yang, L. (2002). Nonparametric estimation and testing of interaction in additive models. Econometric Theory, 18(2), 197-251. · Zbl 1109.62310
[24] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 58(1), 267-288. · Zbl 0850.62538
[25] Wang, L., Liu, X., Liang, H., & Carroll, R. J. (2011). Estimation and variable selection for generalized additive partial linear models. The Annals of Statistics, 39(4), 1827-1851. · Zbl 1227.62053
[26] Wang, L., Xue, L., Qu, A., & Liang, H. (2014). Estimation and model selection in generalized additive partial linear models for correlated data with diverging number of covariates. The Annals of Statistics, 42(2), 592-624. · Zbl 1309.62077
[27] Wang, L., & Yang, L. (2007). Spline‐backfitted kernel smoothing of nonlinear additive autoregression model. The Annals of Statistics, 35(6), 2474-2503. · Zbl 1129.62038
[28] Xie, H., & Huang, J. (2009). SCAD‐penalized regression in high‐dimensional partially linear models. The Annals of Statistics, 37(2), 673-696. · Zbl 1162.62037
[29] Xue, L., & Yang, L. (2006). Additive coefficient modeling via polynomial spline. Statistica Sinica, 16(4), 1423-1446. · Zbl 1109.62030
[30] Yuan, M., & Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(1), 49-67. · Zbl 1141.62030
[31] Zhang, H. H., Cheng, G., & Liu, Y. (2011). Linear or nonlinear? Automatic structure discovery for partially linear models. Journal of the American Statistical Association, 106, 1099-1112. · Zbl 1229.62051
[32] Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101(476), 1418-1429. · Zbl 1171.62326