Abstract
Existing feature screening methods focus mainly on the mean function of regression models. The variance function, however, plays an important role in statistical theory and applications. We therefore investigate feature screening for both the mean and variance functions within a multiple-index framework in high-dimensional regression models. Some information about the predictors is often known in advance from previous investigations and experience; for example, a certain set of predictors may be known to be related to the response. Based on this conditional information, together with empirical likelihood, we propose conditional feature screening procedures. Our methods consistently estimate the sets of active predictors in the mean and variance functions. Notably, the proposed screening procedures avoid estimating the unknown link functions in the mean and variance functions and work well when the predictors are highly correlated, without requiring an iterative algorithm. Our proposal is therefore computationally simple. Furthermore, as a conditional method, it is robust to the choice of the conditional set. The theoretical results show that the proposed procedures have sure screening properties. The attractive finite-sample performance of our method is illustrated in simulations and a real data application.
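The abstract does not state the estimating equations, so the following Python sketch is only an illustration of the general idea of conditional screening with a marginal empirical likelihood ratio, not the procedure of the paper. It makes several assumptions that are not in the source: the conditional information is used by linearly residualizing the response and each candidate predictor on the conditional set, the moment function is the product of the two residuals, and predictors are ranked by the resulting empirical likelihood ratio for a zero mean. The function names `el_ratio` and `conditional_el_screen` and the parameters `cond_idx` and `d` are hypothetical.

```python
import numpy as np


def el_ratio(g, n_iter=200):
    """Empirical likelihood ratio statistic for H0: E[g] = 0 (scalar g).

    Solves sum_i g_i / (1 + lam * g_i) = 0 for lam by bisection and
    returns 2 * sum_i log(1 + lam * g_i).
    """
    g = np.asarray(g, dtype=float)
    # If zero lies outside the convex hull of the g_i, the ratio is infinite.
    if g.min() >= 0 or g.max() <= 0:
        return np.inf
    # lam must satisfy 1 + lam * g_i > 0 for every i.
    lo, hi = -1.0 / g.max(), -1.0 / g.min()
    with np.errstate(divide="ignore", invalid="ignore"):
        for _ in range(n_iter):
            mid = 0.5 * (lo + hi)
            # The score is strictly decreasing in lam on (lo, hi).
            score = np.sum(g / (1.0 + mid * g))
            if score > 0:
                lo = mid
            else:
                hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * np.sum(np.log1p(lam * g))


def conditional_el_screen(X, y, cond_idx, d):
    """Rank predictors outside cond_idx by a conditional marginal EL ratio.

    Assumption (not from the paper): conditioning is done by least-squares
    residualization on the conditional set plus an intercept.
    """
    n, p = X.shape
    cond = set(cond_idx)
    Xc = np.column_stack([np.ones(n), X[:, sorted(cond)]])
    # Residualize the response and all predictors on the conditional set.
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    ry = y - Xc @ beta
    B, *_ = np.linalg.lstsq(Xc, X, rcond=None)
    RX = X - Xc @ B
    stats = np.full(p, -np.inf)  # conditional predictors are never selected
    for j in range(p):
        if j not in cond:
            stats[j] = el_ratio(RX[:, j] * ry)
    ranked = np.argsort(-stats)
    return ranked[:d], stats


# Toy usage: predictor 0 is known in advance, predictor 1 is also active.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
y = 2.0 * X[:, 0] + 3.0 * X[:, 1] + rng.standard_normal(200)
selected, stats = conditional_el_screen(X, y, cond_idx=[0], d=5)
print(selected)
```

Each predictor is handled separately with a one-dimensional root search, so no link function is estimated. A variance-function analogue would need a different moment function, which the abstract does not specify.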
Acknowledgements
We thank the editor, the associate editor and the reviewers for their constructive comments, which have led to a substantial improvement over the earlier version of this paper.
Additional information
Qinqin Hu’s research was partially supported by National Natural Science Foundation of China (Grant Nos. 11601283, 11571204 and 11501314) and China Postdoctoral Science Foundation (Grant No. 2015M582067). Lu Lin’s research was partially supported by National Natural Science Foundation of China (Grant Nos. 11571204 and 11231005).
Cite this article
Hu, Q., Lin, L. Conditional feature screening for mean and variance functions in models with multiple-index structure. Metrika 81, 357–393 (2018). https://doi.org/10.1007/s00184-018-0646-3
DOI: https://doi.org/10.1007/s00184-018-0646-3