
A general framework of nonparametric feature selection in high-dimensional data. (English) Zbl 1522.62281

Summary: Nonparametric feature selection for high-dimensional data is an important and challenging problem in statistics and machine learning. Most existing feature selection methods focus on parametric or additive models, which may suffer from model misspecification. In this paper, we propose a new framework for nonparametric feature selection in both regression and classification problems. Under this framework, we learn prediction functions through empirical risk minimization over a reproducing kernel Hilbert space. The space is generated by a novel tensor product kernel that depends on a set of parameters determining the importance of the features. Computationally, we minimize the empirical risk with a penalty to estimate the prediction and kernel parameters simultaneously; the solution is obtained by iteratively solving convex optimization problems. We study the theoretical properties of the kernel feature space and prove the oracle selection property and Fisher consistency of the proposed method. Finally, we demonstrate the superior performance of our approach over existing methods via extensive simulation studies and applications to two real studies.
{© 2022 The International Biometric Society.}
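The summary describes the computation only at a high level. The following is a minimal Python sketch of one plausible instantiation, not the authors' algorithm: it assumes a Gaussian-type tensor product kernel K_theta(x, z) = prod_j exp(-theta_j (x_j - z_j)^2) with nonnegative importance weights theta_j, squared-error loss, a kernel ridge step for the prediction coefficients, and a projected gradient step with an L1-type penalty on theta. All function names, the penalty choice, and the step size are illustrative assumptions.

    # Minimal sketch of the alternating scheme described in the summary; the
    # kernel form, loss, penalty, and update rules below are assumptions, not
    # the authors' implementation.
    import numpy as np

    def tensor_kernel(X, Z, theta):
        """K[i, k] = prod_j exp(-theta_j * (X[i, j] - Z[k, j])**2)."""
        d2 = (X[:, None, :] - Z[None, :, :]) ** 2   # (n, m, p)
        return np.exp(-(d2 * theta).sum(axis=2))    # (n, m)

    def fit(X, y, lam_alpha=1e-2, lam_theta=1e-1, step=1e-3, iters=50):
        n, p = X.shape
        theta = np.full(p, 1.0 / p)                 # initial importance weights
        d2 = (X[:, None, :] - X[None, :, :]) ** 2   # (n, n, p), reused below
        for _ in range(iters):
            # Step 1: theta fixed -> kernel ridge regression for alpha (convex).
            K = tensor_kernel(X, X, theta)
            alpha = np.linalg.solve(K + lam_alpha * n * np.eye(n), y)
            # Step 2: alpha fixed -> one projected-gradient step on theta with
            # an L1 penalty; since theta >= 0, the penalty gradient is lam_theta.
            resid = K @ alpha - y
            grad_K = -d2 * K[:, :, None]            # dK[i,k]/dtheta_j
            grad = (2.0 / n) * np.einsum('i,ikj,k->j', resid, grad_K, alpha) \
                   + lam_theta
            theta = np.maximum(theta - step * grad, 0.0)  # zeros drop features
        return alpha, theta

    # Toy usage: only the first two features matter; their weights should stay
    # positive while the noise features' weights shrink toward exact zeros.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(120, 10))
    y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=120)
    alpha, theta = fit(X, y)
    print(np.round(theta, 3))

The alternation mirrors the summary's "iteratively solving convex optimization problems": the alpha step is convex for fixed theta, and the theta step handles its subproblem by projected gradient with nonnegativity playing the role of the selection constraint.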

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
