
Monotone splines Lasso. (English) Zbl 1506.62021

Summary: The important problems of variable selection and estimation in nonparametric additive regression models for high-dimensional data are addressed. Several methods have been proposed to model nonlinear relationships when the number of covariates exceeds the number of observations, using spline basis functions and group penalties. Nonlinear monotone effects on the response play a central role in many settings, in particular in medicine and biology. The monotone splines lasso (MS-lasso) is constructed to select variables and estimate effects using monotone splines (\(I\)-splines). Each additive component in the model is represented by its \(I\)-spline basis function expansion, so that component selection reduces to selecting groups of coefficients in this expansion. A recent procedure, the cooperative lasso, is used to select sign-coherent groups, i.e. groups whose coefficients are either all non-negative or all non-positive. This leads to the selection of important covariates that have a nonlinear monotone increasing or decreasing effect on the response. An adaptive version of the MS-lasso reduces both the bias and the number of false positive selections considerably. The MS-lasso and the adaptive MS-lasso are compared with other existing methods for variable selection in high dimensions by simulation, and the methods are applied to two relevant genomic data sets. The results indicate that the (adaptive) MS-lasso has excellent properties compared to the other methods, both in terms of estimation and selection, and can be recommended for high-dimensional monotone regression.
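
To make the construction concrete, the following minimal Python sketch (not the authors' implementation) builds an \(I\)-spline basis by numerically integrating M-splines obtained from SciPy B-spline basis elements, expands each covariate in that basis, and fits an L1-penalized regression with non-negative coefficients as a crude stand-in for the cooperative-lasso penalty. All function names, knot choices and tuning values below are illustrative assumptions; the non-negativity constraint captures only monotone increasing effects, whereas the sign-coherent group penalty of the MS-lasso also allows all-non-positive groups (monotone decreasing effects).

import numpy as np
from scipy.integrate import cumulative_trapezoid
from scipy.interpolate import BSpline
from sklearn.linear_model import Lasso


def ispline_basis(x, n_basis=6, degree=3):
    """Evaluate an I-spline basis at the points x.

    I-splines are integrals of M-splines (rescaled B-splines); each basis
    function is monotone non-decreasing from 0 to 1 on [min(x), max(x)].
    Knot placement (equally spaced interior knots) is an illustrative choice.
    """
    lo, hi = float(x.min()), float(x.max())
    n_interior = n_basis - degree - 1
    interior = np.linspace(lo, hi, n_interior + 2)[1:-1]
    knots = np.r_[[lo] * (degree + 1), interior, [hi] * (degree + 1)]
    grid = np.linspace(lo, hi, 2001)
    out = np.empty((len(x), n_basis))
    for i in range(n_basis):
        # B-spline basis element supported on knots[i], ..., knots[i+degree+1]
        b = BSpline.basis_element(knots[i:i + degree + 2], extrapolate=False)
        # Rescale to an M-spline (integrates to one), then integrate numerically
        m = np.nan_to_num(b(grid)) * (degree + 1) / (knots[i + degree + 1] - knots[i])
        out[:, i] = np.interp(x, grid, cumulative_trapezoid(m, grid, initial=0.0))
    return out


# Toy additive setting: only covariate 0 has a (monotone increasing) effect.
rng = np.random.default_rng(1)
n, p, n_basis = 150, 20, 6
X = rng.uniform(size=(n, p))
y = 2.0 * X[:, 0] ** 3 + 0.3 * rng.standard_normal(n)

# Stack the per-covariate I-spline expansions into one design matrix.
Z = np.hstack([ispline_basis(X[:, j], n_basis=n_basis) for j in range(p)])

# Stand-in for the sign-coherent (cooperative lasso) group penalty: an L1 fit
# with all coefficients constrained to be non-negative, so every fitted
# component is monotone non-decreasing.
fit = Lasso(alpha=0.02, positive=True, max_iter=100_000).fit(Z, y)
active_groups = sorted({idx // n_basis for idx in np.flatnonzero(fit.coef_ > 1e-8)})
print("selected covariates:", active_groups)

A full implementation would replace the plain non-negative lasso with the cooperative-lasso group penalty and the adaptive weighting step described in the summary, so that each group may be either all non-negative or all non-positive.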

MSC:

62-08 Computational methods for problems pertaining to statistics
62G08 Nonparametric regression and quantile regression
62J07 Ridge regression; shrinkage estimators (Lasso)
