Feature selection guided by structural information. (English) Zbl 1194.62092

Summary: In generalized linear regression problems with an abundant number of features, lasso-type regularization, which imposes an \(\ell ^{1}\)-constraint on the regression coefficients, has become a widely established technique. Deficiencies of the lasso in certain scenarios, notably strongly correlated designs, were unmasked when H. Zou and T. Hastie [J. R. Stat. Soc., Ser. B 67, No. 2, 301–320 (2005; Zbl 1069.62054)] introduced the elastic net. We propose to extend the elastic net by admitting general nonnegative quadratic constraints as a second form of regularization. The generalized ridge-type constraint will typically make use of the known association structure of features, for example, by using temporal or spatial closeness.
We study properties of the resulting “structured elastic net” regression estimation procedure, including basic asymptotics and the issue of model selection consistency. In this vein, we provide an analog to the so-called “irrepresentable condition” which holds for the lasso. Moreover, we outline algorithmic solutions for the structured elastic net within the generalized linear model family. The rationale and the performance of our approach are illustrated by means of simulated and real world data, with a focus on signal regression.
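
As a minimal sketch of the idea, and not the authors' algorithm, the following Python code fits a structured elastic net for squared-error loss only. It assumes the objective \(\tfrac{1}{2}\|y - X\beta\|_2^2 + \lambda_1 \|\beta\|_1 + \tfrac{\lambda_2}{2}\beta^\top \Lambda \beta\) with a symmetric positive semidefinite matrix \(\Lambda\); this scaling, the cyclic coordinate descent solver, the first-difference choice of \(\Lambda\) (encoding temporal closeness of neighbouring features), and all function names are assumptions made for illustration. Nonzero feature columns are also assumed.

```python
import numpy as np


def first_difference_penalty(p):
    """Return Lambda = D^T D, where D is the (p-1) x p first-difference matrix.

    Assumed structure: features are ordered (e.g. in time), and the quadratic
    form beta^T Lambda beta sums the squared differences of neighbours.
    """
    D = np.diff(np.eye(p), axis=0)  # row i equals e_{i+1} - e_i
    return D.T @ D


def soft_threshold(z, t):
    """Scalar soft-thresholding operator sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def structured_enet(X, y, Lam, lam1, lam2, n_iter=200, tol=1e-8):
    """Cyclic coordinate descent for the assumed objective

        0.5 * ||y - X b||^2 + lam1 * ||b||_1 + 0.5 * lam2 * b^T Lam b.
    """
    n, p = X.shape
    beta = np.zeros(p)
    resid = np.asarray(y, dtype=float).copy()  # equals y - X @ beta throughout
    col_sq = (X ** 2).sum(axis=0)              # x_j^T x_j for every column
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            # Inner product of x_j with the partial residual (feature j removed).
            rho = X[:, j] @ resid + col_sq[j] * old
            # Pull from the other coefficients through the quadratic penalty.
            coupling = Lam[j] @ beta - Lam[j, j] * old
            new = soft_threshold(rho - lam2 * coupling, lam1) / (
                col_sq[j] + lam2 * Lam[j, j]
            )
            if new != old:
                resid -= X[:, j] * (new - old)
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta
```

Under these assumptions, setting `lam1=0` reduces the problem to a generalized ridge regression, and choosing the identity matrix for `Lam` gives an elastic-net-type penalty; a sufficiently large `lam1` sets every coefficient to zero.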

MSC:

62J12 Generalized linear models (logistic models)
62H12 Estimation in multivariate analysis
65C60 Computational problems in statistics (MSC2010)

Citations:

Zbl 1069.62054

Software:

GMRFLib; rda

References:

[1] Belkin, M., Niyogi, P. and Sindhwani, V. (2006). Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. J. Mach. Learn. Res. 7 2399-2434. · Zbl 1222.68144
[2] Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems (with discussion). J. Roy. Statist. Soc. Ser. B 36 192-236. · Zbl 0327.60067
[3] Chung, F. (1997). Spectral Graph Theory. AMS Publications. · Zbl 0867.05046
[4] Daumer, M., Thaler, K., Kruis, E., Feneberg, W., Staude, G. and Scholz, M. (2007). Steps towards a miniaturized, robust and autonomous measurement device for the long-term monitoring of patient activity: ActiBelt. Biomed. Tech. 52 149-155.
[5] Donoho, D., Elad, M. and Temlyakov, V. (2006). Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Trans. Inform. Theory 52 6-18. · Zbl 1288.94017 · doi:10.1109/TIT.2005.860430
[6] Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression (with discussion). Ann. Statist. 32 407-499. · Zbl 1091.62054 · doi:10.1214/009053604000000067
[7] Eilers, P. and Marx, B. (1996). Flexible smoothing with B-splines and penalties (with discussion). Statist. Sci. 11 89-121. · Zbl 0955.62562 · doi:10.1214/ss/1038425655
[8] Eilers, P. and Marx, B. (1999). Generalized linear regression on sampled signals and curves: A P-spline approach. Technometrics 41 1-13.
[9] Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348-1360. · Zbl 1073.62547 · doi:10.1198/016214501753382273
[10] Frank, I. and Friedman, J. (1993). A statistical view of some chemometrics regression tools (with discussion). Technometrics 35 109-148. · Zbl 0775.62288 · doi:10.2307/1269656
[11] Friedman, J., Hastie, T., Hoefling, H. and Tibshirani, R. (2007). Pathwise coordinate optimization. Ann. Appl. Statist. 1 302-332. · Zbl 1378.90064 · doi:10.1214/07-AOAS131
[12] Genkin, A., Lewis, D. and Madigan, D. (2007). Large-scale Bayesian logistic regression for text categorization. Technometrics 49 589-616. · doi:10.1198/004017007000000245
[13] Goeman, J. (2007). An efficient algorithm for \(\ell_{1}\)-penalized estimation. Technical report, Dept. Medical Statistics and Bioinformatics, Univ. Leiden.
[14] Hastie, T., Buja, A. and Tibshirani, R. (1995). Penalized discriminant analysis. Ann. Statist. 23 73-102. · Zbl 0821.62031 · doi:10.1214/aos/1176324456
[15] Hoerl, A. and Kennard, R. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 12 55-67. · Zbl 0202.17205
[16] Knight, K. and Fu, W. (2000). Asymptotics for lasso-type estimators. Ann. Statist. 28 1356-1378. · Zbl 1105.62357 · doi:10.1214/aos/1015957397
[17] Le Cun, Y., Boser, B., Denker, J., Henderson, D., Howard, R., Hubbard, W. and Jackel, L. (1989). Backpropagation applied to handwritten zip code recognition. Neural Comput. 1 541-551.
[18] McCullagh, P. and Nelder, J. (1989). Generalized Linear Models. Chapman & Hall, London. · Zbl 0744.62098
[19] Park, T. and Casella, G. (2008). The Bayesian lasso. J. Amer. Statist. Assoc. 103 681-686. · Zbl 1330.62292 · doi:10.1198/016214508000000337
[20] Rosenberg, S. (1997). The Laplacian on a Riemannian Manifold. Cambridge Univ. Press, Cambridge. · Zbl 0868.58074
[21] Rosset, S., Zhu, J. and Hastie, T. (2004). Boosting as a regularized path to a maximum margin classifier. J. Mach. Learn. Res. 5 941-973. · Zbl 1222.68290
[22] Rue, H. and Held, L. (2001). Gaussian Markov Random Fields. Chapman & Hall/CRC, Boca Raton.
[23] Slawski, M., zu Castell, W. and Tutz, G. (2009). Feature selection guided by structural information. Technical report, Dept. Statistics, Univ. Munich. Available at . · Zbl 1194.62092
[24] Slawski, M., zu Castell, W. and Tutz, G. (2010). Supplement to “Feature selection guided by structural information.” DOI: . · Zbl 1194.62092
[25] Tibshirani, R. (1996). Regression shrinkage and variable selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267-288. · Zbl 0850.62538
[26] Tibshirani, R., Saunders, M., Rosset, S., Zhu, J. and Knight, K. (2005). Sparsity and smoothness via the fused lasso. J. Roy. Statist. Soc. Ser. B 67 91-108. · Zbl 1060.62049 · doi:10.1111/j.1467-9868.2005.00490.x
[27] Tutz, G. and Gertheiss, J. (2010). Feature extraction in signal regression: A boosting technique for functional data regression. J. Comput. Graph. Statist. 19 154-174. · doi:10.1198/jcgs.2009.07176
[28] Zhao, P. and Yu, B. (2006). On model selection consistency of the lasso. J. Mach. Learn. Res. 7 2541-2567. · Zbl 1222.62008
[29] Zou, H. (2006). The adaptive lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418-1429. · Zbl 1171.62326 · doi:10.1198/016214506000000735
[30] Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. Roy. Statist. Soc. Ser. B 67 301-320. · Zbl 1069.62054 · doi:10.1111/j.1467-9868.2005.00503.x
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.