
Preconditioning the Lasso for sign consistency. (English) Zbl 1321.62083

Summary: Sign consistency of the Lasso requires the stringent irrepresentable condition. This paper examines whether preconditioning can circumvent this condition. Let \(\mathbf{X}\in\mathbb{R}^{n\times p}\) and \(Y\in\mathbb{R}^{n}\) satisfy the standard linear regression equation. Instead of computing the Lasso with \((\mathbf{X},Y)\), preconditioning first left-multiplies by \(F\in\mathbb{R}^{n\times n}\) and then computes the Lasso with \((F\mathbf{X},FY)\). While others have proposed preconditioning for other purposes, we provide the first results showing that \(F\mathbf{X}\) can satisfy the irrepresentable condition even when \(\mathbf{X}\) fails to satisfy it. Preconditioning the Lasso thus creates a new estimator that is sign consistent in a wider variety of settings. Importantly, left-multiplying the regression equation by \(F\) does not change \(\beta\), the vector of unknown coefficients; however, it often inflates the variance of the errors. We propose a class of preconditioners to balance these costs and benefits.
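To make the recipe concrete, below is a minimal sketch in Python of the preconditioned Lasso, assuming an SVD-based preconditioner \(F = UD^{-1}U^{\top}\) (where \(\mathbf{X}=UDV^{\top}\) is the thin SVD), in the spirit of the class of preconditioners the paper proposes; the simulated data, the tuning parameter, and all variable names are illustrative, not the authors' settings.

import numpy as np
from sklearn.linear_model import Lasso

# Illustrative sparse regression problem (n < p).
rng = np.random.default_rng(0)
n, p, s = 100, 200, 5
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 3.0
y = X @ beta + rng.standard_normal(n)

# Thin SVD of the design matrix: X = U diag(d) V^T.
U, d, Vt = np.linalg.svd(X, full_matrices=False)

# Assumed SVD-based preconditioner F = U diag(1/d) U^T, so that
# F X = U V^T has all singular values equal to one (assumes X has
# full row rank). Left-multiplying by F leaves beta unchanged but
# can inflate the variance of the transformed errors F*epsilon.
F = U @ np.diag(1.0 / d) @ U.T
FX, Fy = F @ X, F @ y

# Lasso on the preconditioned pair (FX, Fy); alpha is illustrative.
fit = Lasso(alpha=0.1).fit(FX, Fy)
print("estimated support:", np.flatnonzero(fit.coef_))

The point of flattening the spectrum of \(F\mathbf{X}\) is that correlated designs which violate the irrepresentable condition can become amenable to sign-consistent selection; the trade-off, as the summary notes, is the inflated noise that any such \(F\) must be chosen to control.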

MSC:

62J07 Ridge regression; shrinkage estimators (Lasso)
62J05 Linear regression; mixed models
65C60 Computational problems in statistics (MSC2010)
65F08 Preconditioners for iterative methods

Software:

rgl; glmnet
