Sign-constrained least squares estimation for high-dimensional regression. (English) Zbl 1327.62422

Summary: Many regularization schemes for high-dimensional regression have been put forward. Most require the choice of a tuning parameter, typically via model selection criteria or cross-validation. We show that sign-constrained least squares estimation is a simple and effective regularization technique for a certain class of high-dimensional regression problems. The sign constraints must be derived from prior knowledge or an initial estimator. Success depends on conditions that are easy to check in practice. A sufficient condition for our results is that most variables with the same sign constraint are positively correlated. For a sparse optimal predictor, a non-asymptotic bound on the \(\ell_{1}\)-error of the regression coefficients is then proven. Without any further regularization, the regression vector can be estimated consistently as long as \(s^{2}\log(p)/n\rightarrow 0\) for \(n\rightarrow\infty\), where \(s\) is the sparsity of the optimal regression vector, \(p\) the number of variables, and \(n\) the sample size. The bounds are almost as tight as similar bounds for the Lasso under strongly correlated designs, even though the method has no tuning parameter and requires no cross-validation. Network tomography is shown to be an application in which the necessary conditions for the success of sign-constrained least squares are naturally fulfilled; empirical results confirm the effectiveness of the sign constraint for sparse recovery when the predictor variables are strongly correlated.
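The estimator itself requires no tuning: it is the ordinary least squares fit subject to elementwise sign constraints on the coefficients. A minimal sketch follows (not the authors' code): flipping the sign of every column carrying a negative constraint reduces the problem to non-negative least squares, solved here with SciPy's `nnls`. The sign vector, the toy design with a shared positive factor, and all parameter values below are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import nnls

def sign_constrained_ls(X, y, signs):
    """Least squares with sign constraints on the coefficients.

    signs[j] = +1 forces beta[j] >= 0; signs[j] = -1 forces beta[j] <= 0.
    Negating each column with signs[j] = -1 reduces the problem to
    ordinary non-negative least squares (NNLS).
    """
    signs = np.asarray(signs, dtype=float)
    beta_nn, _ = nnls(X * signs, y)   # solve min ||(X S) b - y||_2 subject to b >= 0
    return signs * beta_nn            # map back: beta = S b

# Hypothetical toy example: sparse positive coefficients,
# positively correlated design (shared positive factor Z).
rng = np.random.default_rng(0)
n, p, s = 200, 500, 5
Z = rng.standard_normal((n, 1))
X = 0.7 * Z + rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:s] = rng.uniform(1.0, 2.0, size=s)
y = X @ beta_true + 0.5 * rng.standard_normal(n)

beta_hat = sign_constrained_ls(X, y, np.ones(p))  # all-nonnegative constraint
print("l1 error:", np.abs(beta_hat - beta_true).sum())
```

Because the columns share a positive common factor, this toy design mimics the sufficient condition above that variables with the same sign constraint be positively correlated; under that condition the sign constraint alone acts as the regularizer.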

MSC:

62J07 Ridge regression; shrinkage estimators (Lasso)
62J05 Linear regression; mixed models
