Graph-based regularization for regression problems with alignment and highly correlated designs. (English) Zbl 1483.62136

Summary: Sparse models for high-dimensional linear regression and machine learning have received substantial attention over the past two decades. Model selection, or determining which features or covariates are the best explanatory variables, is critical to the interpretability of a learned model. Much of the current literature assumes that covariates are only mildly correlated. However, in many modern applications covariates are highly correlated and do not satisfy key assumptions (such as the restricted eigenvalue condition, the restricted isometry property, or other related conditions). This work considers a high-dimensional regression setting in which a graph governs both the correlations among the covariates and the similarity among regression coefficients, meaning the covariates and regression coefficients are aligned. Using side information about the strength of correlations among features, we form a graph with edge weights corresponding to pairwise covariances. This graph is used to define a graph total variation regularizer that promotes similar regression coefficients for correlated features. This work shows how the proposed graph-based regularization yields mean-squared error guarantees for a broad range of covariance graph structures. These guarantees are optimal for many specific covariance graphs, including block and lattice graphs. Our proposed approach outperforms other methods for highly correlated designs in a variety of experiments on synthetic data and real biochemistry data.
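To make the regularizer concrete, the following is a minimal sketch in standard notation; the symbols \(X\), \(y\), \(w_{jk}\), and the presence of an \(\ell_1\) term are illustrative assumptions, not taken verbatim from the paper. With design \(X \in \mathbb{R}^{n \times p}\), response \(y \in \mathbb{R}^n\), and covariance graph \(G = (V, E)\) whose edge weight \(w_{jk}\) encodes the covariance between features \(j\) and \(k\), a graph total variation estimator of this kind takes the form
\[
\hat{\beta} \in \arg\min_{\beta \in \mathbb{R}^p} \; \frac{1}{2n} \lVert y - X\beta \rVert_2^2 \;+\; \lambda_{\mathrm{TV}} \sum_{(j,k) \in E} w_{jk} \, \lvert \beta_j - \beta_k \rvert \;+\; \lambda_1 \lVert \beta \rVert_1 .
\]
The graph total variation term is small precisely when strongly correlated features (large \(w_{jk}\)) receive similar coefficients, which is the alignment property the summary describes; the \(\ell_1\) term adds sparsity in the usual lasso sense and may or may not appear in the paper's exact formulation.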

MSC:

62J99 Linear inference, regression
62A09 Graphical methods in statistics
62J07 Ridge regression; shrinkage estimators (Lasso)
62P10 Applications of statistics to biology and medical sciences; meta analysis

Software:

OSCAR
