
On the use of minimum penalties in statistical learning. (English) Zbl 07862312

Summary: Modern multivariate machine learning and statistical methods estimate parameters of interest while leveraging prior knowledge of the associations between outcome variables. Methods that do estimate such relationships typically do so through an error covariance matrix in multivariate regression, an approach that does not generalize to other types of models. In this article we propose the MinPen framework, which simultaneously estimates the regression coefficients of the multivariate regression model and the relationships between outcome variables under common assumptions. The MinPen framework uses a novel penalty based on the minimum function to simultaneously detect and exploit relationships between responses. An iterative algorithm is proposed to solve the resulting nonconvex optimization problem. Theoretical results, including high-dimensional convergence rates, model selection consistency, and a framework for post-selection inference, are provided. We extend the proposed MinPen framework to other exponential family loss functions, with a specific focus on multiple binomial responses. Tuning parameter selection is also addressed. Finally, simulations and two data examples illustrate the finite-sample properties of this framework. Supplemental material providing proofs, additional simulations, code, and datasets is available online.
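The idea of a minimum-based penalty can be made concrete with a toy sketch. The paper's exact MinPen penalty and algorithm are given in the article itself; the form below — a pairwise penalty min(||b_j − b_k||², τ) on response-specific coefficient vectors, minimized by alternating between detecting which pairs activate the distance term and taking a gradient step on the resulting smooth surrogate — is only a hypothetical illustration of the general mechanism. The names `min_penalty`, `minpen_fit`, `lam`, and `tau` are invented for this sketch and do not come from the paper.

```python
import numpy as np

def min_penalty(B, tau):
    """Sum over response pairs (j, k) of min(||b_j - b_k||^2, tau).

    B is a p x q coefficient matrix (p predictors, q responses).
    Similar response pairs are charged their squared distance (and so get
    shrunk toward each other); dissimilar pairs pay only the constant tau,
    so they are neither flagged as related nor fused.
    """
    q = B.shape[1]
    total = 0.0
    for j in range(q):
        for k in range(j + 1, q):
            total += min(np.sum((B[:, j] - B[:, k]) ** 2), tau)
    return total

def minpen_fit(X, Y, lam=1.0, tau=1.0, n_iter=100):
    """Toy alternating scheme for  0.5 * ||Y - X B||_F^2 + lam * min_penalty(B, tau).

    Step 1: detect the 'active' pairs, i.e. those whose squared distance is
            below tau, where the minimum selects the fusion term.
    Step 2: gradient-descend on the smooth surrogate in which only the
            active pairs carry a ridge-type fusion penalty.
    """
    B = np.linalg.lstsq(X, Y, rcond=None)[0]  # OLS warm start
    q = Y.shape[1]
    XtX, XtY = X.T @ X, X.T @ Y
    # Conservative fixed step size: 1 / (Lipschitz bound of the surrogate gradient).
    step = 1.0 / (np.linalg.norm(XtX, 2) + 2.0 * lam * q)
    for _ in range(n_iter):
        active = [(j, k) for j in range(q) for k in range(j + 1, q)
                  if np.sum((B[:, j] - B[:, k]) ** 2) < tau]
        G = XtX @ B - XtY  # gradient of the least-squares term
        for j, k in active:
            d = 2.0 * lam * (B[:, j] - B[:, k])  # gradient of the fusion term
            G[:, j] += d
            G[:, k] -= d
        B = B - step * G
    return B
```

Because the penalty is capped at τ, the objective is nonconvex, which is why an iterative detect-then-descend scheme (rather than a single convex solve) is needed; with `lam=0` the sketch reduces to ordinary least squares.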

MSC:

62-XX Statistics

Software:

glmnet

References:

[1] Anderson, T. W. (1951), “Estimating Linear Restrictions on Regression Coefficients for Multivariate Normal Distributions,” The Annals of Mathematical Statistics, 22, 327-351. · Zbl 0043.13902
[2] Bühlmann, P., Rütimann, P., van de Geer, S., and Zhang, C.-H. (2013), “Correlated Variables in Regression: Clustering and Sparse Estimation,” Journal of Statistical Planning and Inference, 143, 1835-1858. · Zbl 1278.62103
[3] Chen, L., and Huang, J. Z. (2012), “Sparse Reduced-Rank Regression for Simultaneous Dimension Reduction and Variable Selection,” Journal of the American Statistical Association, 107, 1533-1545. · Zbl 1258.62075
[4] Chen, L., and Huang, J. Z. (2016), “Sparse Reduced-Rank Regression with Covariance Estimation,” Statistics and Computing, 26, 461-470. · Zbl 1342.62117
[5] Chen, Y., Iyengar, R., and Iyengar, G. (2016), “Modeling Multimodal Continuous Heterogeneity in Conjoint Analysis—A Sparse Learning Approach,” Marketing Science, 36, 140-156.
[6] Cook, R. D., Li, B., and Chiaromonte, F. (2010), “Envelope Models for Parsimonious and Efficient Multivariate Linear Regression” (with Discussion), Statistica Sinica, 20, 927-1010. · Zbl 1259.62059
[7] Cook, R. D., and Zhang, X. (2015), “Foundations for Envelope Models and Methods,” Journal of the American Statistical Association, 110, 599-611. · Zbl 1390.62131
[8] Fan, J., and Lv, J. (2008), “Sure Independence Screening for Ultrahigh Dimensional Feature Space,” Journal of the Royal Statistical Society, Series B, 70, 849-911. · Zbl 1411.62187
[9] Friedman, J., Hastie, T., and Tibshirani, R. (2008), “Regularization Paths for Generalized Linear Models via Coordinate Descent,” Journal of Statistical Software, 33, 1-22.
[10] Friedman, J., Hastie, T., and Tibshirani, R. (2010), “Regularization Paths for Generalized Linear Models via Coordinate Descent,” Journal of Statistical Software, 33, 1-22. https://www.jstatsoft.org/v33/i01/.
[11] Fränti, P., and Sieranoja, S. (2019), “How Much can k-means be Improved by Using Better Initialization and Repeats?” Pattern Recognition, 93, 95-112. http://www.sciencedirect.com/science/article/pii/S0031320319301608.
[12] Hebiri, M., and van de Geer, S. (2011), “The Smooth-Lasso and other \(\ell_1+ \ell_2\)-penalized Methods,” Electronic Journal of Statistics, 5, 1184-1226. · Zbl 1274.62443
[13] Kim, S., and Xing, E. P. (2009), “Statistical Estimation of Correlated Genome Associations to a Quantitative Trait Network,” PLOS Genetics, 5, 1-18.
[14] Kim, S., and Xing, E. P. (2012), “Tree-Guided Group Lasso for Multi-Response Regression with Structured Sparsity, with an Application to eQTL Mapping,” The Annals of Applied Statistics, 6, 1095-1117. · Zbl 1254.62112
[15] Lee, J. D., Sun, D. L., Sun, Y., and Taylor, J. E. (2016), “Exact Post-Selection Inference, with Application to the Lasso,” The Annals of Statistics, 44, 907-927. · Zbl 1341.62061
[16] Lee, W., and Liu, Y. (2012), “Simultaneous Multiple Response Regression and Inverse Covariance Matrix Estimation via Penalized Gaussian Maximum Likelihood,” Journal of Multivariate Analysis, 111, 241-255. · Zbl 1259.62043
[17] Li, C., and Li, H. (2008), “Network-Constrained Regularization and Variable Selection for Analysis of Genomic Data,” Bioinformatics, 24, 1175-1182.
[18] Li, C., and Li, H. (2010), “Variable Selection and Regression Analysis for Graph-Structured Covariates with an Application to Genomics,” The Annals of Applied Statistics, 4, 1498-1516. · Zbl 1202.62157
[19] Li, Y., Nan, B., and Zhu, J. (2015), “Multivariate Sparse Group Lasso for Multivariate Multiple Linear Regression with Arbitrary Group Sparsity,” Biometrics, 71, 354-363. · Zbl 1390.62285
[20] Negahban, S. N., Ravikumar, P., Wainwright, M. J., and Yu, B. (2012), “A Unified Framework for High-Dimensional Analysis of M-Estimators with Decomposable Regularizers,” Statistical Science, 27, 538-557. · Zbl 1331.62350
[21] Price, B. S., Allenbrand, C., and Sherwood, B. (2021), “Detecting Clusters in Multivariate Regression,” WIREs Computational Statistics (to appear).
[22] Price, B. S., Molstad, A. J., and Sherwood, B. (2021), “Estimating Multiple Precision Matrices Using Cluster Fusion Regularization,” Journal of Computational and Graphical Statistics, 30, 823-834. · Zbl 07499920
[23] Price, B. S., and Sherwood, B. (2018), “A Cluster Elastic Net for Multivariate Regression,” Journal of Machine Learning Research, 19, 1-37.
[24] Rothman, A. J., Levina, E., and Zhu, J. (2010), “Sparse Multivariate Regression with Covariance Estimation,” Journal of Computational and Graphical Statistics, 19, 947-962.
[25] Shen, X., Pan, W., and Zhu, Y. (2012), “Likelihood-Based Selection and Sharp Parameter Estimation,” Journal of the American Statistical Association, 107, 223-232. · Zbl 1261.62020
[26] Sun, Q., Zhu, H., Liu, Y., and Ibrahim, J. G. (2015), “SPReM: Sparse Projection Regression Model for High-Dimensional Linear Regression,” Journal of the American Statistical Association, 110, 289-302. · Zbl 1373.62359
[27] Sun, W., Wang, J., and Fang, Y. (2012), “Regularized k-means Clustering of High-Dimensional Data and its Asymptotic Consistency,” Electronic Journal of Statistics, 6, 148-167. · Zbl 1335.62109
[28] Tibshirani, R. J. (2013), “The Lasso Problem and Uniqueness,” Electronic Journal of Statistics, 7, 1456-1490. · Zbl 1337.62173
[29] Velu, R., and Reinsel, G. C. (2013), Multivariate Reduced-Rank Regression: Theory and Applications (Vol. 136), New York: Springer.
[30] Votavova, H., Merkerova, M. D., Fejglova, K., Vasikova, A., Krejcik, Z., Pastorkova, A., Tabashidze, N., Topinka, J., Veleminsky, M. Jr., Sram, R., and Brdicka, R. (2011), “Transcriptome Alterations in Maternal and Fetal Cells Induced by Tobacco Smoke,” Placenta, 32, 763-770.
[31] Wang, N., Tikellis, G., Sun, C., Pezic, A., Wang, L., Wells, J., Cochrane, J., Ponsonby, A.-L., and Dwyer, T. (2014), “The Effect of Maternal Prenatal Smoking and Alcohol Consumption on the Placenta-to-Birth Weight Ratio,” Placenta, 35, 437-441.
[32] Witten, D., and Tibshirani, R. (2009), “Covariance-Regularized Regression and Classification for High-Dimensional Problems,” Journal of the Royal Statistical Society, Series B, 71, 615-636. · Zbl 1250.62033
[33] Witten, D. M., Shojaie, A., and Zhang, F. (2014), “The Cluster Elastic Net for High-Dimensional Regression With Unknown Variable Grouping,” Technometrics, 56, 112-122.
[34] Xu, L., Huang, A., Chen, J., and Chen, E. (2015), “Exploiting Task-Feature Co-clusters in Multi-Task Learning,” in Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence.
[35] Yang, S., Yuan, L., Lai, Y.-C., Shen, X., Wonka, P., and Ye, J. (2012), “Feature Grouping and Selection Over an Undirected Graph,” in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 922-930.
[36] Zhao, S., and Shojaie, A. (2016), “A Significance Test for Graph-Constrained Estimation,” Biometrics, 72, 484-493. · Zbl 1419.62493
[37] Zhou, W., Sherwood, B., Ji, Z., Xue, Y., Du, F., Bai, J., Ying, M., and Ji, H. (2017), “Genome-Wide Prediction of DNase I Hypersensitivity Using Gene Expression,” Nature Communications, 8, 1-17.
[38] Zhu, Y., Shen, X., and Pan, W. (2013), “Simultaneous Grouping Pursuit and Feature Selection Over an Undirected Graph,” Journal of the American Statistical Association, 108, 713-725. · Zbl 06195973
[39] Zou, H., and Hastie, T. (2005), “Regularization and Variable Selection via the Elastic Net,” Journal of the Royal Statistical Society, Series B, 67, 301-320. · Zbl 1069.62054
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. This list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.