Generalized co-sparse factor regression. (English) Zbl 1510.62067

Summary: Multivariate regression techniques are commonly applied to explore the associations between large numbers of outcomes and predictors. In real-world applications, the outcomes are often of mixed types, including continuous measurements, binary indicators, and counts, and the observations may also be incomplete. Building upon the recent advances in mixed-outcome modeling and sparse matrix factorization, generalized co-sparse factor regression (GOFAR) is proposed, which utilizes the flexible vector generalized linear model framework and encodes the outcome dependency through a sparse singular value decomposition (SSVD) of the integrated natural parameter matrix. To avoid the estimation of the notoriously difficult joint SSVD, GOFAR proposes both sequential and parallel unit-rank estimation procedures. By combining the ideas of alternating convex search and majorization-minimization, an efficient algorithm is developed to solve the sparse unit-rank problem and implemented in the R package gofar. Extensive simulation studies and two real-world applications demonstrate the effectiveness of the proposed approach.
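The sparse unit-rank step mentioned in the summary can be illustrated with a minimal sketch. It is not the GOFAR algorithm or the gofar package API: it assumes Gaussian outcomes only, no missing entries, a single lasso-type penalty `lam`, and a unit Euclidean-norm scaling of the factors. The names `sparse_unit_rank`, `soft_threshold` and `objective` are hypothetical. The sketch alternates between the two factors of C = d u v^T (alternating convex search), updating the left factor with proximal-gradient steps, each of which minimizes a quadratic majorizer of the loss, and the right factor in closed form.

```python
import numpy as np


def soft_threshold(z, t):
    """Elementwise soft-thresholding, the proximal map of t * |.|."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def objective(X, Y, C, lam):
    """0.5 * ||Y - X C||_F^2 + lam * sum_ij |C_ij| (Gaussian loss, assumed)."""
    return 0.5 * np.sum((Y - X @ C) ** 2) + lam * np.abs(C).sum()


def sparse_unit_rank(X, Y, lam, n_outer=100, n_inner=50, tol=1e-10):
    """Alternating search for a sparse rank-one C = u w^T (illustrative only)."""
    p, q = X.shape[1], Y.shape[1]
    # Start from the leading right singular vector of X^T Y (unit l2 norm).
    w = np.linalg.svd(X.T @ Y, full_matrices=False)[2][0]
    u = np.zeros(p)
    L = np.linalg.norm(X, 2) ** 2  # Lipschitz constant of the u-gradient
    prev = objective(X, Y, np.outer(u, w), lam)
    for _ in range(n_outer):
        # u-step: with ||w||_2 = 1 this is a lasso with response Y w;
        # each proximal-gradient step minimizes a quadratic majorizer.
        y, t = Y @ w, lam * np.abs(w).sum()
        for _ in range(n_inner):
            u = soft_threshold(u - X.T @ (X @ u - y) / L, t / L)
        z = X @ u
        if not u.any() or z @ z == 0.0:
            u = np.zeros(p)
            break
        # w-step: closed-form soft-thresholded update with u fixed.
        w = soft_threshold(Y.T @ z, lam * np.abs(u).sum()) / (z @ z)
        if not w.any():
            u = np.zeros(p)
            break
        # Rescale so that ||w||_2 = 1 again; the product u w^T is unchanged.
        s = np.linalg.norm(w)
        u, w = u * s, w / s
        cur = objective(X, Y, np.outer(u, w), lam)
        if prev - cur < tol * max(1.0, prev):
            break
        prev = cur
    d = np.linalg.norm(u)
    if d == 0.0:
        return 0.0, np.zeros(p), np.zeros(q)
    return d, u / d, w
```

Calling `d, u, v = sparse_unit_rank(X, Y, lam)` returns a nonnegative scale and two unit-norm vectors, so the fitted coefficient matrix is `d * np.outer(u, v)`. A sufficiently large `lam` shrinks the whole layer to zero. In a sequential scheme, further layers would be fitted to the residual `Y - X @ (d * np.outer(u, v))`; extending the loss to binary or count outcomes would replace the squared error with the corresponding exponential-family log-likelihood, which this sketch does not attempt.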

MSC:

62-08 Computational methods for problems pertaining to statistics
62J05 Linear regression; mixed models
62J07 Ridge regression; shrinkage estimators (Lasso)
62J12 Generalized linear models (logistic models)
62H12 Estimation in multivariate analysis

Software:

R; SOFAR; glmnet

References:

[1] Anderson, T. W., Estimating linear restrictions on regression coefficients for multivariate normal distributions, Ann. Math. Stat., 22, 327-351 (1951) · Zbl 0043.13902
[2] Brown, P. J.; Zidek, J. V., Adaptive multivariate ridge regression, Ann. Statist., 8, 64-74 (1980) · Zbl 0425.62053
[3] Bunea, F.; She, Y.; Wegkamp, M., Optimal selection of reduced rank estimators of high-dimensional matrices, Ann. Statist., 39, 1282-1309 (2011) · Zbl 1216.62086
[4] Bunea, F.; She, Y.; Wegkamp, M., Joint variable and rank selection for parsimonious estimation of high dimensional matrices, Ann. Statist., 40, 2359-2388 (2012) · Zbl 1373.62246
[5] Candès, E. J.; Recht, B., Exact matrix completion via convex optimization, Found. Comput. Math., 9, 717 (2009) · Zbl 1219.90124
[6] Chen, K.; Chan, K. S.; Stenseth, N. C., Reduced rank stochastic regression with a sparse singular value decomposition, J. R. Stat. Soc. Ser. B Stat. Methodol., 74, 203-221 (2012) · Zbl 1411.62182
[7] Chen, K.; Dong, H.; Chan, K. S., Reduced rank regression via adaptive nuclear norm penalization, Biometrika, 100, 901-920 (2013) · Zbl 1279.62115
[8] Chen, K.; Dong, R.; Xu, W.; Zheng, Z., Statistically guided divide-and-conquer for sparse factorization of large matrix (2020), arXiv preprint arXiv:2003.07898
[9] Chen, L.; Huang, J. Z., Sparse reduced-rank regression for simultaneous dimension reduction and variable selection, J. Amer. Statist. Assoc., 107, 1533-1545 (2012) · Zbl 1258.62075
[10] Cox, D. R.; Wermuth, N., Response models for mixed binary and quantitative variables, Biometrika, 79, 441-461 (1992) · Zbl 0766.62042
[11] Cupples, L. A.; Arruda, H. T.; Benjamin, E. J.; D’Agostino, R. B.; Demissie, S.; DeStefano, A. L.; Dupuis, J.; Falls, K. M.; Fox, C. S.; Gottlieb, D. J., The Framingham Heart Study 100K SNP genome-wide association study resource: overview of 17 phenotype working group reports, BMC Med. Genet., 8, S1 (2007)
[12] Fitzmaurice, G. M.; Laird, N. M., Regression models for a bivariate discrete and continuous outcome with clustering, J. Amer. Statist. Assoc., 90, 845-852 (1995) · Zbl 0851.62083
[13] Friedman, J.; Hastie, T.; Tibshirani, R., Regularization paths for generalized linear models via coordinate descent, J. Stat. Softw., 33, 1 (2010)
[14] Gorski, J.; Pfeuffer, F.; Klamroth, K., Biconvex sets and optimization with biconvex functions: a survey and extensions, Math. Methods Oper. Res., 66, 3, 373-407 (2007) · Zbl 1146.90495
[15] He, L.; Chen, K.; Xu, W.; Zhou, J.; Wang, F., Boosted sparse and low-rank tensor regression, (Advances in Neural Information Processing Systems (2018)), 1009-1018
[16] Hoerl, A. E.; Kennard, R. W., Ridge regression: Biased estimation for nonorthogonal problems, Technometrics, 12, 55-67 (1970) · Zbl 0202.17205
[17] Jolliffe, I. T., A note on the use of principal components in regression, J. R. Stat. Soc. Ser. C. Appl. Stat., 31, 300-303 (1982)
[18] Jørgensen, B., Exponential dispersion models, J. R. Stat. Soc. Ser. B Stat. Methodol., 49, 127-145 (1987) · Zbl 0662.62078
[19] Koltchinskii, V.; Lounici, K.; Tsybakov, A., Nuclear norm penalization and optimal rates for noisy low rank matrix completion, Ann. Statist., 39, 2302-2329 (2011) · Zbl 1231.62097
[20] Luo, C.; Liang, J.; Li, G.; Wang, F.; Zhang, C.; Dey, D. K.; Chen, K., Leveraging mixed and incomplete outcomes via reduced-rank modeling, J. Multivariate Anal., 167, 378-394 (2018) · Zbl 1395.62135
[21] Ma, Z.; Sun, T., Adaptive sparse reduced-rank regression (2014), arXiv preprint arXiv:1403.1922
[22] Mishra, A.; Dey, D. K.; Chen, K., Sequential co-sparse factor regression, J. Comput. Graph. Statist., 26, 814-825 (2017)
[23] Negahban, S.; Wainwright, M. J., Estimation of (near) low-rank matrices with noise and high-dimensional scaling, Ann. Statist., 39, 1069-1097 (2011) · Zbl 1216.62090
[24] Obozinski, G.; Wainwright, M. J.; Jordan, M. I., Support union recovery in high-dimensional multivariate regression, Ann. Statist., 39, 1-47 (2011) · Zbl 1373.62372
[25] Peng, J.; Zhu, J.; Bergamaschi, A.; Han, W.; Noh, D. Y.; Pollack, J. R.; Wang, P., Regularized multivariate regression for identifying master predictors with application to integrative genomics study of breast cancer, Ann. Appl. Stat., 4, 53 (2010) · Zbl 1189.62174
[26] Prentice, R.; Zhao, L., Estimating equations for parameters in means and covariances of multivariate discrete and continuous responses, Biometrics, 47, 825 (1991) · Zbl 0729.62560
[27] R Core Team, R: A Language and Environment for Statistical Computing (2019), R Foundation for Statistical Computing, Vienna, Austria, URL https://www.R-project.org/
[28] Razaviyayn, M.; Hong, M.; Luo, Z. Q., A unified convergence analysis of block successive minimization methods for nonsmooth optimization, SIAM J. Optim., 23, 1126-1153 (2013) · Zbl 1273.90123
[29] She, Y., An iterative algorithm for fitting nonconvex penalized generalized linear models with grouped predictors, Comput. Statist. Data Anal., 56, 2976-2990 (2012) · Zbl 1255.62209
[30] She, Y., Reduced rank multivariate generalized linear models for feature extraction, Stat. Interface, 6, 197-209 (2013) · Zbl 1327.62431
[31] Stanziano, D. C.; Whitehurst, M.; Graham, P.; Roos, B. A., A review of selected longitudinal studies on aging: past findings and future directions, J. Am. Geriat. Soc., 58, S292-S297 (2010)
[32] Stone, M., Cross-validatory choice and assessment of statistical predictions, J. R. Stat. Soc. Ser. B Stat. Methodol., 36, 111-133 (1974) · Zbl 0308.62063
[33] Tibshirani, R. J., Regression shrinkage and selection via the lasso, J. R. Stat. Soc. Ser. B Stat. Methodol., 58, 267-288 (1996) · Zbl 0850.62538
[34] Turlach, B. A.; Venables, W. N.; Wright, S. J., Simultaneous variable selection, Technometrics, 47, 349-363 (2005)
[35] Turnbull, D.; Barrington, L.; Torres, D.; Lanckriet, G., Towards musical query-by-semantic-description using the CAL500 data set, (Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (2007), Association for Computing Machinery, New York, NY, USA), 439-446
[36] Uematsu, Y.; Fan, Y.; Chen, K.; Lv, J.; Lin, W., SOFAR: large-scale association network learning, IEEE Trans. Inform. Theory, 65, 4924-4939 (2019) · Zbl 1432.68402
[37] Velu, R.; Reinsel, G. C., Multivariate Reduced-Rank Regression: Theory and Applications. Vol. 136 (2013), Springer Science & Business Media
[38] Yee, T. W.; Hastie, T. J., Reduced-rank vector generalized linear models, Stat. Model., 3, 15-41 (2003) · Zbl 1195.62123
[39] Yuan, M.; Ekici, A.; Lu, Z.; Monteiro, R., Dimension reduction and coefficient estimation in multivariate linear regression, J. R. Stat. Soc. Ser. B Stat. Methodol., 69, 329-346 (2007) · Zbl 07555355
[40] Zhao, L. P.; Prentice, R. L.; Self, S. G., Multivariate mean parameter estimation by using a partly exponential model, J. R. Stat. Soc. Ser. B Stat. Methodol., 54, 805-811 (1992)
[41] Zou, H.; Hastie, T. J., Regularization and variable selection via the elastic net, J. R. Stat. Soc. Ser. B Stat. Methodol., 67, 301-320 (2005) · Zbl 1069.62054
[42] Zou, H.; Zhang, H. H., On the adaptive elastic-net with a diverging number of parameters, Ann. Statist., 37, 1733 (2009) · Zbl 1168.62064