Double-matched matrix decomposition for multi-view data. (English) Zbl 07633307

Summary: We consider the problem of extracting joint and individual signals from multi-view data, that is, data collected from different sources on matched samples. While existing methods for multi-view data decomposition explore single matching of data by samples, we focus on double-matched multi-view data (matched by both samples and source features). Our motivating example is the miRNA data collected from both primary tumor and normal tissues of the same subjects; the measurements from two tissues are thus matched both by subjects and by miRNAs. Our proposed double-matched matrix decomposition allows us to simultaneously extract joint and individual signals across subjects, as well as joint and individual signals across miRNAs. Our estimation approach takes advantage of double-matching by formulating a new type of optimization problem with explicit row space and column space constraints, for which we develop an efficient iterative algorithm. Numerical studies indicate that taking advantage of double-matching leads to superior signal estimation performance compared to existing multi-view data decomposition based on single-matching. We apply our method to miRNA data as well as data from the English Premier League soccer matches and find joint and individual multi-view signals that align with domain-specific knowledge. Supplementary materials for this article are available online.
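
Illustration (not taken from the article): the notion of double matching can be made concrete with a small noiseless example. The sketch below builds two signal matrices of the same size whose column spaces share two directions (joint structure across subjects) and whose row spaces share one direction (joint structure across features). It then recovers the shared directions from the principal angles between the estimated subspaces. This is not the constrained optimization algorithm proposed in the article. The function names, dimensions, ranks, and tolerance are all assumptions chosen for illustration, and noise and rank selection are deliberately left out.

```python
import numpy as np


def orthonormal_basis(rng, dim, rank):
    """Random dim x rank matrix with orthonormal columns (reduced QR)."""
    q, _ = np.linalg.qr(rng.standard_normal((dim, rank)))
    return q


def joint_subspace(basis_a, basis_b, tol=1e-8):
    """Orthonormal basis of the directions shared by two subspaces.

    The singular values of basis_a.T @ basis_b are the cosines of the
    principal angles; cosines equal to 1 (up to tol) mark shared directions.
    """
    u, cosines, _ = np.linalg.svd(basis_a.T @ basis_b)
    n_joint = int(np.sum(cosines > 1 - tol))
    return basis_a @ u[:, :n_joint]


def demo(n=30, p=20, seed=0):
    """Hypothetical toy setup: two rank-3 views, matched by rows and columns."""
    rng = np.random.default_rng(seed)

    # Column-space directions: 2 joint, plus 1 individual per view.
    col = orthonormal_basis(rng, n, 4)
    col_joint = col[:, :2]
    col_views = [np.hstack([col_joint, col[:, 2:3]]),
                 np.hstack([col_joint, col[:, 3:4]])]

    # Row-space directions: 1 joint, plus 2 individual per view.
    row = orthonormal_basis(rng, p, 5)
    row_joint = row[:, :1]
    row_views = [np.hstack([row_joint, row[:, 1:3]]),
                 np.hstack([row_joint, row[:, 3:5]])]

    # Noiseless rank-3 signals A_k = U_k diag(3, 2, 1) V_k^T.
    weights = np.diag([3.0, 2.0, 1.0])
    signals = [u @ weights @ v.T for u, v in zip(col_views, row_views)]

    # Estimate each view's column and row space from its SVD (rank 3 assumed known).
    left, right = [], []
    for a in signals:
        u, _, vt = np.linalg.svd(a, full_matrices=False)
        left.append(u[:, :3])
        right.append(vt[:3].T)

    est_col_joint = joint_subspace(left[0], left[1])
    est_row_joint = joint_subspace(right[0], right[1])
    return est_col_joint, col_joint, est_row_joint, row_joint


if __name__ == "__main__":
    est_c, true_c, est_r, true_r = demo()
    print("joint column-space dimension:", est_c.shape[1])
    print("joint row-space dimension:", est_r.shape[1])
```

A method that matches the views by samples only would use just the column-space part of this structure. In the toy example the row spaces carry an additional shared direction, and that is the kind of information double matching is meant to exploit. With noisy data the cosines are no longer exactly 1, so the fixed tolerance used here would have to be replaced by a data-driven choice.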

MSC:

62-XX Statistics
