
Rate-optimal perturbation bounds for singular subspaces with applications to high-dimensional statistics. (English) Zbl 1395.62122

Let \(X\in \mathbb{R}^{p_1\times p_2}\) be an approximately low-rank matrix and \(Z\in \mathbb{R}^{p_1\times p_2}\) a small “perturbation” matrix. The authors establish rate-sharp perturbation bounds, separately for the left and the right singular subspaces, on the \(\sin\Theta\) distances between the singular subspaces of \(X\) and those of \(X+Z\). The results are compared with Wedin’s \(\sin\Theta\) theorem.
These perturbation bounds are then applied to low-rank matrix denoising and singular space estimation; they prove particularly powerful when the two matrix dimensions differ significantly. Further potential applications discussed in the paper include high-dimensional clustering and canonical correlation analysis. Selected simulations illustrate the approach, in particular the advantage of separate bounds for the left and right singular subspaces over uniform bounds. The main parts of the proofs are given as supplementary material under doi:10.1214/17-AOS1541SUPP.
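The denoising application can be sketched in a few lines (again a generic illustration, not the paper's estimator): observe \(Y = X + Z\) with strongly unequal dimensions and recover \(X\) by a rank-\(r\) truncated SVD.

```python
import numpy as np

def svd_truncate(Y, r):
    """Rank-r truncated SVD: keep the top r singular triplets of Y."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U[:, :r] @ np.diag(s[:r]) @ Vt[:r]

rng = np.random.default_rng(1)
p1, p2, r = 500, 50, 2                        # strongly unequal dimensions
X = rng.standard_normal((p1, r)) @ rng.standard_normal((r, p2))  # low-rank signal
Y = X + 0.5 * rng.standard_normal((p1, p2))   # noisy observation

X_hat = svd_truncate(Y, r)
err_raw = np.linalg.norm(Y - X)               # error of the raw observation
err_denoised = np.linalg.norm(X_hat - X)      # error after rank truncation
```

Projecting onto the top \(r\) singular triplets discards most of the noise energy, so `err_denoised` is typically far below `err_raw` in this low-rank regime.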

MSC:

62H12 Estimation in multivariate analysis
62H25 Factor analysis and principal components; correspondence analysis
15B52 Random matrices (algebraic aspects)
62H30 Classification and discrimination; cluster analysis (statistical aspects)

Software:

PMA

References:

[1] Alzahrani, T. and Horadam, K. J. (2016). Community detection in bipartite networks: Algorithms and case studies. In Complex Systems and Networks. Underst. Complex Syst. 25-50. Springer, Heidelberg.
[2] Anderson, T. W. (2003). An Introduction to Multivariate Statistical Analysis, 3rd ed. Wiley, Hoboken, NJ. · Zbl 1039.62044
[3] Argyriou, A., Evgeniou, T. and Pontil, M. (2008). Convex multi-task feature learning. Mach. Learn.73 243-272. · Zbl 1470.68073
[4] Azizyan, M., Singh, A. and Wasserman, L. (2013). Minimax theory for high-dimensional Gaussian mixtures with sparse mean separation. In Advances in Neural Information Processing Systems 2139-2147.
[5] Balakrishnan, S., Xu, M., Krishnamurthy, A. and Singh, A. (2011). Noise thresholds for spectral clustering. In Advances in Neural Information Processing Systems 954-962.
[6] Benaych-Georges, F. and Nadakuditi, R. R. (2012). The singular values and vectors of low rank perturbations of large rectangular random matrices. J. Multivariate Anal.111 120-135. · Zbl 1252.15039
[7] Borg, I. and Groenen, P. J. (2005). Modern Multidimensional Scaling: Theory and Applications. Springer, Berlin. · Zbl 1085.62079
[8] Boutsidis, C., Zouzias, A., Mahoney, M. W. and Drineas, P. (2015). Randomized dimensionality reduction for \(k\)-means clustering. IEEE Trans. Inform. Theory 61 1045-1062. · Zbl 1359.62232
[9] Bura, E. and Pfeiffer, R. (2008). On the distribution of the left singular vectors of a random matrix and its applications. Statist. Probab. Lett.78 2275-2280. · Zbl 1146.62011
[10] Cai, T. T., Li, X. and Ma, Z. (2016). Optimal rates of convergence for noisy sparse phase retrieval via thresholded Wirtinger flow. Ann. Statist.44 2221-2251. · Zbl 1349.62019
[11] Cai, T. T., Ma, Z. and Wu, Y. (2013). Sparse PCA: Optimal rates and adaptive estimation. Ann. Statist.41 3074-3110. · Zbl 1288.62099
[12] Cai, T. T., Ma, Z. and Wu, Y. (2015). Optimal estimation and rank detection for sparse spiked covariance matrices. Probab. Theory Related Fields 161 781-815. · Zbl 1314.62130
[13] Cai, T. T. and Zhang, A. (2017). Supplement to “Rate-optimal perturbation bounds for singular subspaces with applications to high-dimensional statistics.” DOI:10.1214/17-AOS1541SUPP.
[14] Cai, T., Cai, T. T., Liao, K. and Liu, W. (2015). Large-scale simultaneous testing of cross-covariance matrix with applications to PheWAS. Technical report. · Zbl 1422.62262
[15] Candès, E. J. and Plan, Y. (2010). Matrix completion with noise. Proc. IEEE 98 925-936.
[16] Candès, E. J. and Recht, B. (2009). Exact matrix completion via convex optimization. Found. Comput. Math.9 717-772. · Zbl 1219.90124
[17] Candès, E. J., Sing-Long, C. A. and Trzasko, J. D. (2013). Unbiased risk estimates for singular value thresholding and spectral estimators. IEEE Trans. Signal Process.61 4643-4657. · Zbl 1393.94187
[18] Candès, E. J. and Tao, T. (2010). The power of convex relaxation: Near-optimal matrix completion. IEEE Trans. Inform. Theory 56 2053-2080. · Zbl 1366.15021
[19] Capitaine, M., Donati-Martin, C. and Féral, D. (2009). The largest eigenvalues of finite rank deformation of large Wigner matrices: Convergence and nonuniversality of the fluctuations. Ann. Probab.37 1-47. · Zbl 1163.15026
[20] Chatterjee, S. (2014). Matrix estimation by universal singular value thresholding. Ann. Statist.43 177-214. · Zbl 1308.62038
[21] Chen, M., Gao, C., Ren, Z. and Zhou, H. H. (2013). Sparse CCA via precision adjusted iterative thresholding. Preprint. Available at arXiv:1311.6186. · Zbl 1432.62161
[22] Cho, J., Kim, D. and Rohe, K. (2015). Asymptotic theory for estimating the singular vectors and values of a partially-observed low rank matrix with noise. Preprint. Available at arXiv:1508.05431. · Zbl 1392.62150
[23] Davis, C. and Kahan, W. M. (1970). The rotation of eigenvectors by a perturbation. III. SIAM J. Numer. Anal.7 1-46. · Zbl 0198.47201 · doi:10.1137/0707001
[24] Donoho, D. and Gavish, M. (2014). Minimax risk of matrix denoising by singular value thresholding. Ann. Statist.42 2413-2440. · Zbl 1310.62014
[25] Dopico, F. M. (2000). A note on \(\sin\Theta\) theorems for singular subspace variations. BIT 40 395-403. · Zbl 0961.65034
[26] Fan, J., Wang, W. and Zhong, Y. (2016). An \(\ell_\infty\) eigenvector perturbation bound and its application to robust covariance estimation. Preprint. Available at arXiv:1603.03516.
[27] Feldman, D., Schmidt, M. and Sohler, C. (2013). Turning big data into tiny data: Constant-size coresets for \(k\)-means, PCA and projective clustering. In Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms 1434-1453. SIAM, Philadelphia, PA. · Zbl 1421.68219
[28] Gao, C., Ma, Z. and Zhou, H. H. (2014). Sparse CCA: Adaptive estimation and computational barriers. Preprint. Available at arXiv:1409.8565. · Zbl 1421.62073
[29] Gao, C., Ma, Z., Ren, Z. and Zhou, H. H. (2015). Minimax estimation in sparse canonical correlation analysis. Ann. Statist.43 2168-2197. · Zbl 1327.62340 · doi:10.1214/15-AOS1332
[30] Gavish, M. and Donoho, D. L. (2014). The optimal hard threshold for singular values is \(4/\sqrt{3} \). IEEE Trans. Inform. Theory 60 5040-5053. · Zbl 1360.94071
[31] Goldberg, D., Nichols, D., Oki, B. M. and Terry, D. (1992). Using collaborative filtering to weave an information tapestry. Commun. ACM 35 61-70.
[32] Gross, D. (2011). Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inform. Theory 57 1548-1566. · Zbl 1366.94103
[33] Hardoon, D. R., Szedmak, S. and Shawe-Taylor, J. (2004). Canonical correlation analysis: An overview with application to learning methods. Neural Comput.16 2639-2664. · Zbl 1062.68134
[34] Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning, 2nd ed. Springer Series in Statistics. Springer, New York. · Zbl 1273.62005
[35] Hotelling, H. (1936). Relations between two sets of variates. Biometrika 28 321-377. · Zbl 0015.40705
[36] Jin, J., Ke, Z. T. and Wang, W. (2015). Phase transitions for high dimensional clustering and related problems. Preprint. Available at arXiv:1502.06952. · Zbl 1459.62113
[37] Jin, J. and Wang, W. (2016). Influential features PCA for high dimensional clustering (with discussion). Ann. Statist.44 2323-2359. · Zbl 1359.62249
[38] Johnstone, I. M. and Lu, A. Y. (2009). On consistency and sparsity for principal components analysis in high dimensions. J. Amer. Statist. Assoc.104 682-693. · Zbl 1388.62174
[39] Keshavan, R. H., Montanari, A. and Oh, S. (2010). Matrix completion from noisy entries. J. Mach. Learn. Res.11 2057-2078. · Zbl 1242.62069
[40] Lei, J. and Rinaldo, A. (2015). Consistency of spectral clustering in stochastic block models. Ann. Statist.43 215-237. · Zbl 1308.62041
[41] Liu, Z. and Vandenberghe, L. (2009). Interior-point method for nuclear norm approximation with application to system identification. SIAM J. Matrix Anal. Appl.31 1235-1256. · Zbl 1201.90151
[42] Ma, Z. and Li, X. (2016). Subspace perspective on canonical correlation analysis: Dimension reduction and minimax rates. Preprint. Available at arXiv:1605.03662.
[43] Melamed, D. (2014). Community structures in bipartite networks: A dual-projection approach. PLoS ONE 9 e97823.
[44] O’Rourke, S., Vu, V. and Wang, K. (2013). Random perturbation of low rank matrices: Improving classical bounds. Preprint. Available at arXiv:1311.2657. · Zbl 1380.65076
[45] Rohe, K., Chatterjee, S. and Yu, B. (2011). Spectral clustering and the high-dimensional stochastic blockmodel. Ann. Statist.39 1878-1915. · Zbl 1227.62042
[46] Shabalin, A. A. and Nobel, A. B. (2013). Reconstruction of a low-rank matrix in the presence of Gaussian noise. J. Multivariate Anal.118 67-76. · Zbl 1280.15022
[47] Singer, A. and Cucuringu, M. (2010). Uniqueness of low-rank matrix completion by rigidity theory. SIAM J. Matrix Anal. Appl.31 1621-1641. · Zbl 1221.15038
[48] Stewart, G. W. (1991). Perturbation theory for the singular value decomposition. In SVD and Signal Processing II: Algorithms, Analysis and Applications 99-109. Elsevier, Amsterdam.
[49] Stewart, M. (2006). Perturbation of the SVD in the presence of small singular values. Linear Algebra Appl.419 53-77. · Zbl 1111.65037
[50] Sun, R. and Luo, Z.-Q. (2015). Guaranteed matrix completion via nonconvex factorization. In 2015 IEEE 56th Annual Symposium on Foundations of Computer Science—FOCS 2015 270-289. IEEE Computer Soc., Los Alamitos, CA.
[51] Tao, T. (2012). Topics in Random Matrix Theory. Graduate Studies in Mathematics 132. Amer. Math. Soc., Providence, RI. · Zbl 1256.15020
[52] Vershynin, R. (2012). Introduction to the non-asymptotic analysis of random matrices. In Compressed Sensing 210-268. Cambridge Univ. Press, Cambridge.
[53] von Luxburg, U., Belkin, M. and Bousquet, O. (2008). Consistency of spectral clustering. Ann. Statist.36 555-586. · Zbl 1133.62045
[54] Vu, V. (2011). Singular vectors under random perturbation. Random Structures Algorithms 39 526-538. · Zbl 1242.65069
[55] Wang, R. (2015). Singular vector perturbation under Gaussian noise. SIAM J. Matrix Anal. Appl.36 158-177. · Zbl 1315.15011 · doi:10.1137/130938177
[56] Wedin, P. (1972). Perturbation bounds in connection with singular value decomposition. BIT 12 99-111. · Zbl 0239.15015
[57] Weyl, H. (1912). Das asymptotische Verteilungsgesetz der Eigenwerte linearer partieller Differentialgleichungen (mit einer Anwendung auf die Theorie der Hohlraumstrahlung). Math. Ann.71 441-479. · JFM 43.0436.01
[58] Witten, D. M., Tibshirani, R. and Hastie, T. (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 515-534. · Zbl 1437.62658
[59] Yang, D., Ma, Z. and Buja, A. (2016). Rate optimal denoising of simultaneously sparse and low rank matrices. J. Mach. Learn. Res.17 Paper No. 92, 27. · Zbl 1368.62144
[60] Yu, Y.