
Low-rank matrix recovery under heavy-tailed errors. (English) Zbl 07874422

Summary: This paper proposes convex relaxation-based robust methods to recover approximately low-rank matrices in the presence of heavy-tailed and asymmetric errors, allowing for heteroscedasticity. We focus on three archetypal applications in matrix recovery: matrix compressed sensing, matrix completion, and multitask regression. Statistically, we provide sub-Gaussian-type deviation bounds in each of these settings when the noise variables have only bounded variances. Improving upon the earlier results of Fan, Wang and Zhu (Ann. Statist. 49 (2021) 1239-1266), the convergence rates of our estimators are proportional to the noise scale in the matrix sensing and multitask regression settings, and thus vanish in the noiseless case. Computationally, we propose a matrix version of the local adaptive majorize-minimization (LAMM) algorithm, which is much faster than the alternating direction method of multipliers used in previous work and scales to large datasets. Numerical experiments demonstrate the advantage of our methods over their non-robust counterparts and corroborate the theoretical finding that the convergence rates are proportional to the noise scale.
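The following is a minimal numerical sketch, not taken from the paper, of how such a robust convex-relaxation estimator could be computed by a LAMM-type majorize-minimization scheme in the matrix compressed sensing setting. It assumes a Huber loss as the robust loss and a nuclear-norm penalty, in line with the robust low-rank literature cited below (e.g. Fan, Wang and Zhu (2021); Sun, Zhou and Fan (2020)); every function and parameter name (robust_matrix_sensing, phi0, gamma, tau, lam) is an illustrative assumption rather than the authors' implementation.

```python
import numpy as np

def huber_grad(r, tau):
    """Derivative of the Huber loss: r if |r| <= tau, else tau * sign(r)."""
    return np.clip(r, -tau, tau)

def svt(M, lam):
    """Singular value soft-thresholding: proximal map of lam * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - lam, 0.0)) @ Vt

def robust_matrix_sensing(X, y, lam, tau, d1, d2, phi0=1.0, gamma=2.0,
                          max_iter=500, tol=1e-6):
    """
    Hypothetical LAMM-style iteration for
        min_Theta (1/n) sum_i huber_tau(y_i - <X_i, Theta>) + lam * ||Theta||_*
    X: (n, d1, d2) measurement matrices, y: (n,) responses.
    phi0, gamma: initial isotropic quadratic coefficient and its inflation factor.
    """
    n = X.shape[0]
    Theta = np.zeros((d1, d2))

    def loss(T):
        r = y - np.einsum('ijk,jk->i', X, T)
        a = np.abs(r)
        return np.mean(np.where(a <= tau, 0.5 * r**2, tau * a - 0.5 * tau**2))

    for _ in range(max_iter):
        r = y - np.einsum('ijk,jk->i', X, Theta)
        grad = -np.einsum('i,ijk->jk', huber_grad(r, tau), X) / n
        phi = phi0
        # Inflate phi until the local isotropic quadratic majorization holds,
        # then take a proximal (singular value thresholding) step.
        while True:
            Theta_new = svt(Theta - grad / phi, lam / phi)
            Delta = Theta_new - Theta
            majorant = loss(Theta) + np.sum(grad * Delta) + 0.5 * phi * np.sum(Delta**2)
            if loss(Theta_new) <= majorant + 1e-12:
                break
            phi *= gamma
        if np.linalg.norm(Theta_new - Theta) <= tol * max(1.0, np.linalg.norm(Theta)):
            return Theta_new
        Theta = Theta_new
    return Theta
```

As a usage sketch, one could draw X_i with i.i.d. standard normal entries, form y_i = <X_i, Theta_star> + eps_i with a rank-r Theta_star and heavy-tailed (e.g. t-distributed) noise, and compare the recovery error of this estimator with its non-robust counterpart obtained by taking tau very large. The same proximal step applies to matrix completion with X_i replaced by sampled basis matrices.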

MSC:

62-XX Statistics
90-XX Operations research, mathematical programming

Software:

SDPLR

References:

[1] Argyriou, A., Evgeniou, T. and Pontil, M. (2008). Convex multi-task feature learning. Mach. Learn. 73 243-272. · Zbl 1470.68073
[2] Burer, S. and Monteiro, R.D.C. (2003). A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Math. Program. 95 329-357. 10.1007/s10107-002-0352-8 MathSciNet: MR1976484 · Zbl 1030.90077
[3] Cai, J.-F., Candès, E.J. and Shen, Z. (2010). A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20 1956-1982. 10.1137/080738970 MathSciNet: MR2600248 · Zbl 1201.90155
[4] Candès, E.J. and Plan, Y. (2009). Matrix completion with noise. Proc. IEEE 98 925-936.
[5] Candès, E.J. and Plan, Y. (2011). Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements. IEEE Trans. Inf. Theory 57 2342-2359. 10.1109/TIT.2011.2111771 MathSciNet: MR2809094 · Zbl 1366.90160
[6] Candès, E.J. and Recht, B. (2009). Exact matrix completion via convex optimization. Found. Comput. Math. 9 717-772. 10.1007/s10208-009-9045-5 MathSciNet: MR2565240 · Zbl 1219.90124
[7] Chen, J., Liu, D. and Li, X. (2020). Nonconvex rectangular matrix completion via gradient descent without \(\ell_{2,\infty}\) regularization. IEEE Trans. Inf. Theory 66 5806-5841. 10.1109/TIT.2020.2992234 MathSciNet: MR4158648 · Zbl 1448.90078
[8] Chen, Y., Jalali, A., Sanghavi, S. and Caramanis, C. (2013). Low-rank matrix recovery from errors and erasures. IEEE Trans. Inf. Theory 59 4324-4337.
[9] Dalalyan, A. and Thompson, P. (2019). Outlier-robust estimation of a sparse linear model using \(\ell_1\)-penalized Huber’s \(M\)-estimator. In Advances in Neural Information Processing Systems 32 13188-13198.
[10] Elsener, A. and van de Geer, S. (2018). Robust low-rank matrix estimation. Ann. Statist. 46 3481-3509. 10.1214/17-AOS1666 MathSciNet: MR3852659 · Zbl 1412.62068
[11] Fan, J., Li, Q. and Wang, Y. (2017). Estimation of high dimensional mean regression in the absence of symmetry and light tail assumptions. J. R. Stat. Soc. Ser. B. Stat. Methodol. 79 247-265. 10.1111/rssb.12166 MathSciNet: MR3597972 · Zbl 1414.62178
[12] Fan, J., Liu, H., Sun, Q. and Zhang, T. (2018). I-LAMM for sparse learning: Simultaneous control of algorithmic complexity and statistical error. Ann. Statist. 46 814-841. 10.1214/17-AOS1568 MathSciNet: MR3782385 · Zbl 1392.62215
[13] Fan, J., Wang, W. and Zhu, Z. (2021). A shrinkage principle for heavy-tailed data: High-dimensional robust low-rank matrix recovery. Ann. Statist. 49 1239-1266. 10.1214/20-AOS1980 MathSciNet: MR4298863 · Zbl 1479.62034
[14] Goldberg, D., Nichols, D., Oki, B.M. and Terry, D. (1992). Using collaborative filtering to weave an information tapestry. Commun. ACM 35 61-70.
[15] Gross, D., Liu, Y.-K., Flammia, S.T., Becker, S. and Eisert, J. (2010). Quantum state tomography via compressed sensing. Phys. Rev. Lett. 105 150401. 10.1103/PhysRevLett.105.150401
[16] Huber, P.J. (1973). Robust regression: Asymptotics, conjectures and Monte Carlo. Ann. Statist. 1 799-821. MathSciNet: MR0356373 · Zbl 0289.62033
[17] Izenman, A.J. (1975). Reduced-rank regression for the multivariate linear model. J. Multivariate Anal. 5 248-264. 10.1016/0047-259X(75)90042-1 MathSciNet: MR0373179 · Zbl 0313.62042
[18] Klopp, O. (2014). Noisy low-rank matrix completion with general sampling distribution. Bernoulli 20 282-303. 10.3150/12-BEJ486 MathSciNet: MR3160583 · Zbl 1400.62115
[19] Klopp, O., Lounici, K. and Tsybakov, A.B. (2017). Robust matrix completion. Probab. Theory Related Fields 169 523-564. 10.1007/s00440-016-0736-y MathSciNet: MR3704775 · Zbl 1383.62167
[20] Koltchinskii, V., Lounici, K. and Tsybakov, A.B. (2011). Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion. Ann. Statist. 39 2302-2329. 10.1214/11-AOS894 MathSciNet: MR2906869 · Zbl 1231.62097
[21] Li, X. (2013). Compressed sensing and matrix completion with constant proportion of corruptions. Constr. Approx. 37 73-99. 10.1007/s00365-012-9176-9 MathSciNet: MR3010211 · Zbl 1258.93076
[22] Li, Y., Ma, T. and Zhang, H. (2018). Algorithmic regularization in over-parameterized matrix sensing and neural networks with quadratic activations. In Conference on Learning Theory 2-47.
[23] Lounici, K., Pontil, M., van de Geer, S. and Tsybakov, A.B. (2011). Oracle inequalities and optimal inference under group sparsity. Ann. Statist. 39 2164-2204. 10.1214/11-AOS896 MathSciNet: MR2893865 · Zbl 1306.62156
[24] Luan, X., Fang, B., Liu, L., Yang, W. and Qian, J. (2014). Extracting sparse error of robust PCA for face recognition in the presence of varying illumination and occlusion. Pattern Recognit. 47 495-508.
[25] Ma, C., Wang, K., Chi, Y. and Chen, Y. (2020). Implicit regularization in nonconvex statistical estimation: Gradient descent converges linearly for phase retrieval, matrix completion, and blind deconvolution. Found. Comput. Math. 20 451-632. 10.1007/s10208-019-09429-9 MathSciNet: MR4099988 · Zbl 1445.90089
[26] Ma, J. and Fattahi, S. (2023). Global convergence of sub-gradient method for robust matrix recovery: Small initialization, noisy measurements, and over-parameterization. J. Mach. Learn. Res. 24 Paper No. 96, 84 pp. MathSciNet: MR4582518
[27] Minsker, S. (2018). Sub-Gaussian estimators of the mean of a random matrix with heavy-tailed entries. Ann. Statist. 46 2871-2903. 10.1214/17-AOS1642 MathSciNet: MR3851758 · Zbl 1418.62235
[28] Negahban, S. and Wainwright, M.J. (2011). Estimation of (near) low-rank matrices with noise and high-dimensional scaling. Ann. Statist. 39 1069-1097. 10.1214/10-AOS850 MathSciNet: MR2816348 · Zbl 1216.62090
[29] Negahban, S. and Wainwright, M.J. (2012). Restricted strong convexity and weighted matrix completion: Optimal bounds with noise. J. Mach. Learn. Res. 13 1665-1697. MathSciNet: MR2930649 · Zbl 1436.62204
[30] Recht, B., Fazel, M. and Parrilo, P.A. (2010). Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 52 471-501. 10.1137/070697835 MathSciNet: MR2680543 · Zbl 1198.90321
[31] Rohde, A. and Tsybakov, A.B. (2011). Estimation of high-dimensional low-rank matrices. Ann. Statist. 39 887-930. 10.1214/10-AOS860 MathSciNet: MR2816342 · Zbl 1215.62056
[32] She, Y. and Chen, K. (2017). Robust reduced-rank regression. Biometrika 104 633-647. 10.1093/biomet/asx032 MathSciNet: MR3694587 · Zbl 07072232
[33] Shen, Y., Li, J., Cai, J. and Xia, D. (2022). Computationally efficient and statistically optimal robust low-rank matrix estimation. arXiv:2203.00953.
[34] Sun, Q., Zhou, W.-X. and Fan, J. (2020). Adaptive Huber regression. J. Amer. Statist. Assoc. 115 254-265. 10.1080/01621459.2018.1543124 MathSciNet: MR4078461 · Zbl 1437.62250
[35] Tan, K.M., Sun, Q. and Witten, D. (2022). Sparse reduced rank Huber regression in high dimensions. J. Amer. Statist. Assoc. To appear.
[36] Thompson, P. (2020). Outlier-robust sparse/low-rank least-squares regression and robust matrix completion. arXiv:2012.06750.
[37] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267-288. 10.1111/j.2517-6161.1996.tb02080.x MathSciNet: MR1379242 · Zbl 0850.62538
[38] Tong, T., Ma, C. and Chi, Y. (2021). Accelerating ill-conditioned low-rank matrix estimation via scaled gradient descent. J. Mach. Learn. Res. 22 Paper No. 150, 63. MathSciNet: MR4318506 · Zbl 07415093
[39] Trefethen, L.N. and Bau, D. III (1997). Numerical Linear Algebra. Philadelphia, PA: SIAM. 10.1137/1.9780898719574 MathSciNet: MR1444820 · Zbl 0874.65013
[40] Wang, B. and Fan, J. (2022). Robust matrix completion with heavy-tailed noise. arXiv:2206.04276.
[41] Wei, K., Cai, J.-F., Chan, T.F. and Leung, S. (2016). Guarantees of Riemannian optimization for low rank matrix recovery. SIAM J. Matrix Anal. Appl. 37 1198-1222. 10.1137/15M1050525 MathSciNet: MR3543156 · Zbl 1347.65109
[42] Yu, M., Sun, Q. and Zhou, W.-X. (2024). Supplement to “Low-rank matrix recovery under heavy-tailed errors.” 10.3150/23-BEJ1675SUPP
[43] Zhang, J., Fattahi, S. and Zhang, R. (2021). Preconditioned gradient descent for over-parameterized nonconvex matrix factorization. In Advances in Neural Information Processing Systems 34 5985-5996.