
Robust matrix estimations meet Frank-Wolfe algorithm. (English) Zbl 07730690

Summary: We consider estimating matrix-valued model parameters with a dedicated focus on their robustness. Our setting concerns large-scale structured data, so a regularization on the matrix’s rank becomes indispensable. Although robust loss functions are expected to be effective, their practical implementation is known to be difficult because of the non-smooth criterion functions encountered in the optimization. To meet these challenges, we develop a highly efficient computing scheme that takes advantage of projection-free Frank-Wolfe algorithms, which require only the first-order derivative of the criterion function. Our methodological framework is broad, accommodating a wide range of robust loss functions in conjunction with penalty functions in the context of matrix estimation problems. We establish non-asymptotic error bounds for the matrix estimators with the Huber loss and nuclear-norm penalty in two concrete cases: matrix completion from partial and noisy observations, and reduced-rank regression. Our theory demonstrates the merits of using robust loss functions: matrix-valued estimators with good properties are achieved even when heavy-tailed distributions are involved. We illustrate the promising performance of our methods with extensive numerical examples and data analysis.
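To make the computational idea concrete, here is a minimal sketch, not the authors' implementation, of a Frank-Wolfe iteration for Huber-loss matrix completion over a nuclear-norm ball. The function names, the classical step size \(\gamma_k = 2/(k+2)\), and the full SVD used for the linear minimization oracle are illustrative assumptions; in practice only the leading singular vector pair of the gradient is needed, typically computed iteratively.

```python
import numpy as np

def huber_grad(r, delta):
    # Derivative of the Huber loss: identity inside [-delta, delta],
    # clipped to +/- delta outside, which caps the influence of outliers.
    return np.where(np.abs(r) <= delta, r, delta * np.sign(r))

def frank_wolfe_completion(Y, mask, tau, delta, n_iters=200):
    # Minimize sum over observed (i, j) of Huber(M_ij - Y_ij)
    # subject to ||M||_* <= tau, using only first-order information.
    M = np.zeros_like(Y, dtype=float)
    for k in range(n_iters):
        # Gradient of the Huber criterion, supported on observed entries only.
        G = mask * huber_grad(M - Y, delta)
        # Linear minimization oracle over the nuclear-norm ball:
        # argmin_{||S||_* <= tau} <G, S> = -tau * u1 v1^T,
        # with (u1, v1) the top singular vector pair of G.
        u, _, vt = np.linalg.svd(G, full_matrices=False)
        S = -tau * np.outer(u[:, 0], vt[0, :])
        # Projection-free convex-combination update with open-loop step size.
        gamma = 2.0 / (k + 2.0)
        M = (1.0 - gamma) * M + gamma * S
    return M

# Hypothetical usage: rank-3 signal, half the entries observed, heavy-tailed noise.
rng = np.random.default_rng(0)
L = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))
mask = rng.random((50, 40)) < 0.5
Y = mask * (L + rng.standard_t(df=2, size=L.shape))
M_hat = frank_wolfe_completion(Y, mask, tau=np.linalg.norm(L, 'nuc'), delta=1.0)
```

Replacing the squared-error gradient with the clipped Huber gradient is what confers robustness: each heavy-tailed residual influences an iterate only up to the threshold delta, while every step still costs no more than one leading-singular-pair computation.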

MSC:

68T05 Learning and adaptive systems in artificial intelligence
Full Text: DOI

References:

[1] Agarwal, A., Negahban, S., & Wainwright, M. J. (2012). Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions. The Annals of Statistics, 40(2), 1171-1197. · Zbl 1274.62219
[2] Anderson, T. W. (2003). An introduction to multivariate statistical analysis. New York: Wiley, 3rd edition. · Zbl 1039.62044
[3] Boucheron, S., Lugosi, G., & Massart, P. (2013). Concentration inequalities: A nonasymptotic theory of independence. Oxford University Press. · Zbl 1279.60005
[4] Bousquet, O., A Bennett concentration inequality and its application to suprema of empirical processes, Comptes Rendus Mathematique, 334, 6, 495-500 (2002) · Zbl 1001.60021 · doi:10.1016/S1631-073X(02)02292-6
[5] Candès, EJ; Tao, T., The power of convex relaxation: Near-optimal matrix completion, IEEE Transactions on Information Theory, 56, 5, 2053-2080 (2010) · Zbl 1366.15021 · doi:10.1109/TIT.2010.2044061
[6] Charisopoulos, V., Chen, Y., Davis, D., Díaz, M., Ding, L., & Drusvyatskiy, D. (2021). Low-rank matrix recovery with composite optimization: Good conditioning and rapid convergence. Foundations of Computational Mathematics, 1-89. · Zbl 1480.65100
[7] Chen, C.; He, B.; Yuan, X., Matrix completion via an alternating direction method, IMA Journal of Numerical Analysis, 32, 1, 227-245 (2012) · Zbl 1236.65043 · doi:10.1093/imanum/drq039
[8] Cook, RD, Regression graphics: Ideas for studying regressions through graphics (2009), Hoboken: John Wiley & Sons · Zbl 0903.62001
[9] Elsener, A.; van de Geer, S., Robust low-rank matrix estimation, Annals of Statistics, 46, 6, 3481-3509 (2018) · Zbl 1412.62068 · doi:10.1214/17-AOS1666
[10] Fan, J.; Wang, W.; Zhu, Z., A shrinkage principle for heavy-tailed data: High-dimensional robust low-rank matrix recovery, Annals of Statistics, 49, 3, 1239 (2021) · Zbl 1479.62034 · doi:10.1214/20-AOS1980
[11] Freund, RM; Grigas, P., New analysis and results for the Frank-Wolfe method, Mathematical Programming, 155, 1-2, 199-230 (2016) · Zbl 1342.90101 · doi:10.1007/s10107-014-0841-6
[12] Freund, RM; Grigas, P.; Mazumder, R., An extended Frank-Wolfe method with “in-face” directions, and its application to low-rank matrix completion, SIAM Journal on Optimization, 27, 1, 319-346 (2017) · Zbl 1357.90115 · doi:10.1137/15M104726X
[13] Hampel, FR; Ronchetti, EM; Rousseeuw, PJ; Stahel, WA, Robust statistics: The approach based on influence functions (2011), Hoboken: John Wiley & Sons · Zbl 0593.62027
[14] Huber, PJ, Robust statistics (2004), Hoboken: John Wiley & Sons
[15] Jaggi, M. (2013). Revisiting Frank-Wolfe: Projection-free sparse convex optimization. In International conference on machine learning (pp. 427-435). PMLR.
[16] Kerdreux, T., d’Aspremont, A., & Pokutta, S. (2018). Restarting Frank-Wolfe. arXiv preprint arXiv:1810.02429.
[17] Klopp, O., Noisy low-rank matrix completion with general sampling distribution, Bernoulli, 20, 1, 282-303 (2014) · Zbl 1400.62115 · doi:10.3150/12-BEJ486
[18] Koltchinskii, V.; Lounici, K.; Tsybakov, AB, Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion, The Annals of Statistics, 39, 5, 2302-2329 (2011) · Zbl 1231.62097 · doi:10.1214/11-AOS894
[19] Lacoste-Julien, S. & Jaggi, M. (2015). On the global linear convergence of Frank-Wolfe optimization variants. arXiv preprint arXiv:1511.05932.
[20] Lauritzen, SL, Graphical models (1996), Oxford: Clarendon Press · Zbl 0907.62001
[21] Ledoux, M., & Talagrand, M. (2013). Probability in Banach spaces: Isoperimetry and processes. Berlin: Springer. · Zbl 1226.60003
[22] Loh, P-L, Statistical consistency and asymptotic normality for high-dimensional robust \(M\)-estimators, The Annals of Statistics, 45, 2, 866-896 (2017) · Zbl 1371.62023 · doi:10.1214/16-AOS1471
[23] Negahban, S., & Wainwright, M. J. (2011). Estimation of (near) low-rank matrices with noise and high-dimensional scaling. The Annals of Statistics, 39(2), 1069-1097. · Zbl 1216.62090
[24] Negahban, S.; Wainwright, MJ, Restricted strong convexity and weighted matrix completion: Optimal bounds with noise, The Journal of Machine Learning Research, 13, 1, 1665-1697 (2012) · Zbl 1436.62204
[25] Negahban, SN; Ravikumar, P.; Wainwright, MJ; Yu, B., A unified framework for high-dimensional analysis of \(M\)-estimators with decomposable regularizers, Statistical Science, 27, 4, 538-557 (2012) · Zbl 1331.62350 · doi:10.1214/12-STS400
[26] Recht, B. (2011). A simpler approach to matrix completion. Journal of Machine Learning Research, 12, 3413-3430. · Zbl 1280.68141
[27] Reddi, S. J., Hefny, A., Sra, S., Poczos, B., & Smola, A. (2016). Stochastic variance reduction for nonconvex optimization. In International conference on machine learning (pp. 314-323).
[28] Reinsel, GC; Velu, R., Multivariate reduced rank regression (1998), Berlin: Springer · Zbl 0909.62066 · doi:10.1007/978-1-4757-2853-8
[29] Rohde, A.; Tsybakov, AB, Estimation of high-dimensional low-rank matrices, The Annals of Statistics, 39, 2, 887-930 (2011) · Zbl 1215.62056 · doi:10.1214/10-AOS860
[30] She, Y.; Chen, K., Robust reduced-rank regression, Biometrika, 104, 3, 633-647 (2017) · Zbl 07072232 · doi:10.1093/biomet/asx032
[31] Sun, Q.; Zhou, W-X; Fan, J., Adaptive Huber regression, Journal of the American Statistical Association, 115, 529, 254-265 (2020) · Zbl 1437.62250 · doi:10.1080/01621459.2018.1543124
[32] Swoboda, P., & Kolmogorov, V. (2019). MAP inference via block-coordinate Frank-Wolfe algorithm. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 11146-11155).
[33] Toh, K-C; Yun, S., An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems, Pacific Journal of Optimization, 6, 3, 615-640 (2010) · Zbl 1205.90218
[34] Wong, RK; Lee, TC, Matrix completion with noisy entries and outliers, The Journal of Machine Learning Research, 18, 1, 5404-5428 (2017) · Zbl 1443.15017
[35] Zhou, W.-X., Bose, K., Fan, J., & Liu, H. (2018). A new perspective on robust M-estimation: Finite sample theory and applications to dependence-adjusted multiple testing. The Annals of Statistics, 46(5), 1904-1931. · Zbl 1409.62154