
Accelerating ill-conditioned low-rank matrix estimation via scaled gradient descent. (English) Zbl 07415093

Summary: Low-rank matrix estimation is a canonical problem that finds numerous applications in signal processing, machine learning and imaging science. A popular approach in practice is to factorize the matrix into two compact low-rank factors, and then optimize these factors directly via simple iterative methods such as gradient descent and alternating minimization. Despite nonconvexity, recent literature has shown that these simple heuristics in fact achieve linear convergence when initialized properly for a growing number of problems of interest. However, upon closer examination, existing approaches can still be computationally expensive, especially for ill-conditioned matrices: the convergence rate of gradient descent depends linearly on the condition number of the low-rank matrix, while the per-iteration cost of alternating minimization is often prohibitive for large matrices.
The goal of this paper is to set forth a competitive algorithmic approach dubbed Scaled Gradient Descent (ScaledGD), which can be viewed as preconditioned or diagonally-scaled gradient descent, where the preconditioners are adaptive and iteration-varying with minimal computational overhead. With tailored variants for low-rank matrix sensing, robust principal component analysis and matrix completion, we theoretically show that ScaledGD achieves the best of both worlds: it converges linearly at a rate independent of the condition number of the low-rank matrix, similar to alternating minimization, while maintaining the low per-iteration cost of gradient descent. Our analysis is also applicable to general loss functions that are restricted strongly convex and smooth over low-rank matrices. To the best of our knowledge, ScaledGD is the first algorithm that provably has such properties over a wide range of low-rank matrix estimation tasks. At the core of our analysis is the introduction of a new distance function that takes the preconditioners into account when measuring the distance between the iterates and the ground truth. Finally, numerical examples are provided to demonstrate the effectiveness of ScaledGD in accelerating the convergence of ill-conditioned low-rank matrix estimation in a wide range of applications.
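To make the scaling concrete, the following is a minimal NumPy sketch of how such a preconditioned update could look for the low-rank matrix sensing setting, where each gradient step on a factor is right-multiplied by the inverse Gram matrix of the opposite factor. The function name, loss normalization, step size, and spectral-initialization details are illustrative assumptions rather than the paper's exact prescription.

```python
import numpy as np

def scaled_gd_matrix_sensing(A_list, y, r, eta=0.5, iters=200):
    """Minimal sketch of a ScaledGD-style iteration for matrix sensing.

    Minimizes f(L, R) = (1/(2m)) * sum_i (<A_i, L R^T> - y_i)^2 over the
    factors L (n1 x r) and R (n2 x r).  Each gradient is right-multiplied
    by the inverse Gram matrix of the opposite factor, which plays the role
    of the adaptive, iteration-varying preconditioner described above.
    """
    m = len(A_list)
    # Spectral initialization (assumed): top-r factors of (1/m) * sum_i y_i * A_i.
    Y0 = sum(yi * Ai for yi, Ai in zip(y, A_list)) / m
    U, s, Vt = np.linalg.svd(Y0, full_matrices=False)
    L = U[:, :r] * np.sqrt(s[:r])
    R = Vt[:r, :].T * np.sqrt(s[:r])

    for _ in range(iters):
        X = L @ R.T
        residuals = [np.sum(Ai * X) - yi for Ai, yi in zip(A_list, y)]
        # Gradient of the loss with respect to X, pulled back to the factors.
        G = sum(ri * Ai for ri, Ai in zip(residuals, A_list)) / m
        grad_L, grad_R = G @ R, G.T @ L
        # Scaled (preconditioned) gradient steps, applied simultaneously.
        L_new = L - eta * grad_L @ np.linalg.inv(R.T @ R)
        R_new = R - eta * grad_R @ np.linalg.inv(L.T @ L)
        L, R = L_new, R_new
    return L, R
```

Since the preconditioners are r x r Gram matrices of the current factors, forming and inverting them costs only on the order of r^3 per iteration, which is why the overhead relative to plain gradient descent remains minimal when the rank r is small.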

MSC:

68T05 Learning and adaptive systems in artificial intelligence

Software:

Wirtinger Flow
