On the asymptotic linear convergence speed of Anderson acceleration, Nesterov acceleration, and nonlinear GMRES. (English) Zbl 1490.65120

Summary: We consider nonlinear convergence acceleration methods for fixed-point iteration \(x_{k+1}=q(x_k)\), including Anderson acceleration (AA), nonlinear GMRES (NGMRES), and Nesterov-type acceleration (corresponding to AA with window size one). We focus on fixed-point methods that converge asymptotically linearly with convergence factor \(\rho<1\) and that solve an underlying fully smooth and nonconvex optimization problem. It is often observed that AA and NGMRES substantially improve the asymptotic convergence behavior of the fixed-point iteration, but this improvement has not been quantified theoretically. We investigate this problem under simplified conditions. First, we consider stationary versions of AA and NGMRES, and determine coefficients that result in optimal asymptotic convergence factors, given knowledge of the spectrum of \(q'(x)\) at the fixed point \(x^*\). This allows us to understand and quantify the asymptotic convergence improvement that can be provided by nonlinear convergence acceleration, viewing \(x_{k+1}=q(x_k)\) as a nonlinear preconditioner for AA and NGMRES. Second, for the case of infinite window size, we consider linear asymptotic convergence bounds for GMRES applied to the fixed-point iteration linearized about \(x^*\). Since AA and NGMRES are equivalent to GMRES in the linear case, one may expect the GMRES convergence factors to be relevant for AA and NGMRES as \(x_k \rightarrow x^*\). Our results are illustrated numerically for a class of test problems from canonical tensor decomposition, comparing steepest descent and alternating least squares as the fixed-point iterations that are accelerated by AA and NGMRES. Our numerical tests show that both approaches allow us to estimate asymptotic convergence speed for nonstationary AA and NGMRES with finite window size.
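As a concrete illustration (not code from the paper), the windowed Anderson acceleration described above can be sketched in NumPy using the common multisecant least-squares ("Type II") form. The function name, default window size, tolerance, and the linear test map at the end are illustrative assumptions; with window size `m=1` the update reduces to a Nesterov-type two-term recursion, as noted in the summary.

```python
import numpy as np

def anderson_acceleration(q, x0, m=3, tol=1e-10, max_iter=200):
    """Minimal sketch of windowed Anderson acceleration for the
    fixed-point iteration x_{k+1} = q(x_k), in the multisecant
    least-squares ("Type II") form.

    At each step, coefficients gamma are chosen by least squares to
    minimize the norm of the combined residual over the last mk
    residual differences; the new iterate mixes the corresponding
    q-evaluations with the same coefficients.
    """
    X = [np.asarray(x0, dtype=float)]  # recent iterates x_k
    G = [q(X[0])]                      # recent evaluations q(x_k)
    for _ in range(max_iter):
        F = [g - x for g, x in zip(G, X)]   # residuals f_k = q(x_k) - x_k
        if np.linalg.norm(F[-1]) < tol:
            break
        mk = len(X) - 1                # effective window (<= m after trimming)
        if mk == 0:
            x_new = G[-1]              # first step: plain fixed-point update
        else:
            dF = np.column_stack([F[i + 1] - F[i] for i in range(mk)])
            dG = np.column_stack([G[i + 1] - G[i] for i in range(mk)])
            gamma, *_ = np.linalg.lstsq(dF, F[-1], rcond=None)
            x_new = G[-1] - dG @ gamma
        X.append(x_new)
        G.append(q(x_new))
        X, G = X[-(m + 1):], G[-(m + 1):]   # keep only the window
    return X[-1]

# Illustrative linear contraction q(x) = 0.5*x + b (convergence factor
# rho = 0.5); its fixed point is x* = 2b. AA recovers it in two steps
# here, while the plain iteration needs dozens to reach 1e-10.
b = np.array([1.0, 2.0])
x_star = anderson_acceleration(lambda x: 0.5 * x + b, np.zeros(2))
```

Because the test map is linear, this run also illustrates the equivalence with a Krylov method mentioned in the summary: on a 2-dimensional linear problem the residual is driven to zero after at most two accelerated steps.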

MSC:

65K10 Numerical optimization and variational techniques
49M37 Numerical methods based on nonlinear programming
65H10 Numerical computation of solutions to systems of equations
65F08 Preconditioners for iterative methods
65F10 Iterative numerical methods for linear systems
15A69 Multilinear algebra, tensor calculus
