
Mixed precision \(s\)-step Lanczos and conjugate gradient algorithms. (English) Zbl 1538.65062

The paper is devoted to the numerical behaviour of the \(s\)-step Lanczos algorithm and the \(s\)-step variant of the conjugate gradient (CG) method. These schemes have the potential to improve the parallel performance of the classical formulations of Lanczos and CG, which are often limited by communication. However, there is a trade-off between the potential parallelism and the numerical stability of these schemes. Indeed, the reduction of the synchronization cost per iteration comes at the price that the \(s\)-step formulations behave quite differently in finite precision arithmetic and can lead to faster loss of orthogonality and a slower convergence rate than the classical but more sequential algorithms. In this work the authors consider the mixed precision environment that has now become a crucial part of supercomputer parallel architectures. They investigate whether the use of multiple precisions in the computation can improve the numerical behaviour of the \(s\)-step Lanczos and CG algorithms without a significant performance overhead. This is achieved by doubling the working precision in the computation of the entries of the associated Gram matrices and in the matrix-vector multiplications with these Gram matrices at each outer loop iteration, while all other computations are performed in the standard working precision. The rounding error analysis in the paper improves on existing bounds: the error terms here depend only linearly on the conditioning of the generated \(s\)-step bases. Numerical experiments also demonstrate that the mixed precision approach improves the numerical behaviour of both the \(s\)-step Lanczos and CG algorithms.
Reviewer: Miroslav Rozloznik
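
The scheme described in the review can be sketched roughly as follows. This is an illustrative toy, not the authors' implementation: it assumes a simple monomial \(s\)-step basis, and it simulates the working and doubled precisions by float32 and float64; all names are hypothetical.

```python
import numpy as np

def sstep_basis_and_gram(A, r0, s, work=np.float32, high=np.float64):
    """Build a monomial s-step basis V = [r0, A r0, ..., A^s r0] in the
    working precision, and form its Gram matrix G = V^T V with the
    accumulation done in (simulated) doubled precision, rounded back to
    working precision, as in the mixed precision approach of the paper."""
    n = r0.size
    V = np.zeros((n, s + 1), dtype=work)
    V[:, 0] = r0.astype(work)
    for k in range(s):                          # SpMVs in working precision
        V[:, k + 1] = A.astype(work) @ V[:, k]
    # Gram matrix entries computed at doubled precision, then rounded:
    G = (V.astype(high).T @ V.astype(high)).astype(work)
    return V, G

# Inside the outer loop, length-n inner products reduce to small
# (s+1)-dimensional operations with G, e.g. <V c, V d> = c^T G d,
# so no further global communication over length-n vectors is needed.
```

The point of the sketch is the placement of precisions: only the Gram matrix construction (and, in the paper, multiplications with it) runs at doubled precision; the basis generation and all short recurrences stay in the working precision.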

MSC:

65F10 Iterative numerical methods for linear systems
65F15 Numerical computation of eigenvalues and eigenvectors of matrices
65G50 Roundoff error
65Y05 Parallel numerical computation
65Y10 Numerical algorithms for specific classes of architectures

References:

[1] Lanczos C. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. J Res Nat Bur Stand. 1950;45(4):255-82.
[2] Lanczos C. Solution of systems of linear equations by minimized iterations. J Res Nat Bur Stand. 1952;49:33-53.
[3] Hestenes M, Stiefel E. Methods of conjugate gradients for solving linear systems. J Res Nat Bur Stand. 1952;49:409-36. · Zbl 0048.09901
[4] Meurant G, Strakoš Z. The Lanczos and conjugate gradient algorithms in finite precision arithmetic. Acta Numer. 2006;15:471-542. · Zbl 1113.65032
[5] TOP500 list; 2020. Available from: https://www.top500.org/lists/top500/.
[6] Chronopoulos A, Gear C. s-step iterative methods for symmetric linear systems. J Comput Appl Math. 1989;25(2):153-68. · Zbl 0669.65021
[7] Hoemmen M. Communication-avoiding Krylov subspace methods. Berkeley: EECS Department, University of California; 2010.
[8] Carson E. Communication-avoiding Krylov subspace methods in theory and practice. Berkeley: EECS Department, University of California; 2015.
[9] Demmel J, Hoemmen M, Mohiyuddin M, Yelick K. Avoiding communication in sparse matrix computations. Proceedings of the IEEE International Symposium on Parallel and Distributed Processing; 2008. p. 1-12.
[10] Ballard G, Carson E, Demmel J, Hoemmen M, Knight N, Schwartz O. Communication lower bounds and optimal algorithms for numerical linear algebra. Acta Numer. 2014;23:1-155. · Zbl 1396.65082
[11] Ghysels P, Vanroose W. Hiding global synchronization latency in the preconditioned conjugate gradient algorithm. Parallel Comput. 2014;40(7):224-38.
[12] Yamazaki I, Hoemmen M, Luszczek P, Dongarra J. Improving performance of GMRES by reducing communication and pipelining global collectives. Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW); 2017. p. 1118-27.
[13] Joubert W, Carey G. Parallelizable restarted iterative methods for nonsymmetric linear systems. Part I: theory. Int J Comput Math. 1992;44(1-4):243-67. · Zbl 0759.65008
[14] Bai Z, Hu D, Reichel L. A Newton basis GMRES implementation. IMA J Numer Anal. 1994;14(4):563-81. · Zbl 0818.65022
[15] Carson E, Demmel J. A residual replacement strategy for improving the maximum attainable accuracy of s-step Krylov subspace methods. SIAM J Matrix Anal Appl. 2014;35(1):22-43. · Zbl 1302.65075
[16] Carson EC. The adaptive s-step conjugate gradient method. SIAM J Matrix Anal Appl. 2018;39(3):1318-38. · Zbl 1398.65044
[17] Carson EC. An adaptive s-step conjugate gradient algorithm with dynamic basis updating. Appl Math. 2020;65:123-51. · Zbl 1538.65096
[18] Abdelfattah A, Anzt H, Boman EG, Carson E, Cojean T, Dongarra J, Gates M, Grützmacher T, Higham NJ, Li S, Lindquist N. A survey of numerical methods utilizing mixed precision arithmetic; 2020. arXiv preprint arXiv:2007.06674.
[19] Paige C. Error analysis of the Lanczos algorithm for tridiagonalizing a symmetric matrix. IMA J Appl Math. 1976;18(3):341-9. · Zbl 0347.65018
[20] Paige C. Accuracy and effectiveness of the Lanczos algorithm for the symmetric eigenproblem. Linear Algebra Appl. 1980;34:235-58. · Zbl 0471.65017
[21] Carson E, Demmel J. Accuracy of the \(s\)-step Lanczos method for the symmetric eigenproblem in finite precision. SIAM J Matrix Anal Appl. 2015;36(2):793-819. · Zbl 1319.65024
[22] Bailey DH. A Fortran-90 double-double library. Available from: http://www.nersc.gov/~dhbailey/mpdist/mpdist.html.
[23] Greenbaum A. Behavior of slightly perturbed Lanczos and conjugate-gradient recurrences. Linear Algebra Appl. 1989;113:7-63. · Zbl 0662.65032
[24] Yamazaki I, Tomov S, Dong T, Dongarra J. Mixed-precision orthogonalization scheme and adaptive step size for improving the stability and performance of CA-GMRES on GPUs. Proceedings of the International Conference on High Performance Computing for Computational Science; 2014. p. 17-30; Springer. · Zbl 07631077
[25] Carson E, Gergelits T. Mixed precision \(s\)-step Lanczos and conjugate gradient algorithms; 2021. arXiv preprint arXiv:2103.09210.
[26] Golub G, Van Loan C. Matrix computations. 3rd ed. Baltimore, MD: Johns Hopkins University Press; 1996. · Zbl 0865.65009
[27] Wilkinson JH. Error analysis revisited. IMA Bull. 1986;22(11/12):192-200. · Zbl 0628.65035
[28] Wilkinson JH. The algebraic eigenvalue problem. Oxford, UK: Oxford University Press; 1965. · Zbl 0258.65037
[29] Carson E, Demmel J. Accuracy of the s-step Lanczos method for the symmetric eigenproblem. UCB/EECS-2014-165. Berkeley: EECS Department, University of California; 2014.
[30] Davis T, Hu Y. The University of Florida sparse matrix collection. ACM Trans Math Software. 2011;38(1):1-25. · Zbl 1365.65123
[31] Advanpix, LLC. Multiprecision computing toolbox for MATLAB; 2006. Available from: http://www.advanpix.com/.
[32] Edwards HC, Trott CR, Sunderland D. Kokkos: enabling manycore performance portability through polymorphic memory access patterns. J Parallel Distrib Comput. 2014;74(12):3202-16.
[33] Hida Y, Li XS, Bailey DH. Algorithms for quad-double precision floating point arithmetic. Proceedings of the 15th IEEE Symposium on Computer Arithmetic (ARITH-15); 2001. p. 155-62; IEEE.
[34] Kokkos: the C++ performance portability programming model. Programming guide: custom reductions. Available from: https://github.com/kokkos/kokkos/wiki/Programming-Guide
[35] Nakata M. The MPACK (MBLAS/MLAPACK): a multiple precision arithmetic version of BLAS and LAPACK; 2010. Available from: http://mplapack.sourceforge.net/.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible, without claiming completeness or a perfect matching.