×

AmgX: a library for GPU accelerated algebraic multigrid and preconditioned iterative methods. (English) Zbl 1325.65065

Summary: The solution of large sparse linear systems arises in many applications, such as computational fluid dynamics and oil reservoir simulation. In realistic cases the matrices are often so large that they require large scale distributed parallel computing to obtain the solution of interest in a reasonable time. In this paper we discuss the design and implementation of the AmgX library, which provides drop-in GPU acceleration of distributed algebraic multigrid (AMG) and preconditioned iterative methods. The AmgX library implements both classical and aggregation-based AMG methods with different selector and interpolation strategies, along with a variety of smoothers and preconditioners, including block-Jacobi, Gauss-Seidel, and incomplete-LU factorization. The library contains many of the standard and flexible preconditioned Krylov subspace iterative methods, which can be combined with any of the available multigrid methods or simpler preconditioners. The parallelism in the aggregation scheme exploits parallel graph matching techniques, while the smoothers and preconditioners often rely on parallel graph coloring algorithms. The AMG algorithm implemented in the AmgX library achieves 2–5\(\times\) speedup on a single GPU against a competitive implementation on the CPU. As will be shown in the numerical experiments section, both setup and solve phases scale well across multiple nodes, sustaining this performance advantage.

MSC:

65F50 Computational methods for sparse matrices
15-04 Software, source code, etc. for problems pertaining to linear algebra
65F08 Preconditioners for iterative methods
05C15 Coloring of graphs and hypergraphs

References:

[1] M. Adams, {\it A parallel maximal independent set algorithm}, in Proceedings of the 5th Copper Mountain Conference on Iterative Methods, 1998.
[2] M. Adams, M. Brezina, J. Hu, and R. Tuminaro, {\it Parallel multigrid smoothing: Polynomial versus Gauss-Seidel}, J. Comput. Phys., 188 (2003), pp. 593-610. · Zbl 1022.65030
[3] J. R. Allwright, R. Bordawekar, P. D. Coddington, K. Dincer, and C. L. Martin, {\it A Comparison of Parallel Graph Coloring Algorithms}, Technical report, Syracuse University, Syracuse, NY, 1995.
[4] R. Anderson and J. C. Setubal, {\it A parallel implementation of the push-relabel algorithm for the maximum flow problem}, J. Parallel Distrib. Comput., 29 (1995), pp. 17-26.
[5] ANSYS, {\it Fluent}, \burlhttp://www.ansys.com/Products/Simulation+Technology/Fluid+Dynamics/Fluid+Dynamics+Products/ANSYS+Fluent.
[6] Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, and H. van der Vorst, {\it Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide}, SIAM, Philadelphia, 2000. · Zbl 0965.65058
[7] R. Barrett, M. W. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. van der Vorst, {\it Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods}, SIAM, Philadelphia, 1993. · Zbl 0814.65030
[8] M. Benzi, D. B. Szyld, and A. van Duin, {\it Orderings for incomplete factorization preconditioning of nonsymmetric problems}, SIAM J. Sci. Comput., 20 (1999), pp. 1652-1670. · Zbl 0940.65033
[9] E. F. F. Botta and A. van der Ploeg, {\it Renumbering strategies based on multi-level techniques combined with ILU-decompositions}, Zh. Vychisl. Mat. Mat. Fiz., 37 (1997), pp. 1294-1300 (in Russian); Comput. Math. Math. Phys., 37 (1997), pp. 1252-1258 (in English). · Zbl 0946.65019
[10] E. F. F. Botta and F. W. Wubs, {\it Matrix renumbering ILU: An effective algebraic multilevel ILU preconditioner for sparse matrices}, SIAM J. Matrix Anal. Appl., 20 (1999), pp. 1007-1026. · Zbl 0937.65057
[11] A. Brandt, {\it Algebraic multigrid theory: The symmetric case}, Appl. Math. Comput., 19 (1986), pp. 23-56. · Zbl 0616.65037
[12] J. Brannick, Y. Chen, X. Hu, and L. Zikatanov, {\it Parallel unsmoothed aggregation algebraic multigrid algorithms on GPUs}, in Numerical Solution of Partial Differential Equations: Theory, Algorithms, and Their Applications, Springer Proc. Math. Stat. 45, Springer, New York, 2013, pp. 81-102. · Zbl 1275.65084
[13] N. Bell, S. Dalton, and L. N. Olson, {\it Exposing fine-grained parallelism in algebraic multigrid methods}, SIAM J. Sci. Comput., 34 (2012), pp. C123-C152. · Zbl 1253.65041
[14] A. Buluç and J. R. Gilbert, {\it Parallel sparse matrix-matrix multiplication and indexing: Implementation and experiments}, SIAM J. Sci. Comput., 34 (2012), pp. C170-C191. · Zbl 1252.05112
[15] C. Chevalier and F. Pellegrini, {\it PT-Scotch: A tool for efficient parallel graph ordering}, Parallel Comput., 34 (2008), pp. 318-331.
[16] J. Cohen and P. Castonguay, {\it Efficient Graph Matching and Coloring on the GPU}, GPU Tech., GTC On-Demand S2332, 2012; available online at http://on-demand-gtc.gputechconf.com/gtcnew/on-demand-gtc.php.
[17] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, {\it Introduction to Algorithms}, 2nd ed., The MIT Press, Cambridge, MA, 2001. · Zbl 1047.68161
[18] T. A. Davis and Y. Hu, {\it The University of Florida Sparse Matrix Collection}, ACM Trans. Math. Software, 38 (2011); available online at http://www.cise.ufl.edu/research/sparse/matrices/. · Zbl 1365.65123
[19] H. De Sterck, U. M. Yang, and J. J. Heys, {\it Reducing complexity in parallel algebraic multigrid preconditioners}, SIAM J. Matrix Anal. Appl., 27 (2006), pp. 1019-1039. · Zbl 1102.65034
[20] H. De Sterck, R. D. Falgout, J. W. Nolting, and U. M. Yang, {\it Distance-two interpolation for parallel algebraic multigrid}, Numer. Linear Algebra Appl., 15 (2008), pp. 115-139. · Zbl 1212.65139
[21] J. Demouth, {\it Optimization of a Sparse Matrix-Matrix Multiplication on the GPU}, GPU Tech., GTC On-Demand S2285, 2012; available online at http://on-demand-gtc.gputechconf.com/gtcnew/on-demand-gtc.php.
[22] I. S. Duff and G. A. Meurant, {\it The effect of ordering on preconditioned conjugate gradients}, BIT, 29 (1999), pp. 635-657. · Zbl 0687.65037
[23] L. C. Dutto, {\it The effect of ordering on preconditioned GMRES algorithm for solving the compressible Navier-Stokes equations}, Internat. J. Numer. Methods Engrg., 36 (1993), pp. 457-497. · Zbl 0767.76026
[24] H. C. Elman and E. Agron, {\it Ordering techniques for the preconditioned conjugate gradient method on parallel computers}, Comput. Phys. Comm., 53 (1989), pp. 253-269. · Zbl 0798.65038
[25] R. D. Falgout, {\it An introduction to algebraic multigrid}, Comput. Sci. Eng., 8 (2006), pp. 24-33.
[26] R. D. Falgout and U. M. Yang, {\it HYPRE: A library of high performance preconditioners}, in Computational Science–ICCS 2002, Lecture Notes in Comput. Sci. 2331, Springer-Verlag, Berlin, Heidelberg, 2002, pp. 632-641. · Zbl 1056.65046
[27] A. H. Gebremedhin, {\it Parallel Graph Coloring}, Ph.D. thesis, University of Bergen, Bergen, Norway, 1999. · Zbl 1008.68565
[28] P. Gonzalez, J. C. Cabaleiro, and T. F. Pena, {\it Parallel incomplete LU factorization as a preconditioner for Krylov subspace methods}, Parallel Process. Lett., 9 (1999), pp. 467-474.
[29] F. G. Gustavson, {\it Two fast algorithms for sparse matrices: Multiplication and permuted transposition}, ACM Trans. Math. Software, 4 (1978), pp. 250-269. · Zbl 0384.65016
[30] G. Haase, M. Liebmann, C. C. Douglas, and G. Plank, {\it A parallel algebraic multigrid solver on graphics processing units}, in High Performance Computing and Applications, Lecture Notes in Comput. Sci. 5938, Springer-Verlag, Berlin, Heidelberg, 2010, pp. 38-47.
[31] M. Hoemmen, {\it Communication-Avoiding Krylov Subspace Methods}, Ph.D. thesis, University of California-Berkeley, Berkeley, CA, 2010.
[32] R. A. Horn and C. R. Johnson, {\it Matrix Analysis}, Cambridge University Press, New York, 1999.
[33] T. R. Jensen and B. Toft, {\it Graph Coloring Problems}, John Wiley & Sons, New York, 1995. · Zbl 0855.05054
[34] M. T. Jones and P. E. Plassman, {\it A parallel graph coloring heuristic}, SIAM J. Sci. Comput., 14 (1993), pp. 654-669. · Zbl 0772.68046
[35] R. M. Karp and M. Sipser, {\it Maximum matching in sparse random graphs}, in Proceedings of the 22nd Annual IEEE Symposium on Foundations of Computer Science, 1981, pp. 364-375.
[36] G. Karypis and V. Kumar, {\it A fast and high quality multilevel scheme for partitioning irregular graphs}, SIAM J. Sci. Comput., 20 (1998), pp. 359-392. · Zbl 0915.68129
[37] H. Kim, J. Xu, and L. Zikatanov, {\it A multigrid method based on graph matching for convection-diffusion equations}, Numer. Linear Algebra Appl., 10 (2003), pp. 181-195. · Zbl 1071.65167
[38] J. Kraus and M. Förster, {\it Efficient AMG on heterogeneous systems}, in Facing the Multicore-Challenge II, Lecture Notes in Comput. Sci. 7174, Springer-Verlag, Berlin, Heidelberg, 2012, pp. 133-146.
[39] M. Luby, {\it A simple parallel algorithm for the maximal independent set problem}, SIAM J. Comput., 15 (1986), pp. 1036-1053. · Zbl 0619.68058
[40] A. C. Muresan and Y. Notay, {\it Analysis of aggregation-based multigrid}, SIAM J. Sci. Comput., 30 (2008), pp. 1082-1103. · Zbl 1163.65092
[41] M. Naumov, {\it Preconditioned block-iterative methods on GPUs}, PAMM. Proc. Appl. Math. Mech., 12 (2012), pp. 11-14.
[42] M. Naumov, {\it Parallel Incomplete-LU and Cholesky Factorization in the Preconditioned Iterative Methods on the GPU}, Nvidia Technical Report NVR-2012-003, Nvidia Corp., Santa Clara, CA, 2012.
[43] M. Naumov, P. Castonguay, and J. Cohen, {\it Parallel Graph Coloring with Applications to the Incomplete-LU Factorization on the GPU}, Nvidia Technical Report NVR-2015-001, Nvidia Corp., Santa Clara, CA, 2015.
[44] Y. Notay, {\it Flexible conjugate gradients}, SIAM J. Sci. Comput., 22 (2000), pp. 1444-1460. · Zbl 0980.65030
[45] Y. Notay, {\it An aggregation-based algebraic multigrid method}, Electron. Trans. Numer. Anal., 37 (2010), pp. 123-146. · Zbl 1206.65133
[46] Nvidia, {\it CUSPARSE and CUBLAS Libraries}, \burlhttps://developer.nvidia.com/cuda-toolkit.
[47] M. Pakzad, J. L. Lloyd, and C. Philipps, {\it Independent columns: A new parallel ILU preconditioner for the PCG methods}, Parallel Comput., 21 (1995), pp. 583-605.
[48] Y. Saad, {\it A flexible inner-outer preconditioned GMRES algorithm}, SIAM J. Sci. Comput., 14 (1993), pp. 461-469. · Zbl 0780.65022
[49] Y. Saad, {\it Iterative Methods for Sparse Linear Systems}, 2nd ed., SIAM, Philadelphia, 2003. · Zbl 1031.65046
[50] Schlumberger Ltd., {\it ECLIPSE}, http://www.software.slb.com/products/foundation/Pages/eclipse.aspx.
[51] Schlumberger Ltd., {\it INTERSECT}, http://www.software.slb.com/products/foundation/Pages/intersect.aspx.
[52] J. C. Setubal, {\it New experimental results for bipartite matching}, in Proceedings of netflow93, Technical Report TR-21/93, Dipartamento de Informatica, Universita di Pisa, Pisa, Italy, 1993, pp. 211-216.
[53] V. Simoncini and D. B. Szyld, {\it Flexible inner-outer Krylov subspace methods}, SIAM J. Numer. Anal., 40 (2003), pp. 2219-2239. · Zbl 1047.65021
[54] Stone Ridge Technology, {\it GAMPACK, GPU Accelerated Algebraic Multigrid Package}, http://www.stoneridgetechnology.com/products/gampack.
[55] K. Stüben, {\it Algebraic Multigrid (AMG): An Introduction with Applications}, GMD Report 53, GMD, St. Augustin, Germany, 1999.
[56] R. S. Tuminaro and C. Tong, {\it Parallel smoothed aggregation multigrid: Aggregation strategies on massively parallel machines}, in Proceedings of the ACM/IEEE 2000 Conference on Supercomputing, 2000.
[57] P. Vaněk, M. Brezina, and J. Mandel, {\it Convergence of algebraic multigrid based on smoothed aggregation}, Numer. Math., 88 (2001), pp. 559-579. · Zbl 0992.65139
[58] P. Vaněk, J. Mandel, and M. Brezina, {\it Algebraic multigrid by smoothed aggregation for second and fourth order elliptic problems}, Computing, 56 (1996), pp. 179-196. · Zbl 0851.65087
[59] J. A. Vogel, {\it Flexible BiCG and flexible BiCGStab for nonsymmetric linear systems}, Appl. Math. Comput., 188 (2007), pp. 226-233. · Zbl 1114.65318
[60] U. M. Yang, {\it On long range interpolation operators for aggressive coarsening}, Numer. Linear Algebra Appl., 17 (2010), pp. 453-472. · Zbl 1240.65286
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.