×

Reordering strategy for blocking optimization in sparse linear solvers. (English) Zbl 1366.65047

Summary: Solving sparse linear systems is a problem that arises in many scientific applications, and sparse direct solvers are a time-consuming and key kernel for those applications and for more advanced solvers such as hybrid direct-iterative solvers. For this reason, optimizing their performance on modern architectures is critical. The preprocessing steps of sparse direct solvers – ordering and block-symbolic factorization – are two major steps that lead to a reduced amount of computation and memory and to a better task granularity to reach a good level of performance when using BLAS kernels. With the advent of GPUs, the granularity of the block computation has become more important than ever. In this paper, we present a reordering strategy that increases this block granularity. This strategy relies on block-symbolic factorization to refine the ordering produced by tools such as Metis or Scotch, but it does not impact the number of operations required to solve the problem. We integrate this algorithm in the PaStiX solver and show an important reduction of the number of off-diagonal blocks on a large spectrum of matrices. This improvement leads to an increase in efficiency of up to 20% on GPUs.

MSC:

65F05 Direct numerical methods for linear systems and matrix inversion
65F50 Computational methods for sparse matrices
Full Text: DOI

References:

[1] E. Agullo, A. Buttari, A. Guermouche, and F. Lopez, {\it Multifrontal QR factorization for multicore architectures over runtime systems}, in Euro-Par 2013 Parallel Processing, Lecture Notes in Comput. Sci. 8097, F. Wolf, B. Mohr, and D. Mey, eds., Springer, Berlin, Heidelberg, 2013, pp. 521-532, . · Zbl 1369.65062
[2] P. R. Amestoy, A. Buttari, I. S. Duff, A. Guermouche, J. L’Excellent, and B. Uçar, {\it MUMPS}, in Encyclopedia of Parallel Computing, D. Padua, ed., Springer, 2011, pp. 1232-1238, .
[3] P. R. Amestoy, T. A. Davis, and I. S. Duff, {\it An approximate minimum degree ordering algorithm}, SIAM J. Matrix Anal. Appl., 17 (1996), pp. 886-905, . · Zbl 0861.65021
[4] D. L. Applegate, R. E. Bixby, V. Chvatal, and W. J. Cook, {\it The Traveling Salesman Problem: A Computational Study}, Princeton Series in Applied Mathematics, Princeton University Press, Princeton, NJ, 2007. · Zbl 1130.90036
[5] G. Bosilca, A. Bouteiller, A. Danalis, M. Faverge, T. Hérault, and J. J. Dongarra, {\it PaRSEC: Exploiting heterogeneity to enhance scalability}, Comput. Sci. Engrg., 15 (2013), pp. 36-45.
[6] P. Charrier and J. Roman, {\it Algorithmic study and complexity bounds for a nested dissection solver}, Numer. Math., 55 (1989), pp. 463-476. · Zbl 0663.65020
[7] N. Christofides, {\it Worst-Case Analysis of a New Heuristic for the Travelling Salesman Problem}, tech. report, DTIC Document, 1976.
[8] T. A. Davis and Y. Hu, {\it The University of Florida sparse matrix collection}, ACM Trans. Math. Software, 38 (2011), 1, . · Zbl 1365.65123
[9] J. Dongarra, J. D. Croz, S. Hammarling, and I. S. Duff, {\it A set of level 3 basic linear algebra subprograms}, ACM Trans. Math. Software, 16 (1990), pp. 1-17, . · Zbl 0900.65115
[10] A. George, {\it Nested dissection of a regular finite element mesh}, SIAM J. Numer. Anal., 10 (1973), pp. 345-363, . · Zbl 0259.65087
[11] A. George and J. W. Liu, {\it Computer Solution of Large Sparse Positive Definite}, Prentice-Hall Professional Technical Reference, 1981. · Zbl 0516.65010
[12] A. George and D. R. McIntyre, {\it On the application of the minimum degree algorithm to finite element systems}, SIAM J. Numer. Anal., 15 (1978), pp. 90-112, . · Zbl 0389.65014
[13] R. W. Hamming, {\it Error detecting and error correcting codes}, Bell System Tech. J., 26 (1950), pp. 147-160. · Zbl 1402.94084
[14] P. Hénon, P. Ramet, and J. Roman, {\it PaStiX: A high-performance parallel direct solver for sparse symmetric definite systems}, Parallel Comput., 28 (2002), pp. 301-321. · Zbl 0984.68208
[15] J. D. Hogg, J. K. Reid, and J. A. Scott, {\it Design of a multicore sparse Cholesky factorization using DAGs}, SIAM J. Sci. Comput., 32 (2010), pp. 3627-3649, . · Zbl 1221.65088
[16] D. S. Johnson and L. A. McGeoch, {\it The traveling salesman problem: A case study in local optimization}, in Local Search in Combinatorial Optimization, Wiley, Chichester, UK, 1997, pp. 215-310. · Zbl 0947.90612
[17] G. Karypis and V. Kumar, {\it A fast and high quality multilevel scheme for partitioning irregular graphs}, SIAM J. Sci. Comput., 20 (1998), pp. 359-392, . · Zbl 0915.68129
[18] X. Lacoste, {\it Scheduling and Memory Optimizations for Sparse Direct Solver on Multi-core/Multi-gpu Cluster Systems}, Ph.D. thesis, Université Bordeaux, Talence, France, 2015.
[19] X. Lacoste, M. Faverge, P. Ramet, S. Thibault, and G. Bosilca, {\it Taking advantage of hybrid systems for sparse direct solvers via task-based runtimes}, in Proceedings of the 2014 IEEE International Parallel Distributed Processing Symposium Workshops (IPDPSW), Phoenix, AZ, IEEE, 2014, pp. 29-38.
[20] R. J. Lipton and R. E. Tarjan, {\it A separator theorem for planar graphs}, SIAM J. Appl. Math., 36 (1979), pp. 177-189, . · Zbl 0432.05022
[21] J. W. H. Liu, {\it The role of elimination trees in sparse factorization}, SIAM J. Matrix Anal. Appl., 11 (1990), pp. 134-172, . · Zbl 0697.65013
[22] J. W. H. Liu, E. G. Ng, and B. W. Peyton, {\it On finding supernodes for sparse matrix computations}, SIAM J. Matrix Anal. Appl., 14 (1993), pp. 242-252, . · Zbl 0765.65034
[23] R. Luce and E. G. Ng, {\it On the minimum FLOPs problem in the sparse Cholesky factorization}, SIAM J. Matrix Anal. Appl., 35 (2014), pp. 1-21, . · Zbl 1298.65047
[24] G. L. Miller, S.-H. Teng, and S. A. Vavasis, {\it A unified geometric approach to graph separators}, in Proceedings of the 31st Annual Symposium on Foundations of Computer Science, IEEE, 1991, pp. 538-547.
[25] G. L. Miller and S. A. Vavasis, {\it Density graphs and separators}, in Proceedings of the Second Annual ACM-SIAM Symposium on Discrete Algorithms, SIAM, Philadelphia, ACM, New York, 1991, pp. 331-336. · Zbl 0785.05029
[26] F. Pellegrini, {\it Scotch and libScotch \textup5.1 User’s Guide}, 2008.
[27] D. J. Rose and R. E. Tarjan, {\it Algorithmic aspects of vertex elimination on directed graphs}, SIAM J. Appl. Math., 34 (1978), pp. 176-197, . · Zbl 0377.65013
[28] D. J. Rosenkrantz, R. E. Stearns, and P. M. Lewis, II, {\it An analysis of several heuristics for the traveling salesman problem}, SIAM J. Comput., 6 (1977), pp. 563-581, . · Zbl 0364.90104
[29] W. M. Sid-Lakhdar, {\it Scaling the Solution of Large Sparse Linear Systems Using Multifrontal Methods on Hybrid Shared-Distributed Memory Architectures}, Ph.D. thesis, École Normale Supérieure de Lyon, 2014.
[30] STFC (Science and Technology Facilities Council), {\it The HSL Mathematical Software Library. A Collection of Fortran Codes for Large Scale Scientific Computation}, .
[31] W. F. Tinney and J. W. Walker, {\it Direct solutions of sparse network equations by optimally ordered triangular factorization}, Proc. IEEE, 55 (1967), pp. 1801-1809.
[32] University of Waterloo, {\it Concorde TSP Solver}, .
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.