×

Impact of mixed precision and storage layout on additive Schwarz smoothers. (English) Zbl 07396249

Summary: The growing discrepancy between CPU computing power and memory bandwidth drives more and more numerical algorithms into a bandwidth-bound regime. One example is the overlapping Schwarz smoother, a highly effective building block for iterative multigrid solution of elliptic equations with higher order finite elements. Two options of reducing the required memory bandwidth are sparsity exploiting storage layouts and representing matrix entries with reduced precision in floating point or fixed point format. We investigate the impact of several options on storage demand and contraction rate, both analytically in the context of subspace correction methods and numerically at an example of solid mechanics. Both perspectives agree on the favourite scheme: fixed point representation of Cholesky factors in nested dissection storage.

MSC:

65N55 Multigrid methods; domain decomposition for boundary value problems involving PDEs
65Y10 Numerical algorithms for specific classes of architectures
65Y20 Complexity and performance of numerical algorithms

Software:

Kaskade7; DUNE
Full Text: DOI

References:

[1] DeuflhardP, WeiserM. Adaptive numerical solution of PDEs. Berlin, Germany: de Gruyter; 2012. · Zbl 1268.65146
[2] ZienkiewiczOC, TaylorRL, ZhuJZ. The finite element method. Oxford, England: Elsevier; 2005.
[3] SundarH, StadlerG, BirosG. Comparison of multigrid algorithms for high‐order continuous finite element discretizations. Numer Linear Algebra Appl.2015;22(4):664-80. · Zbl 1349.65680
[4] BenedusiP, HuppD, ArbenzP, KrauseRA. Parallel multigrid solver for time‐periodic incompressible Navier‐Stokes equations in 3D. Numerical mathematics and advanced applications ENUMATH 2015. New York, NY: Springer; 2016:265-73. · Zbl 1388.76198
[5] PavarinoLF. Additive Schwarz methods for the p‐version finite element method. Numer Math. 1994;66(1):493-515. · Zbl 0791.65083
[6] SchöberlJ, MelenkJM, PechsteinC, ZaglmayrS. Additive Schwarz preconditioning for p‐version triangular and tetrahedral finite elements. IMA J Numer Anal. 2008;28(1):1-24. · Zbl 1153.65113
[7] PasquettiR, PavarinoLF, RapettiF, ZampieriE. Overlapping schwarz preconditioners for fekete spectral elements. In: WidlundOB (ed.), KeyesDE (ed.), editors. Domain decomposition methods in science and engineering XVI. Lecture Notes in Computational Science and Engineering. Volume 55. New York, NY: Springer; 2007.
[8] McCalpinJD. Memory bandwidth and machine balance in current high performance computers. IEEE TCCA Newslett. 1995;(12):19-25.
[9] DavisTA, RajamanickamS, Sid‐LakhdarWM. A survey of direct methods for sparse linear systems. Acta Numerica. 2006;25:383-566. · Zbl 1346.65011
[10] HackbuschW. A sparse matrix arithmetic based on \(\mathcal{H} \)‐matrices, Part I: introduction to \(\mathcal{H} \)‐matrices. Computing. 1999;62:89-108. · Zbl 0927.65063
[11] ClarkMA, BabichR, BarrosK, BrowerRC, RebbiC. Solving lattice QCD systems of equations using mixed precision solvers on GPUs. Comput Phys Commun. 2010;181(9):1517-28. Available from. http://www.sciencedirect.com/science/article/pii/S0010465510001426. · Zbl 1215.81124
[12] AnztH, LuszczekP, DongarraJ, HeuvelineV. GPU‐accelerated asynchronous error correction for mixed precision iterative refinement. In: KaklamanisC (ed.), PapatheodorouT (ed.), SpirakisPG (ed.), editors. Euro‐Par 2012 parallel processing. vol. 7484 of Lecture Notes in Computer Science. New York, NY: Springer; 2012.
[13] BaboulinM, ButtariA, DongarraJ, KurzakJ, LangouJ, LangouJ, et al. Accelerating scientific computations with mixed precision algorithms. Comput Phys Commun. 2009;180(12):2526-33. · Zbl 1197.65240
[14] CherubinS, AgostaG, LasriI, RohouE, SentieysO. Implications of reduced‐precision computations in HPC: performance, energy and error. In: BassiniS (ed.), DaneluttoM (ed.), editors. Parallel computing is everywhere. vol. 32 of Advances in Parallel Computing. Amsterdam, The Netherlands: IOS Press; 2018:297-306.
[15] GiraudL, HaidarA, WatsonLT. Mixed‐precision preconditioners in parallel domain decomposition solvers. In: LangerU (ed.), DiscacciatiM (ed.), KeyesDE (ed.), WidlundOB (ed.), ZulehnerW (ed.), editors. Domain decomposition methods in science and engineering XVII. vol. 60 of Lecture Notes in Computational Science and Engineering; 2008:357-64. Berlin, Germany: Springer. · Zbl 1139.65022
[16] GöddekeD. Fast and accurate finite‐element multigrid solvers for PDE simulations on GPU clusters. Dortmund, Germany: University of Dortmund; 2010.
[17] AnztH, DongarraJ, FlegarG, HighamNJ, Quintana‐OrtíES. Adaptive precision in block‐Jacobi preconditioning for iterative sparse linear system solvers. Concurr Comput Pract Exper. 2019;31(6):e4460. https://doi.org/10.1002/cpe.4460. · doi:10.1002/cpe.4460
[18] GötschelS, vonTycowiczC, PolthierK, WeiserM. Reducing memory requirements in scientific computing and optimal control. In: CarraroT (ed.), GeigerM (ed.), KörkelS (ed.), RannacherR (ed.), editors. Multiple shooting and time domain decomposition methods. New York, NY: Springer; 2015:263-87. · Zbl 1337.65052
[19] LindstromP. Fixed‐rate compressed floating‐point arrays. IEEE Trans Vis Comp Graphics. 2014;20(12):2674-83.
[20] DavisTA. Direct methods for sparse linear systems. Philadelphia, PA: SIAM; 2006. · Zbl 1119.65021
[21] XuJ. Iterative methods by space decomposition and subspace correction. SIAM Rev. 1992;34(4):581-613. · Zbl 0788.65037
[22] BrambleJH, PasciakJE, XuJ. Parallel multilevel preconditioners. Math Comp. 1990;55:1-22. · Zbl 0703.65076
[23] PavarinoL, ScacchiS. Multilevel additive schwarz preconditioners for the bidomain reaction‐diffusion system. SIAM J Sci Comput. 2008;31(1):420-43. · Zbl 1185.65179
[24] BicăI. Iterative substructuring algorithms for the p‐version finite element method for elliptic problems. New York, NY: Courant Institute, New York University; 1997.
[25] ArnoldDN, FalkRS, WintherR. Multigrid in H(div) and H(curl). Numer Math. 2000;85:197-217. · Zbl 0974.65113
[26] MitchellL, WechsungF, FarrellPE. An augmented Lagrangian preconditioner for the 3D stationary incompressible Navier‐Stokes equations at high Reynolds number. SIAM J Sci Comput. 2019;41:A3073-96. · Zbl 1448.65261
[27] FarrellPE, HeY, MacLachlanS. A local Fourier analysis for additive Vanka relaxation for the Stokes equations. Numer Linear Alg Appl. 2019;e2306. https://doi.org/10.1002/nla.2306. · Zbl 07361104 · doi:10.1002/nla.2306
[28] GeorgeA, LiuJWH. Computer solution of large sparse positive definite systems. Upper Saddle River, NJ: Prentice‐Hall; 1981. · Zbl 0516.65010
[29] WilsonEL. The static condensation algorithm. Int J Numer Meth Eng. 1974;8(1):198-203.
[30] GötschelS, WeiserM, SchielaA. Solving optimal control problems with the kaskade 7 finite element toolbox. In: DednerA (ed.), FlemischB (ed.), KlöfkornR (ed.), editors. Advances in DUNE. New York, NY: Springer; 2012;101-12.
[31] GötschelS, SchielaA, WeiserM. Kaskade 7 - a flexible finite element toolbox. Comp Math Appl. 2021;81:444-458. · Zbl 1524.65006
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.