High-performance implementation of discontinuous Galerkin methods with application in fluid flow. (English) Zbl 1478.76053

Kronbichler, Martin (ed.) et al., Efficient high-order discretizations for computational fluid dynamics. Selected papers based on the presentations at the summer school, Udine, Italy, July 16–20, 2018. Cham: Springer. CISM Courses Lect. 602, 57-115 (2021).

Summary: In this book chapter, the high-performance implementation of discontinuous Galerkin methods is reviewed, with the main focus on sum factorization algorithms. The main computational properties of the algorithms are compared to capabilities of modern computer hardware, highlighting the opportunities and limitations of discontinuous Galerkin discretizations. The chapter closes with a presentation of how to apply these algorithms to the compressible Euler equations, the acoustic wave equation, and the incompressible Navier-Stokes equations.
For the entire collection see [Zbl 1468.76003].

Cited in 1 Document

MSC:

76M10	Finite element methods applied to problems in fluid mechanics
76M20	Finite difference methods applied to problems in fluid mechanics
76N15	Gas dynamics (general theory)
76D05	Navier-Stokes equations for incompressible viscous fluids

Keywords:

sum factorization algorithm; explicit Runge-Kutta time integration; parallel computation; compressible Euler equations; acoustic wave; incompressible Navier-Stokes equations

Software:

Nektar++; ExWave; ExaDG; Peano; deal.II; pTatin3D; NGSolve; LIKWID; PARDISO; UMFPACK; p4est; FEniCS; SPECFEM3D; DUNE; Firedrake; Nek5000; FLEXI
