A performance portable implementation of the semi-Lagrangian algorithm in six dimensions. (English) Zbl 07788808

Summary: This paper describes our approach to developing simulation software for the fully kinetic 6-D Vlasov equation, which will be used to explore physics beyond the reduced gyrokinetic model. Because of the high dimensionality of the problem, simulating the fully kinetic Vlasov equation requires efficient use of compute and storage capabilities. In addition, the implementation needs to be extensible with respect to the physical model and flexible with respect to the hardware used for production runs. We start with the algorithmic background for simulating the 6-D Vlasov equation with a semi-Lagrangian algorithm. We then present the performance portable software stack, which enables production runs on pure CPU nodes as well as on nodes accelerated by AMD or Nvidia GPUs. The extensibility of our implementation is ensured by the described software architecture of the main kernel, which achieves a memory bandwidth of almost 500 GB/s on an Nvidia V100 GPU and around 100 GB/s on an Intel Xeon Gold CPU using a single code base. We provide performance data on multiple node-level architectures and discuss both the hardware capabilities that are utilized and those that remain available. Finally, the network communication bottleneck of 6-D grid-based algorithms is quantified. A verification of physics beyond gyrokinetic theory, using ion Bernstein waves as an example, concludes the work.

MSC:

68-XX Computer science
65-XX Numerical analysis
