-
Challenges and opportunities integrating LLAMA into AdePT
Authors:
Bernhard Manfred Gruber,
Guilherme Amadio,
Stephan Hageböck
Abstract:
Particle transport simulations are a cornerstone of high-energy physics (HEP), constituting a substantial part of the computing workload performed in HEP. To boost the simulation throughput and energy efficiency, GPUs as accelerators have been explored in recent years, further driven by the increasing use of GPUs in HPC systems. The Accelerated demonstrator of electromagnetic Particle Transport (AdePT) is an advanced prototype for offloading the simulation of electromagnetic showers in Geant4 to GPUs, and still undergoes continuous development and optimization. Improving memory layout and data access is vital to use modern, massively parallel GPU hardware efficiently, contributing to the challenge of migrating traditional CPU-based data structures to GPUs in AdePT. The low-level abstraction of memory access (LLAMA) is a C++ library that provides a zero-runtime-overhead data structure abstraction layer, focusing on multidimensional arrays of nested, structured data. It provides a framework for defining and switching custom memory mappings at compile time to define data layouts and instrument data access, making LLAMA an ideal tool to tackle the memory-related optimization challenges in AdePT. Our contribution shares insights gained with LLAMA when instrumenting data access inside AdePT, complementing traditional GPU profiler outputs. We demonstrate traces of read/write counts to data structure elements as well as memory heatmaps. The acquired knowledge allowed for subsequent data layout optimizations.
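To illustrate what instrumenting data access means in practice, here is a minimal C++ sketch of a struct-of-arrays container that counts reads and writes per field, similar in spirit to the access traces described above. It is not LLAMA's API: the field names (`Energy`, `PosX`, `PosY`), the class `CountingSoA`, and the function `step` are hypothetical stand-ins. In LLAMA, such counting would come from an instrumenting mapping chosen at compile time rather than from hand-written accessors.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fields of a simplified track (not AdePT's real data layout).
enum Field : std::size_t { Energy = 0, PosX = 1, PosY = 2 };
constexpr std::size_t kNumFields = 3;

// Struct-of-arrays storage that counts every read and write per field.
class CountingSoA {
public:
    explicit CountingSoA(std::size_t n) : n_(n) {
        for (auto& column : columns_) column.assign(n, 1.0);
    }

    double read(Field f, std::size_t i) {
        ++reads_[f];
        return columns_[f][i];
    }

    void write(Field f, std::size_t i, double value) {
        ++writes_[f];
        columns_[f][i] = value;
    }

    std::size_t reads(Field f) const { return reads_[f]; }
    std::size_t writes(Field f) const { return writes_[f]; }
    std::size_t size() const { return n_; }

private:
    std::size_t n_;
    std::array<std::vector<double>, kNumFields> columns_;
    std::array<std::size_t, kNumFields> reads_{};   // zero-initialized counters
    std::array<std::size_t, kNumFields> writes_{};
};

// Hypothetical transport "step": reads energy and x, writes x, never touches y.
void step(CountingSoA& tracks) {
    for (std::size_t i = 0; i < tracks.size(); ++i) {
        double e = tracks.read(Energy, i);
        double x = tracks.read(PosX, i);
        tracks.write(PosX, i, x + 0.1 * e);
    }
}
```

After one `step` over N tracks, `Energy` shows N reads, `PosX` shows N reads and N writes, and `PosY` shows no accesses at all. A field that is never touched in a hot kernel is the kind of signal that can motivate splitting cold data away from hot data.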
Submitted 16 February, 2023;
originally announced February 2023.
-
Offloading electromagnetic shower transport to GPUs
Authors:
G. Amadio,
J. Apostolakis,
P. Buncic,
G. Cosmo,
D. Dosaru,
A. Gheata,
S. Hageboeck,
J. Hahnfeld,
M. Hodgkinson,
B. Morgan,
M. Novak,
A. A. Petre,
W. Pokorski,
A. Ribon,
G. A. Stewart,
P. M. Vila
Abstract:
Making general particle transport simulation for high-energy physics (HEP) single-instruction-multiple-thread (SIMT) friendly, to take advantage of accelerator hardware, is an important avenue for boosting the throughput of simulation applications. To date, this challenge has not been resolved, due to difficulties in mapping the complexity of Geant4 components and workflow onto the massive parallelism exposed by graphics processing units (GPUs). The AdePT project is one of the R&D initiatives tackling this limitation and exploring GPUs as potential accelerators for offloading part of the CPU simulation workload. Our main target is to implement a complete electromagnetic shower demonstrator working on the GPU. The project is the first to create a full prototype of a realistic electron, positron, and gamma electromagnetic shower simulation on GPU, implemented either as a standalone application or as an extension of the standard Geant4 CPU workflow. Our prototype currently provides a platform to explore many optimisations and different approaches. We present the most recent results and initial conclusions of our work, using both a standalone GPU performance analysis and a first implementation of a hybrid workflow based on Geant4 on the CPU and AdePT on the GPU.
Submitted 30 September, 2022;
originally announced September 2022.
-
LLAMA: The Low-Level Abstraction For Memory Access
Authors:
Bernhard Manfred Gruber,
Guilherme Amadio,
Jakob Blomer,
Alexander Matthes,
René Widera,
Michael Bussmann
Abstract:
The performance gap between CPU and memory widens continuously. Choosing the best memory layout for each hardware architecture is increasingly important as more and more programs become memory bound. For portable codes that run across heterogeneous hardware architectures, the choice of the memory layout for data structures is ideally decoupled from the rest of a program. This can be accomplished via a zero-runtime-overhead abstraction layer, underneath which memory layouts can be freely exchanged.
We present the Low-Level Abstraction of Memory Access (LLAMA), a C++ library that provides such a data structure abstraction layer with example implementations for multidimensional arrays of nested, structured data. LLAMA provides fully C++ compliant methods for defining and switching custom memory layouts for user-defined data types. The library is extensible with third-party allocators.
Using two realistic examples, we show that the LLAMA-generated AoS (Array of Structs) and SoA (Struct of Arrays) layouts produce identical code with the same performance characteristics as manually written data structures. Integrations into the SPEC CPU® lbm benchmark and the particle-in-cell simulation PIConGPU demonstrate LLAMA's abilities in real-world applications. LLAMA's layout-aware copy routines can significantly speed up the transfer and reshuffling of data between layouts compared with naive element-wise copying.
LLAMA provides a novel tool for the development of high-performance C++ applications in a heterogeneous environment.
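The idea of decoupling the memory layout from the algorithm can be sketched in plain C++ by writing the algorithm once against an accessor interface and selecting the layout through a template parameter at compile time. The `Particle`, `AoS`, and `SoA` types and the function `total_energy_of_charged` below are hypothetical and hand-written; LLAMA itself generates such layouts from a record description and mapping types, through a different API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical particle record with two fields.
struct Particle {
    double energy;
    float charge;
};

// Array-of-structs layout: whole records stored contiguously.
class AoS {
public:
    explicit AoS(std::size_t n) : data_(n) {}
    double& energy(std::size_t i) { return data_[i].energy; }
    float& charge(std::size_t i) { return data_[i].charge; }
    std::size_t size() const { return data_.size(); }

private:
    std::vector<Particle> data_;
};

// Struct-of-arrays layout: one contiguous array per field.
class SoA {
public:
    explicit SoA(std::size_t n) : energy_(n), charge_(n) {}
    double& energy(std::size_t i) { return energy_[i]; }
    float& charge(std::size_t i) { return charge_[i]; }
    std::size_t size() const { return energy_.size(); }

private:
    std::vector<double> energy_;
    std::vector<float> charge_;
};

// The algorithm is written once; the layout is chosen at compile time
// via the template parameter, so switching layouts needs no code changes here.
template <typename Layout>
double total_energy_of_charged(Layout& particles) {
    double sum = 0.0;
    for (std::size_t i = 0; i < particles.size(); ++i)
        if (particles.charge(i) != 0.0f) sum += particles.energy(i);
    return sum;
}
```

Both instantiations compute the same result; only the memory access pattern differs, which is what makes it possible to benchmark layouts against each other without touching the algorithm.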
Submitted 9 March, 2022; v1 submitted 8 June, 2021;
originally announced June 2021.
-
GeantV: Results from the prototype of concurrent vector particle transport simulation in HEP
Authors:
G. Amadio,
A. Ananya,
J. Apostolakis,
M. Bandieramonte,
S. Banerjee,
A. Bhattacharyya,
C. Bianchini,
G. Bitzes,
P. Canal,
F. Carminati,
O. Chaparro-Amaro,
G. Cosmo,
J. C. De Fine Licht,
V. Drogan,
L. Duhem,
D. Elvira,
J. Fuentes,
A. Gheata,
M. Gheata,
M. Gravey,
I. Goulas,
F. Hariri,
S. Y. Jun,
D. Konstantinov,
H. Kumawat
, et al. (17 additional authors not shown)
Abstract:
Full detector simulation was among the largest CPU consumers in all CERN experiment software stacks for the first two runs of the Large Hadron Collider (LHC). In the early 2010s, the projections were that simulation demands would scale linearly with the luminosity increase, compensated only partially by an increase in computing resources. Extending fast simulation approaches to more use cases, covering a larger fraction of the simulation budget, is only part of the solution due to intrinsic precision limitations. The remainder corresponds to speeding up the simulation software by several factors, which is out of reach using simple optimizations on the current code base. In this context, the GeantV R&D project was launched, aiming to redesign the legacy particle transport codes so that they benefit from fine-grained parallelism features such as vectorization, as well as from increased code and data locality. This paper presents in detail the results and achievements of this R&D, as well as the conclusions and lessons learnt from the beta prototype.
Submitted 16 September, 2020; v1 submitted 2 May, 2020;
originally announced May 2020.
-
Software Challenges For HL-LHC Data Analysis
Authors:
ROOT Team,
Kim Albertsson Brann,
Guilherme Amadio,
Sitong An,
Bertrand Bellenot,
Jakob Blomer,
Philippe Canal,
Olivier Couet,
Massimiliano Galli,
Enrico Guiraud,
Stephan Hageboeck,
Sergey Linev,
Pere Mato Vila,
Lorenzo Moneta,
Axel Naumann,
Alja Mrak Tadel,
Vincenzo Eduardo Padulano,
Fons Rademakers,
Oksana Shadura,
Matevz Tadel,
Enric Tejedor Saavedra,
Xavier Valls Pla,
Vassil Vassilev,
Stefan Wunsch
Abstract:
The high energy physics community is discussing where investment is needed to prepare software for the HL-LHC and its unprecedented challenges. The ROOT project has been one of the central software players in high energy physics for decades. From its experience and expectations, the ROOT team has distilled a comprehensive set of areas that should see research and development in the context of data analysis software, in order to make the best use of the HL-LHC's physics potential. This work shows what these areas could be, why the ROOT team believes investing in them is needed, which gains are expected, and where related work is ongoing. It can serve as an indication for future research proposals and collaborations.
Submitted 4 May, 2020; v1 submitted 16 April, 2020;
originally announced April 2020.
-
Increasing Parallelism in the ROOT I/O Subsystem
Authors:
Guilherme Amadio,
Brian Bockelman,
Philippe Canal,
Danilo Piparo,
Enric Tejedor,
Zhe Zhang
Abstract:
When processing large amounts of data, the rate at which reading and writing can take place is a critical factor. High energy physics data processing relying on ROOT is no exception. The recent parallelisation of LHC experiments' software frameworks and the analysis of the ever-increasing amount of collision data collected by the experiments have further emphasized this issue, underlining the need to increase the implicit parallelism expressed within ROOT I/O. In this contribution we highlight the improvements to the ROOT I/O subsystem that targeted satisfactory scaling behaviour in a multithreaded context. We discuss the effect of parallelism on the individual steps that ROOT chains together to read and write data, namely (de)compression, (de)serialisation, and access to the storage backend. Performance measurements are presented through real-life examples from CMS production workflows on traditional server platforms and highly parallel architectures such as Intel Xeon Phi.
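A generic way to expose parallelism in the (de)compression step is to treat independent data buffers as separate tasks. The sketch below assumes the buffers can be compressed independently of one another and uses `std::async` with a toy run-length encoder, `rle_compress`, as a hypothetical stand-in for a real compression library. It is not ROOT's implementation; it only shows the task-per-buffer pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <future>
#include <vector>

using Buffer = std::vector<std::uint8_t>;

// Toy "compressor": run-length encoding as (count, byte) pairs, runs capped at 255.
// A real I/O layer would call an actual compression library here.
Buffer rle_compress(const Buffer& in) {
    Buffer out;
    for (std::size_t i = 0; i < in.size();) {
        std::size_t run = 1;
        while (i + run < in.size() && in[i + run] == in[i] && run < 255) ++run;
        out.push_back(static_cast<std::uint8_t>(run));
        out.push_back(in[i]);
        i += run;
    }
    return out;
}

// Compress independent buffers concurrently: one asynchronous task per buffer,
// results collected in the original order.
std::vector<Buffer> compress_all(const std::vector<Buffer>& buffers) {
    std::vector<std::future<Buffer>> tasks;
    tasks.reserve(buffers.size());
    for (const Buffer& b : buffers)
        tasks.push_back(std::async(std::launch::async, [&b] { return rle_compress(b); }));

    std::vector<Buffer> results;
    results.reserve(tasks.size());
    for (auto& t : tasks) results.push_back(t.get());  // waits for each task
    return results;
}
```

Because each task owns its output and only reads its input, no locking is needed; the cost of this simple design is one thread per buffer, which a production system would replace with a bounded task pool.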
Submitted 9 April, 2018;
originally announced April 2018.
-
A Roadmap for HEP Software and Computing R&D for the 2020s
Authors:
Johannes Albrecht,
Antonio Augusto Alves Jr,
Guilherme Amadio,
Giuseppe Andronico,
Nguyen Anh-Ky,
Laurent Aphecetche,
John Apostolakis,
Makoto Asai,
Luca Atzori,
Marian Babik,
Giuseppe Bagliesi,
Marilena Bandieramonte,
Sunanda Banerjee,
Martin Barisits,
Lothar A. T. Bauerdick,
Stefano Belforte,
Douglas Benjamin,
Catrin Bernius,
Wahid Bhimji,
Riccardo Maria Bianchi,
Ian Bird,
Catherine Biscarat,
Jakob Blomer,
Kenneth Bloom,
Tommaso Boccali
, et al. (285 additional authors not shown)
Abstract:
Particle physics has an ambitious and broad experimental programme for the coming decades. This programme requires large investments in detector hardware, either to build new facilities and experiments, or to upgrade existing ones. Similarly, it requires commensurate investment in the R&D of software to acquire, manage, process, and analyse the sheer amounts of data to be recorded. In planning for the HL-LHC in particular, it is critical that all of the collaborating stakeholders agree on the software goals and priorities, and that the efforts complement each other. In this spirit, this white paper describes the R&D activities required to prepare for this software upgrade.
Submitted 19 December, 2018; v1 submitted 18 December, 2017;
originally announced December 2017.
-
Portage: Bringing Hackers' Wisdom to Science
Authors:
Guilherme Amadio,
Benda Xu
Abstract:
Providing users of HPC systems with a wide variety of up-to-date software packages is a challenging task. Large software stacks built from source are difficult to manage and require powerful package management tools. The Portage package manager from Gentoo is a highly flexible tool that offers a mature solution to this otherwise daunting task. The Gentoo Prefix project develops and maintains a way of installing Gentoo systems in non-standard locations, bringing the virtues of Gentoo to other operating systems. Here we demonstrate how a Gentoo Prefix installation can be used to cross-compile software packages for the Intel Xeon Phi coprocessor known as Knights Corner, as well as to manage large software stacks in HPC environments.
Submitted 9 October, 2016;
originally announced October 2016.
-
Modeling of random bimodal structures of composites (application to solid propellants): I. Simulation of random packs
Authors:
V. A. Buryachenko,
T. L. Jackson,
G. Amadio
Abstract:
We consider a composite medium consisting of a homogeneous matrix containing a statistically homogeneous set of multimodal spherical inclusions. This model is used to represent the morphology of heterogeneous solid propellants (HSP), which are widely used in the rocket industry. The Lubachevsky-Stillinger algorithm is used to generate morphological models of HSP with large polydisperse packs of spherical inclusions. We modify the algorithm by proposing a random shaking procedure that stabilizes the statistical distribution of the simulated structure, making it homogeneous, highly mixed, and protocol independent (in the sense that the estimated statistical parameters do not depend on the basic simulation algorithm). Increasing the number of shakings has a twofold effect. First, the system becomes more homogeneous and well mixed. Second, the stochastic fluctuations of statistical parameters (such as the radial distribution function, RDF), estimated by averaging over these structures, tend to diminish.
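The radial distribution function mentioned above can be estimated from a set of inclusion centers by histogramming pair distances and normalizing each shell by the count expected for an ideal gas at the same density, $\frac{N(N-1)}{2}\,\frac{V_{\text{shell}}}{V}$. The sketch below assumes a periodic cubic box of side `L` with the minimum-image convention (meaningful for `r_max <= L / 2`) and treats all inclusions as points; the function name `radial_distribution` is hypothetical. It does not implement the Lubachevsky-Stillinger algorithm or the shaking procedure.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Point = std::array<double, 3>;

// g(r) for points in a periodic cube of side L, histogrammed into `bins`
// shells of width r_max / bins. Each unordered pair is counted once.
std::vector<double> radial_distribution(const std::vector<Point>& pts, double L,
                                        double r_max, std::size_t bins) {
    const double pi = std::acos(-1.0);
    const double dr = r_max / bins;
    std::vector<double> hist(bins, 0.0);
    const std::size_t n = pts.size();
    if (n < 2) return hist;  // no pairs, nothing to normalize

    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            double d2 = 0.0;
            for (int k = 0; k < 3; ++k) {
                double d = pts[i][k] - pts[j][k];
                d -= L * std::round(d / L);  // minimum-image convention
                d2 += d * d;
            }
            const double r = std::sqrt(d2);
            if (r < r_max) {
                const auto b = std::min(bins - 1, static_cast<std::size_t>(r / dr));
                hist[b] += 1.0;
            }
        }
    }

    // Normalize by the ideal-gas expectation for each spherical shell.
    const double volume = L * L * L;
    const double pairs = 0.5 * n * (n - 1.0);
    for (std::size_t b = 0; b < bins; ++b) {
        const double r0 = b * dr, r1 = r0 + dr;
        const double shell = 4.0 / 3.0 * pi * (r1 * r1 * r1 - r0 * r0 * r0);
        hist[b] /= pairs * shell / volume;
    }
    return hist;
}
```

For a well-mixed, homogeneous pack, g(r) should approach 1 at large r; averaging g(r) over many generated packs is one way to observe the shrinking fluctuations described in the abstract.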
Submitted 20 September, 2012; v1 submitted 31 July, 2012;
originally announced July 2012.
-
Low-lying non-normal parity states in ⁸B measured by proton elastic scattering on ⁷Be
Authors:
H. Yamaguchi,
Y. Wakabayashi,
G. Amadio,
H. Fujikawa,
T. Teranishi,
A. Saito,
J. J. He,
S. Nishimura,
Y. Togano,
Y. K. Kwon,
M. Niikura,
N. Iwasa,
K. Inafuku,
L. H. Khiem
Abstract:
A new measurement of proton resonance scattering on ⁷Be was performed up to a center-of-mass energy of 6.7 MeV using the low-energy RI beam facility CRIB (CNS Radioactive Ion Beam separator) at the Center for Nuclear Study of the University of Tokyo. The excitation function of ⁷Be+p elastic scattering above 3.5 MeV was measured for the first time, providing important information about the resonance structure of the ⁸B nucleus. The resonances are related to the reaction rate of ⁷Be(p,γ)⁸B, which is the key reaction in solar ⁸B neutrino production. Evidence for the presence of two negative-parity states is presented. One of them is a 2⁻ state observed as a broad s-wave resonance, whose existence had been questionable. Its possible effects on the determination of the astrophysical S-factor of ⁷Be(p,γ)⁸B at solar energies are discussed. The other state had not been observed in previous measurements, and its spin and parity were determined as 1⁻.
Submitted 19 October, 2008;
originally announced October 2008.