-
Parallelize Over Data Particle Advection: Participation, Ping Pong Particles, and Overhead
Authors:
Zhe Wang,
Kenneth Moreland,
Matthew Larsen,
James Kress,
Hank Childs,
David Pugmire
Abstract:
Particle advection is one of the foundational algorithms for visualization and analysis and is central to understanding vector fields common to scientific simulations. Achieving efficient performance with large data in a distributed memory setting is notoriously difficult. Because of its simplicity and minimized movement of large vector field data, the Parallelize over Data (POD) algorithm has become a de facto standard. Despite its simplicity and ubiquitous usage, the scaling issues with the POD algorithm are known and have been described throughout the literature. In this paper, we describe a set of in-depth analyses of the POD algorithm that shed new light on the underlying causes for the poor performance of this algorithm. We designed a series of representative workloads to study the performance of the POD algorithm and executed them on a supercomputer while collecting timing and statistical data for analysis. We then performed two different types of analysis. In the first analysis, we introduce two novel metrics for measuring algorithmic efficiency over the course of a workload run. The second analysis was from the perspective of the particles being advected. Using particle-centric analysis, we identify that the overheads associated with particle movement between processes (not the communication itself) have a dramatic impact on the overall execution time. These overheads become particularly costly when flow features span multiple blocks, resulting in repeated particle circulation between blocks.
Submitted 12 October, 2024;
originally announced October 2024.
-
An Entropy-Based Test and Development Framework for Uncertainty Modeling in Level-Set Visualizations
Authors:
Robert Sisneros,
Tushar M. Athawale,
David Pugmire,
Kenneth Moreland
Abstract:
We present a simple comparative framework for testing and developing uncertainty modeling in uncertain marching cubes implementations. The selection of a model to represent the probability distribution of uncertain values directly influences the memory use, run time, and accuracy of an uncertainty visualization algorithm. We use an entropy calculation directly on ensemble data to establish an expected result and then compare the entropy from various probability models, including uniform, Gaussian, histogram, and quantile models. Our results verify that models matching the distribution of the ensemble indeed match the entropy. We further show that fewer bins in nonparametric histogram models are more effective whereas large numbers of bins in quantile models approach data accuracy.
Submitted 12 September, 2024;
originally announced September 2024.
-
Uncertainty Visualization of Critical Points of 2D Scalar Fields for Parametric and Nonparametric Probabilistic Models
Authors:
Tushar M. Athawale,
Zhe Wang,
David Pugmire,
Kenneth Moreland,
Qian Gong,
Scott Klasky,
Chris R. Johnson,
Paul Rosen
Abstract:
This paper presents a novel end-to-end framework for closed-form computation and visualization of critical point uncertainty in 2D uncertain scalar fields. Critical points are fundamental topological descriptors used in the visualization and analysis of scalar fields. The uncertainty inherent in data (e.g., observational and experimental data, approximations in simulations, and compression), however, creates uncertainty regarding critical point positions. Uncertainty in critical point positions, therefore, cannot be ignored, given their impact on downstream data analysis tasks. In this work, we study uncertainty in critical points as a function of uncertainty in data modeled with probability distributions. Although Monte Carlo (MC) sampling techniques have been used in prior studies to quantify critical point uncertainty, they are often expensive and are infrequently used in production-quality visualization software. We, therefore, propose a new end-to-end framework to address these challenges that comprises a threefold contribution. First, we derive the critical point uncertainty in closed form, which is more accurate and efficient than the conventional MC sampling methods. Specifically, we provide the closed-form and semianalytical (a mix of closed-form and MC methods) solutions for parametric (e.g., uniform, Epanechnikov) and nonparametric models (e.g., histograms) with finite support. Second, we accelerate critical point probability computations using a parallel implementation with the VTK-m library, which is platform portable. Finally, we demonstrate the integration of our implementation with the ParaView software system to demonstrate near-real-time results for real datasets.
Submitted 25 July, 2024;
originally announced July 2024.
-
MEMPSEP III. A machine learning-oriented multivariate data set for forecasting the Occurrence and Properties of Solar Energetic Particle Events using a Multivariate Ensemble Approach
Authors:
Kimberly Moreland,
Maher Dayeh,
Hazel M. Bain,
Subhamoy Chatterjee,
Andres Munoz-Jaramillo,
Samuel Hart
Abstract:
We introduce a new multivariate data set that utilizes multiple spacecraft collecting in-situ and remote sensing heliospheric measurements shown to be linked to physical processes responsible for generating solar energetic particles (SEPs). Using the Geostationary Operational Environmental Satellites (GOES) flare event list from Solar Cycle (SC) 23 and part of SC 24 (1998-2013), we identify 252 solar events (flares) that produce SEPs and 17,542 events that do not. For each identified event, we acquire the local plasma properties at 1 au, such as energetic proton and electron data, upstream solar wind conditions, and the interplanetary magnetic field vector quantities using various instruments onboard GOES and the Advanced Composition Explorer (ACE) spacecraft. We also collect remote sensing data from instruments onboard the Solar Dynamics Observatory (SDO), Solar and Heliospheric Observatory (SoHO), and the Wind solar radio instrument WAVES. The data set is designed to allow for variations of the inputs and feature sets for machine learning (ML) in heliophysics and has a specific purpose for forecasting the occurrence of SEP events and their subsequent properties. This paper describes a data set created from multiple publicly available observation sources that is validated, cleaned, and carefully curated for our machine-learning pipeline. The data set has been used to drive the newly developed Multivariate Ensemble of Models for Probabilistic Forecast of Solar Energetic Particles (MEMPSEP; see MEMPSEP I (Chatterjee et al., 2023) and MEMPSEP II (Dayeh et al., 2023) for associated papers).
Submitted 26 October, 2023; v1 submitted 23 October, 2023;
originally announced October 2023.
-
MEMPSEP I : Forecasting the Probability of Solar Energetic Particle Event Occurrence using a Multivariate Ensemble of Convolutional Neural Networks
Authors:
Subhamoy Chatterjee,
Maher Dayeh,
Andrés Muñoz-Jaramillo,
Hazel M. Bain,
Kimberly Moreland,
Samuel Hart
Abstract:
The Sun continuously affects the interplanetary environment through a host of interconnected and dynamic physical processes. Solar flares, Coronal Mass Ejections (CMEs), and Solar Energetic Particles (SEPs) are among the key drivers of space weather in the near-Earth environment and beyond. While some CMEs and flares are associated with intense SEPs, some show little to no SEP association. To date, robust long-term (hours-days) forecasting of SEP occurrence and associated properties (e.g., onset, peak intensities) does not effectively exist and the search for such development continues. Through Operations-2-Research support, we developed a self-contained model that utilizes a comprehensive dataset and provides a probabilistic forecast for SEP event occurrence and its properties. The model is named Multivariate Ensemble of Models for Probabilistic Forecast of Solar Energetic Particles (MEMPSEP). The MEMPSEP workhorse is an ensemble of Convolutional Neural Networks that ingests a comprehensive dataset (MEMPSEP III - (Moreland et al., 2023)) of full-disc magnetogram-sequences and in-situ data from different sources to forecast the occurrence (MEMPSEP I - this work) and properties (MEMPSEP II - Dayeh et al. (2023)) of an SEP event. This work focuses on estimating true SEP occurrence probabilities, achieving a 2.5% improvement in reliability and a Brier score of 0.14. The outcome provides flexibility for the end-users to determine their own acceptable level of risk, rather than imposing a detection threshold that optimizes an arbitrary binary classification metric. Furthermore, the model-ensemble, trained to utilize the large class-imbalance between events and non-events, provides a clear measure of uncertainty in our forecast.
Submitted 25 September, 2023;
originally announced September 2023.
-
MEMPSEP II. -- Forecasting the Properties of Solar Energetic Particle Events using a Multivariate Ensemble Approach
Authors:
Maher A. Dayeh,
Subhamoy Chatterjee,
Andres Munoz-Jaramillo,
Kimberly Moreland,
Hazel M. Bain,
Samuel Hart
Abstract:
Solar Energetic Particles (SEPs) form a critical component of Space Weather. The complex, intertwined dynamics of SEP sources, acceleration, and transport make their forecasting very challenging. Yet, information about SEP arrival and their properties (e.g., peak flux) is crucial for space exploration on many fronts. We have recently introduced a novel probabilistic ensemble model called the Multivariate Ensemble of Models for Probabilistic Forecast of Solar Energetic Particles (MEMPSEP). Its primary aim is to forecast the occurrence and physical properties of SEPs. The occurrence forecasting, thoroughly discussed in a preceding paper (Chatterjee et al., 2023), is complemented by the work presented here, which focuses on forecasting the physical properties of SEPs. The MEMPSEP model relies on an ensemble of Convolutional Neural Networks, which leverage a multi-variate dataset comprising full-disc magnetogram sequences and numerous derived and in-situ data from various sources. Skill scores demonstrate that MEMPSEP exhibits improved predictions on SEP properties for the test set data with SEP occurrence probability above 50%, compared to those with a probability below 50%. Results present a promising approach to address the challenging task of forecasting SEP physical properties, thus improving our forecasting capabilities and advancing our understanding of the dominant parameters and processes that govern SEP production.
Submitted 25 September, 2023;
originally announced September 2023.
-
Homogenising SoHO/EIT and SDO/AIA 171 Å Images: A Deep Learning Approach
Authors:
Subhamoy Chatterjee,
Andrés Muñoz-Jaramillo,
Maher Dayeh,
Hazel M. Bain,
Kimberly Moreland
Abstract:
Extreme Ultraviolet images of the Sun are becoming an integral part of space weather prediction tasks. However, having different surveys requires the development of instrument-specific prediction algorithms. As an alternative, it is possible to combine multiple surveys to create a homogeneous dataset. In this study, we utilize the temporal overlap of SoHO/EIT and SDO/AIA 171 Å surveys to train an ensemble of deep learning models for creating a single homogeneous survey of EUV images for 2 solar cycles. Prior applications of deep learning have focused on validating the homogeneity of the output while overlooking the systematic estimation of uncertainty. We use an approach called 'Approximate Bayesian Ensembling' to generate an ensemble of models whose uncertainty mimics that of a fully Bayesian neural network at a fraction of the cost. We find that ensemble uncertainty goes down as the training set size increases. Additionally, we show that the model ensemble adds immense value to the prediction by showing higher uncertainty in test data that are not well represented in the training data.
Submitted 20 August, 2023;
originally announced August 2023.
-
Variability of interplanetary shock and associated energetic particle properties as a function of the time window around the shock
Authors:
Kimberly Moreland,
Maher Dayeh,
Gang Li,
Ashraf Farahat,
Rob Ebert,
Mihir Desai
Abstract:
We study the effect of sampling windows on derived shock and associated energetic storm particle (ESP) properties in 296 fast-forward interplanetary shocks using ACE measurements at 1 au between 02/1998 - 08/2013. We vary the time windows from 2-mins to 20-mins for the shock properties and from 2-mins to 540-mins for ESP properties. Variability is quantified by the median absolute deviation (MAD) statistic. We find that the magnetic, density, and temperature compression ratios vary from their median values by 17.03%, 20.05%, and 25.91%, respectively; shock speed by 16.26%, speed jump by 45.46%, Alfvenic Mach number by 31.53%, and shock obliquity by 24.25%. Spectral indices in the 2-min to 540-min windows downstream of the shock vary from the median value of 1.79 by 26.05%, and by 30.53% from the 1.70 median value upstream of the shock. The similarity of ESP spectral indices upstream and downstream of the shock suggests that these ESP populations are likely locally accelerated at the shock. Furthermore, we find that for a moving sampling window around the shock, values for the density ratio hold for ~10-mins; the magnetic ratio and shock speed jump hold for ~30-mins and ~60-mins, respectively. When the upstream window is fixed to 2-mins and only the downstream window is moved, the density ratio holds for ~60-mins downstream, the magnetic ratio for ~30-mins, and the shock speed jump for ~110-mins. Beyond these time windows, derived shock properties are no longer representative of shock properties. These results provide constraints for modeling and forecasting efforts of shock and ESP-associated properties.
Submitted 26 October, 2023; v1 submitted 3 August, 2023;
originally announced August 2023.
-
Scalable In Situ Lagrangian Flow Map Extraction: Demonstrating the Viability of a Communication-Free Model
Authors:
Sudhanshu Sane,
Abhishek Yenpure,
Roxana Bujack,
Matthew Larsen,
Kenneth Moreland,
Christoph Garth,
Hank Childs
Abstract:
We introduce and evaluate a new algorithm for the in situ extraction of Lagrangian flow maps, which we call Boundary Termination Optimization (BTO). Our approach is a communication-free model, requiring no message passing or synchronization between processes, improving scalability, thereby reducing overall execution time and alleviating the encumbrance placed on simulation codes from in situ processing. We terminate particle integration at node boundaries and store only a subset of the flow map that would have been extracted by communicating particles across nodes, thus introducing an accuracy-performance tradeoff. We run experiments with as many as 2048 GPUs and with multiple simulation data sets. For the experiment configurations we consider, our findings demonstrate that our communication-free technique saves as much as 2x to 4x in execution time in situ, while staying nearly as accurate quantitatively and qualitatively as previous work. Most significantly, this study establishes the viability of approaching in situ Lagrangian flow map extraction using communication-free models in the future.
Submitted 4 April, 2020;
originally announced April 2020.