-
Navigating Governance Paradigms: A Cross-Regional Comparative Study of Generative AI Governance Processes & Principles
Authors:
Jose Luna,
Ivan Tan,
Xiaofei Xie,
Lingxiao Jiang
Abstract:
As Generative Artificial Intelligence (GenAI) technologies evolve at an unprecedented rate, global governance approaches struggle to keep pace with the technology, highlighting a critical gap in adapting governance to significant challenges. Depicting the nuances of nascent and diverse governance approaches based on risks, rules, outcomes, principles, or a mix across different regions around the globe is fundamental to discerning discrepancies and convergences and to shedding light on specific limitations that need to be addressed, thereby facilitating the safe and trustworthy adoption of GenAI. In response to this need and to the evolving nature of GenAI, this paper seeks to provide a collective view of different governance approaches around the world. Our research introduces a Harmonized GenAI Framework, "H-GenAIGF," based on the current governance approaches of six regions: European Union (EU), United States (US), China (CN), Canada (CA), United Kingdom (UK), and Singapore (SG). We have identified four constituents, fifteen processes, twenty-five sub-processes, and nine principles that aid the governance of GenAI, thus providing a comprehensive perspective on the current state of GenAI governance. In addition, we present a comparative analysis to facilitate the identification of common ground and distinctions based on each region's coverage of the processes. The results show that risk-based approaches allow for better coverage of the processes, followed by mixed approaches. Other approaches lag behind, covering less than 50% of the processes. Most prominently, the analysis demonstrates that only one process aligns across the approaches of all regions, highlighting the lack of consistent and executable provisions. Moreover, our case study on ChatGPT reveals deficient process coverage, showing that harmonization of approaches is necessary to find alignment for GenAI governance.
Submitted 14 August, 2024;
originally announced August 2024.
-
Hardware Acceleration for Knowledge Graph Processing: Challenges & Recent Developments
Authors:
Maciej Besta,
Robert Gerstenberger,
Patrick Iff,
Pournima Sonawane,
Juan Gómez Luna,
Raghavendra Kanakagiri,
Rui Min,
Onur Mutlu,
Torsten Hoefler,
Raja Appuswamy,
Aidan O Mahony
Abstract:
Knowledge graphs (KGs) have attracted significant attention in recent years, particularly in the area of the Semantic Web, and are gaining popularity in other application domains such as data mining and search engines. Simultaneously, there has been enormous progress in the development of different types of heterogeneous hardware, impacting the way KGs are processed. The aim of this paper is to provide a systematic literature review of knowledge graph hardware acceleration. For this, we present a classification of the primary areas in knowledge graph technology that harness different hardware units for accelerating certain knowledge graph functionalities. We then extensively describe the respective works, focusing on how KG-related schemes harness modern hardware accelerators. Based on our review, we identify various research gaps and future exploratory directions that are anticipated to be of significant value to both academics and industry practitioners.
Submitted 22 August, 2024;
originally announced August 2024.
-
V407 Lup, an intermediate polar nova
Authors:
M. Orio,
M. Melicherčík,
S. Ciroi,
V. Canton,
E. Aydi,
D. A. H. Buckley,
A. Dobrotka,
G. J. M. Luna,
J. Ness
Abstract:
We present X-ray and optical observations of nova V407 Lup (Nova Lup 2016), previously well monitored in outburst, as it returned to quiescent accretion. The X-ray light curve in 2020 February revealed a clear flux modulation with a stable period of 564.64$\pm$0.64 s, corresponding to the period measured in outburst and attributed to the spin of a magnetized white dwarf in an intermediate polar (IP) system. This detection in quiescence is consistent with the IP classification proposed after the nova eruption. The XMM-Newton EPIC X-ray flux is about 1.3 $\times 10^{-12}$ erg cm$^{-2}$ s$^{-1}$, emitted over the whole 0.2-12 keV range without a significant cut-off energy, at a distance most likely larger than 5 kpc. The X-ray spectra are complex; they can be fitted including a power-law component with a relatively flat slope (a power-law index of about 1), although, alternatively, a hard thermal component at kT$\geq$19 keV also yields a good fit. The SALT optical spectra obtained in 2019 March and 2022 May are quite typical of IPs, with strong emission lines, including some of high ionization potential, like He II at 4685.7 Å. Nebular [O III] lines were prominent in 2019 March, but their intensity and equivalent width appeared to be decreasing during that month, and they were no longer detectable in 2022, indicating that the nova ejecta had dispersed. The complex profiles of the He II lines of V407 Lup are also characteristic of IPs, giving further evidence for this classification.
Submitted 7 August, 2024;
originally announced August 2024.
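As a rough consistency check on the numbers quoted for V407 Lup, the flux and distance lower limit can be turned into a luminosity lower limit with $L = 4\pi d^2 F$. This is a minimal sketch assuming isotropic emission and using the quoted 1.3e-12 erg cm^-2 s^-1 flux as-is; the abstract does not say whether that flux is absorption-corrected, and the helper name `isotropic_luminosity` is hypothetical.

```python
import math

KPC_IN_CM = 3.0857e21  # 1 kiloparsec in centimetres


def isotropic_luminosity(flux_cgs: float, distance_kpc: float) -> float:
    """Return L = 4*pi*d^2*F in erg/s for a flux given in erg/cm^2/s.

    Assumes isotropic emission; no absorption or bolometric correction is applied.
    """
    d_cm = distance_kpc * KPC_IN_CM
    return 4.0 * math.pi * d_cm ** 2 * flux_cgs


# 5 kpc is only a lower limit on the distance, so this is a lower limit on L_X.
lum_min = isotropic_luminosity(1.3e-12, 5.0)
print(f"L_X >= {lum_min:.1e} erg/s")  # prints "L_X >= 3.9e+33 erg/s"
```

Because $L \propto d^2$, any distance beyond 5 kpc only raises this estimate.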
-
X-ray Variability in the Symbiotic Binary RT Cru: Principal Component Analysis
Authors:
A. Danehkar,
J. J. Drake,
G. J. M. Luna
Abstract:
Hard X-ray-emitting ($\delta$-type) symbiotic binaries, which exhibit a strong hard X-ray excess, have posed a challenge to our understanding of accretion physics in degenerate dwarfs. RT Cru, a member of the $\delta$-type symbiotics, shows stochastic X-ray variability. Timing analyses of the XMM-Newton and NuSTAR observations considered here indicate hourly fluctuations, in addition to a spectral transition from the 2007 state to a harder state in 2012, seen in Suzaku observations. To trace the nature of the X-ray variability, we analyze the multi-mission X-ray data using principal component analysis (PCA), which determines the spectral components that contribute most to the flickering behavior and the hardness transition. The Chandra HRC-S/LETG and XMM-Newton EPIC-pn data provide the primary PCA components, which may contain some variable emission features, especially in the soft excess. Additionally, the absorbing column (the first-order component, 50%), the source continuum (20%), and a third component (9%), which likely accounts for thermal emission in the soft band, are the three principal components found in the Suzaku XIS1 observations. The PCA components of the NuSTAR data also correspond to the continuum and possibly emission features. Our findings suggest that the spectral hardness transition between the two Suzaku observations is mainly due to changes in the absorbing material and the X-ray continuum, while some changes in the thermal plasma emission may result in flickering-type variations.
Submitted 28 August, 2024; v1 submitted 24 June, 2024;
originally announced June 2024.
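The percentages quoted for RT Cru read like the fraction of variance carried by each principal component. A minimal sketch of that idea, assuming a matrix of time-resolved spectra (rows are time bins, columns are energy bins) and standard SVD-based PCA; it is not the authors' pipeline, and the toy spectra below are synthetic.

```python
import numpy as np


def spectral_pca(spectra: np.ndarray):
    """PCA of an (n_time_bins, n_energy_bins) array of spectra.

    Returns the explained-variance fraction of each component and the
    component spectra (rows of vt), via SVD of the mean-subtracted data.
    """
    centered = spectra - spectra.mean(axis=0)  # remove the mean spectrum
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    variance = s ** 2
    return variance / variance.sum(), vt


# Toy data (hypothetical): one continuum shape whose normalisation varies
# in time, plus weak noise, so a single component should dominate.
rng = np.random.default_rng(0)
energy = np.linspace(0.5, 10.0, 50)  # keV bins
shape = energy ** -1.0
norms = 1.0 + 0.3 * rng.standard_normal(200)
spectra = np.outer(norms, shape) + 0.001 * rng.standard_normal((200, 50))

fractions, components = spectral_pca(spectra)
print(np.round(fractions[:3], 4))
```

In real data, each row of `components` is a spectral shape whose amplitude varies over time, which is what allows a component to be identified with, for example, absorption or continuum changes.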
-
Evolution of the optical emission lines and the X-ray emission during the super-active stage of T CrB
Authors:
K. A. Stoyanov,
G. J. M. Luna,
R. K. Zamanov,
K. Ilkiewicz,
Y. M. Nikolov,
M. Moyseev,
M. Minev,
A. Kurtenkov,
S. Y. Stefanov
Abstract:
T CrB is a symbiotic star that experiences nova outbursts every $\sim$80~yr. The next, long-anticipated nova outburst is expected to occur during the 2024-2026 period. Here, we present results of high-resolution optical spectroscopy of T CrB in the period 2016-2023. In these spectra, we measured the equivalent widths of the H$\alpha$, H$\beta$, He I and He II emission lines. The maximum equivalent width (EW) was recorded in May 2021, when EW(H$\alpha$) reached -44.6 Å and EW(H$\beta$) reached -21.5 Å. At the other extreme, the minimum, EW(H$\alpha$) = -2.9 Å, was recorded in October 2023. After October 2023, the B-band emission brightened, suggesting a re-appearance of the orbital modulation. In addition to the optical data, we study the X-ray behaviour in the same period. We find a strong correlation between EW(H$\alpha$) and X-ray flux, with a correlation coefficient of -0.78 and a significance of 2.6$\times 10^{-5}$.
Submitted 4 June, 2024;
originally announced June 2024.
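The T CrB abstract reports a correlation coefficient and its significance. A minimal sketch, assuming the coefficient is Pearson's r (the abstract does not name the estimator) and using the usual t statistic with n - 2 degrees of freedom; the sample arrays are hypothetical, not the paper's measurements.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def t_statistic(r, n):
    """Student's t for testing r != 0 with n paired samples (n - 2 d.o.f.).

    Undefined for |r| == 1 (division by zero).
    """
    return r * math.sqrt((n - 2) / (1.0 - r ** 2))


# Hypothetical paired values: EW(H-alpha) in Angstrom vs. X-ray flux (arbitrary units).
ew = [-40.0, -30.0, -20.0, -10.0, -5.0]
flux = [1.0, 2.5, 2.0, 4.0, 4.5]
r = pearson_r(ew, flux)
print(round(r, 3), round(t_statistic(r, len(ew)), 2))
```

Turning t into a p-value needs the Student's t distribution; with SciPy available, `scipy.stats.pearsonr(x, y)` returns both r and a two-sided p-value directly.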
-
Symbiotic stars in X-rays IV: XMM-Newton, Swift and TESS observations
Authors:
Isabel J. Lima,
G. Juan M. Luna,
Koji Mukai,
Alexandre S. Oliveira,
Jennifer L. Sokoloski,
Fred Walter,
Natalia Palivanas,
Natalia E. Nuñez,
Rafael R. Souza,
Rosana A. N. Araujo
Abstract:
White dwarf symbiotic binaries are detected in X-rays with luminosities in the range of 10$^{30}$ to 10$^{34}$ erg s$^{-1}$. Their X-ray emission arises either from the accretion disk boundary layer, from a region where the winds from both components collide, or from nuclear burning on the white dwarf surface. In our ongoing effort to identify X-ray emitting symbiotic stars, we studied four systems using observations from the Neil Gehrels Swift Observatory and XMM-Newton satellites in X-rays and from TESS in the optical. The X-ray spectra were fit with absorbed, optically thin thermal plasma models, either single- or multi-temperature, with kT $<$ 8 keV for all targets. Based on the characteristics of their X-ray spectra, we classified BD Cam as possibly $\beta$-type, V1261 Ori and CD -27 8661 as $\delta$-type, and confirmed NQ Gem as $\beta$/$\delta$-type. The $\delta$-type X-ray emission most likely arises in the boundary layer of the accretion disk, while in the case of BD Cam, its mostly soft emission originates from shocks, possibly between the red giant and WD/disk winds. In general, we have found that the observed X-ray emission is powered by accretion at a low rate of about 10$^{-11}$ M$_{\odot}$ yr$^{-1}$. The low ratio of X-ray to optical luminosities, however, indicates that the accretion-disk boundary layer is mostly optically thick and tends to emit in the far or extreme UV. The detection of flickering in the optical data provides evidence of the existence of an accretion disk.
Submitted 2 May, 2024;
originally announced May 2024.
-
Probing valley phenomena with gate-defined valley splitters
Authors:
Juan Daniel Torres Luna,
Kostas Vilkelis,
Antonio L. R. Manesco
Abstract:
Despite many reports of valley-related phenomena in graphene and its multilayers, current transport experiments cannot probe valley phenomena without the application of external fields. Here we propose a gate-defined valley splitter as a direct transport probe for valley phenomena in graphene multilayers. First, we show how the device works, its magnetotransport response, and its robustness against fabrication errors. Second, we present two applications for valley splitters: (i) resonant tunneling of quantum dots probed by a valley splitter shows the valley polarization of dot levels; (ii) a combination of two valley splitters resolves the nature of order parameters in mesoscopic samples.
Submitted 1 May, 2024;
originally announced May 2024.
-
Subgroup Discovery in MOOCs: A Big Data Application for Describing Different Types of Learners
Authors:
J. M. Luna,
H. M. Fardoun,
F. Padillo,
C. Romero,
S. Ventura
Abstract:
The aim of this paper is to categorize and describe different types of learners in massive open online courses (MOOCs) by means of a subgroup discovery approach based on MapReduce. The final objective is to discover IF-THEN rules that appear in different MOOCs. The proposed subgroup discovery approach, which is an extension of the well-known FP-Growth algorithm, uses emerging parallel methodologies like MapReduce to cope with extremely large datasets. As an additional feature, the proposal includes a threshold value that denotes the number of courses each discovered rule should satisfy. A post-processing step is also included so that redundant subgroups can be removed. The experimental stage uses de-identified data from the first year of 16 MITx and HarvardX courses on the edX platform. Experimental results demonstrate that the proposed MapReduce approach outperforms traditional sequential subgroup discovery approaches, achieving a runtime that is almost constant across different courses. Additionally, thanks to the final post-processing step, only interesting and non-redundant rules are discovered, reducing the number of subgroups by one or two orders of magnitude. Finally, the discovered subgroups can easily be used by course instructors not only for descriptive purposes but also for additional tasks such as recommendation or personalization.
Submitted 10 February, 2024;
originally announced March 2024.
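The course-count threshold in the MOOC abstract can be pictured as a MapReduce-style filter: a map step emits each rule found in a course, and a reduce step keeps rules that appear in at least a given number of courses. This is a minimal single-process sketch of that filter only; it does not implement the paper's FP-Growth extension for mining the rules, and the rule attributes and helper names (`map_course`, `reduce_rules`) are hypothetical.

```python
from itertools import chain


def map_course(course_id, rules):
    """Map step: emit (rule, course_id) once per distinct rule found in a course."""
    return [(rule, course_id) for rule in set(rules)]


def reduce_rules(mapped_pairs, min_courses):
    """Reduce step: collect the courses of each rule and keep rules seen in >= min_courses."""
    courses_per_rule = {}
    for rule, course in mapped_pairs:
        courses_per_rule.setdefault(rule, set()).add(course)
    return {rule for rule, courses in courses_per_rule.items()
            if len(courses) >= min_courses}


# Hypothetical per-course rules, written as (IF antecedent, THEN consequent).
rules_by_course = {
    "course_A": [("explored", "certified"), ("forum_posts>0", "certified")],
    "course_B": [("explored", "certified")],
    "course_C": [("explored", "certified"), ("video_views>100", "explored")],
}
pairs = chain.from_iterable(map_course(c, r) for c, r in rules_by_course.items())
print(reduce_rules(pairs, min_courses=2))  # {('explored', 'certified')}
```

In a real MapReduce job, the map calls would run in parallel per course partition and the framework would group the pairs by rule before the reduce step.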
-
PyGim: An Efficient Graph Neural Network Library for Real Processing-In-Memory Architectures
Authors:
Christina Giannoula,
Peiming Yang,
Ivan Fernandez Vega,
Jiacheng Yang,
Sankeerth Durvasula,
Yu Xin Li,
Mohammad Sadrosadati,
Juan Gomez Luna,
Onur Mutlu,
Gennady Pekhimenko
Abstract:
Graph Neural Networks (GNNs) are emerging ML models for analyzing graph-structured data. GNN execution involves both compute-intensive and memory-intensive kernels; the latter dominate the total time and are significantly bottlenecked by data movement between memory and processors. Processing-In-Memory (PIM) systems can alleviate this data movement bottleneck by placing simple processors near or inside memory arrays. In this work, we introduce PyGim, an efficient ML library that accelerates GNNs on real PIM systems. We propose intelligent parallelization techniques for memory-intensive kernels of GNNs tailored for real PIM systems, and develop a handy Python API for them. We provide hybrid GNN execution, in which the compute-intensive and memory-intensive kernels are executed in processor-centric and memory-centric computing systems, respectively. We extensively evaluate PyGim on a real-world PIM system with 1992 PIM cores using emerging GNN models, and demonstrate that it outperforms its state-of-the-art CPU counterpart on Intel Xeon by 3.04x on average and achieves higher resource utilization than CPU and GPU systems. Our work provides useful recommendations for software, system and hardware designers. PyGim is publicly available at https://github.com/CMU-SAFARI/PyGim.
Submitted 16 October, 2024; v1 submitted 26 February, 2024;
originally announced February 2024.
-
Flux-tunable Kitaev chain in a quantum dot array
Authors:
Juan Daniel Torres Luna,
A. Mert Bozkurt,
Michael Wimmer,
Chun-Xiao Liu
Abstract:
Connecting quantum dots through Andreev bound states in a semiconductor-superconductor hybrid provides a platform to create a Kitaev chain. Interestingly, in a double quantum dot, a pair of poor man's Majorana zero modes can emerge when the system is fine-tuned to a sweet spot, where the superconducting and normal couplings are equal in magnitude. Control of the Andreev bound states is crucial for achieving this and is usually implemented by varying their chemical potential. In this work, we propose using Andreev bound states in a short Josephson junction to mediate both types of couplings, with their ratio tunable by the phase difference across the junction. A minimal Kitaev chain can then be easily tuned into the strong coupling regime by varying the phase and junction asymmetry, even without changing the dot-hybrid coupling strength. Furthermore, we identify an optimal sweet spot at $\pi$ phase, which enhances the excitation gap and the robustness against phase fluctuations. Our proposal introduces a new device platform and a new tuning method for realizing quantum-dot-based Kitaev chains.
Submitted 25 July, 2024; v1 submitted 12 February, 2024;
originally announced February 2024.
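The sweet-spot condition in the Kitaev-chain abstract (normal and superconducting couplings equal in magnitude) can be checked on the textbook two-site model $H = -\mu(n_1+n_2) + t(c_1^\dagger c_2 + \mathrm{h.c.}) + \Delta(c_1 c_2 + \mathrm{h.c.})$. This sketch assumes that generic model with real $t$, $\Delta$ and equal dot potentials; it is not the paper's Josephson-junction model, in which the phase difference sets the ratio $t/\Delta$ that is set directly here.

```python
import math


def two_site_kitaev_ground_energies(mu, t, delta):
    """Lowest energies of the odd- and even-parity sectors of a two-site Kitaev chain.

    Odd sector {|10>, |01>}:  [[-mu, t], [t, -mu]]         -> -mu +/- |t|
    Even sector {|00>, |11>}: [[0, delta], [delta, -2 mu]] -> -mu +/- sqrt(mu^2 + delta^2)
    """
    odd = -mu - abs(t)
    even = -mu - math.sqrt(mu ** 2 + delta ** 2)
    return odd, even


# At mu = 0 the two parity ground states are degenerate exactly when |t| == |delta|.
print(two_site_kitaev_ground_energies(0.0, 1.0, 1.0))  # (-1.0, -1.0)
print(two_site_kitaev_ground_energies(0.0, 1.0, 0.5))  # (-1.0, -0.5)
```

Degeneracy of the even and odd ground states is necessary but not sufficient for poor man's Majoranas; in this simple model the modes are also localized on separate dots only at $\mu = 0$.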
-
Soft X-ray and FUV observations of Nova Her 2021 (V1674 Her) with AstroSat
Authors:
Yash Bhargava,
Gulab Chand Dewangan,
G. C. Anupama,
U. S. Kamath,
L. S. Sonith,
Kulinder Pal Singh,
J. J. Drake,
A. Beardmore,
G. J. M. Luna,
M. Orio,
K. L. Page
Abstract:
Nova Her 2021 (V1674 Her) was one of the fastest novae observed so far. We report here the results from our timing and spectral studies of the source, observed at multiple epochs with AstroSat. We report the detection of a periodicity in soft X-rays at a period of 501.4--501.5 s, which was detected with high significance after the peak of the super-soft phase but was not detected in the far ultraviolet (FUV) band of AstroSat. The shape of the phase-folded X-ray light curves varied significantly as the nova evolved. The phase-resolved spectral studies reveal the likely presence of various absorption features in the 0.5--2 keV soft X-ray band, and suggest that the optical depth of these absorption features may be marginally dependent on the pulse phase. Strong emission lines from Si, N and O are detected in the FUV, and their strength declined continuously as the nova evolved and went through a bright X-ray state.
Submitted 15 December, 2023;
originally announced December 2023.
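The V1674 Her abstract refers to phase-folded X-ray light curves at the ~501 s period. A minimal sketch of phase folding and binning, assuming the period is already known (a real analysis would first find it with a periodogram) and using a synthetic sinusoidal light curve; `phase_fold` is a hypothetical helper, not AstroSat software.

```python
import math


def phase_fold(times, fluxes, period, t0=0.0, n_bins=10):
    """Fold a light curve on `period` and return the mean flux in each phase bin.

    Empty bins are returned as None.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for t, f in zip(times, fluxes):
        phase = ((t - t0) / period) % 1.0
        b = min(int(phase * n_bins), n_bins - 1)  # guard against phase == 1.0 from rounding
        sums[b] += f
        counts[b] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Synthetic light curve: a 501.4 s sinusoid sampled every 10 s for 20000 s.
period = 501.4
times = [10.0 * i for i in range(2000)]
fluxes = [1.0 + 0.1 * math.sin(2 * math.pi * t / period) for t in times]
profile = phase_fold(times, fluxes, period)
print([round(p, 3) for p in profile])
```

Comparing profiles folded on data from different epochs is one simple way to see the pulse-shape changes the abstract describes.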
-
PULSAR: Simultaneous Many-Row Activation for Reliable and High-Performance Computing in Off-the-Shelf DRAM Chips
Authors:
Ismail Emir Yuksel,
Yahya Can Tugrul,
F. Nisa Bostanci,
Abdullah Giray Yaglikci,
Ataberk Olgun,
Geraldo F. Oliveira,
Melina Soysal,
Haocong Luo,
Juan Gomez Luna,
Mohammad Sadrosadati,
Onur Mutlu
Abstract:
Data movement between the processor and the main memory is a first-order obstacle against improving performance and energy efficiency in modern systems. To address this obstacle, Processing-using-Memory (PuM) is a promising approach where bulk-bitwise operations are performed leveraging intrinsic analog properties within the DRAM array and massive parallelism across DRAM columns. Unfortunately, 1) modern off-the-shelf DRAM chips do not officially support PuM operations, and 2) existing techniques of performing PuM operations on off-the-shelf DRAM chips suffer from two key limitations. First, these techniques have low success rates, i.e., only a small fraction of DRAM columns can correctly execute PuM operations because they operate beyond manufacturer-recommended timing constraints, causing these operations to be highly susceptible to noise and process variation. Second, these techniques have limited compute primitives, preventing them from fully leveraging parallelism across DRAM columns and thus hindering their performance benefits.
We propose PULSAR, a new technique to enable high-success-rate and high-performance PuM operations in off-the-shelf DRAM chips. PULSAR leverages our new observation that a carefully crafted sequence of DRAM commands simultaneously activates up to 32 DRAM rows. PULSAR overcomes the limitations of existing techniques by 1) replicating the input data to improve the success rate and 2) enabling new bulk bitwise operations (e.g., many-input majority, Multi-RowInit, and Bulk-Write) to improve the performance.
Our analysis on 120 off-the-shelf DDR4 chips from two major manufacturers shows that PULSAR achieves a 24.18% higher success rate and 121% higher performance over seven arithmetic-logic operations compared to FracDRAM, a state-of-the-art off-the-shelf DRAM-based PuM technique.
Submitted 18 March, 2024; v1 submitted 5 December, 2023;
originally announced December 2023.
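PULSAR's many-input majority is a bulk bitwise operation evaluated column by column across simultaneously activated rows. The sketch below models only the logical result, with each row held as a Python int; it does not model DRAM timing, the command sequence, or PULSAR's input replication. The AND/OR identities shown (majority with an all-zeros or all-ones row) are the standard way majority-based PuM builds those operations.

```python
def bitwise_majority(rows, width):
    """Column-wise majority of an odd number of equal-width bit rows.

    Each row is a Python int whose low `width` bits hold one row's column values.
    """
    if len(rows) % 2 == 0:
        raise ValueError("use an odd number of rows so every column has a strict majority")
    threshold = len(rows) // 2 + 1
    result = 0
    for col in range(width):
        ones = sum((row >> col) & 1 for row in rows)
        if ones >= threshold:
            result |= 1 << col
    return result


a, b = 0b1100, 0b1010
all_zeros, all_ones = 0b0000, 0b1111
print(bin(bitwise_majority([a, b, 0b1001], 4)))     # MAJ3 -> 0b1000
print(bin(bitwise_majority([a, b, all_zeros], 4)))  # AND  -> 0b1000
print(bin(bitwise_majority([a, b, all_ones], 4)))   # OR   -> 0b1110
```

In hardware all columns are evaluated at once by charge sharing; the Python loop over columns is only a reference model for checking results.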
-
The contact binary system TYC 7275-1968-1 as seen by optical, UV and X-ray observations
Authors:
Isabel J. Lima,
Ana C. Mattiuci,
G. Juan M. Luna,
Alexandre S. Oliveira,
Claudia V. Rodrigues,
Natalia Palivanas,
Natalia E. Nunez
Abstract:
We present an analysis of publicly available X-ray and optical observations of TYC 7275-1968-1, a contact binary and red nova progenitor candidate. The long optical time series of ASAS-SN, SuperWASP, CRTS, Gaia, ASAS-3, and TESS enabled us to improve its orbital period to 0.3828071 $\pm$ 0.0000026 d. We show that an X-ray and UV source detected by the Neil Gehrels Swift Observatory, previously assumed to be the counterpart of CD -36 8436 (V1044 Cen), a symbiotic star located 22 arcsec from the red nova candidate, is associated with TYC 7275-1968-1. The X-ray data indicate the presence of a region with a temperature of $kT$ = 0.8$^{+0.9}_{-0.1}$ keV and a luminosity of 1.4$^{+0.1}_{-0.2}$ $\times$ 10$^{31}$ erg s$^{-1}$ in the 0.3-10 keV range. The detection of X-rays and modulated UV emission suggests that both components of the binary are chromospherically active.
Submitted 30 October, 2023;
originally announced October 2023.
-
Read Disturbance in High Bandwidth Memory: A Detailed Experimental Study on HBM2 DRAM Chips
Authors:
Ataberk Olgun,
Majd Osseiran,
Abdullah Giray Yaglikci,
Yahya Can Tugrul,
Haocong Luo,
Steve Rhyner,
Behzad Salami,
Juan Gomez Luna,
Onur Mutlu
Abstract:
We experimentally demonstrate the effects of read disturbance (RowHammer and RowPress) and uncover the inner workings of undocumented read disturbance defense mechanisms in High Bandwidth Memory (HBM). Detailed characterization of six real HBM2 DRAM chips in two different FPGA boards shows that (1) the read disturbance vulnerability varies significantly between different HBM2 chips and between different components (e.g., 3D-stacked channels) inside a chip, (2) DRAM rows at the end and in the middle of a bank are more resilient to read disturbance, (3) fewer additional activations are sufficient to induce more read disturbance bitflips in a DRAM row if the row exhibits its first bitflip at a relatively high activation count, and (4) a modern HBM2 chip implements undocumented read disturbance defenses that track potential aggressor rows based on how many times they are activated. We describe how our findings could be leveraged to develop more powerful read disturbance attacks and more efficient defense mechanisms. We open-source all our code and data to facilitate future research at https://github.com/CMU-SAFARI/HBM-Read-Disturbance.
Submitted 2 May, 2024; v1 submitted 23 October, 2023;
originally announced October 2023.
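Finding (4) of the HBM2 study describes defenses that track potential aggressor rows by how often they are activated. The sketch below is a generic, hypothetical activation-counter tracker meant only to illustrate that class of mechanism; the threshold, the choice of the two adjacent rows as victims, and the reset-on-trigger policy are all assumptions, not the undocumented mechanism the paper reverse-engineers.

```python
from collections import defaultdict


class ActivationCounterTracker:
    """Toy activation-count-based aggressor tracker (hypothetical parameters).

    Counts activations per row; when a row reaches `threshold`, its two
    adjacent rows are reported as victims to refresh and its counter is reset.
    """

    def __init__(self, threshold):
        self.threshold = threshold
        self.counts = defaultdict(int)  # row address -> activations since last trigger

    def activate(self, row):
        """Record one activation; return victim rows to refresh (possibly empty)."""
        self.counts[row] += 1
        if self.counts[row] >= self.threshold:
            self.counts[row] = 0
            return [row - 1, row + 1]  # victim rows to preventively refresh
        return []


tracker = ActivationCounterTracker(threshold=4)
refreshes = [tracker.activate(100) for _ in range(8)]
print(refreshes)  # refresh of rows 99 and 101 after the 4th and 8th activation
```

Real in-DRAM trackers must fit in a small table, so they typically count only a subset of rows; the unbounded dictionary here ignores that constraint.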
-
Iterative Clustering Material Decomposition Aided by Empirical Spectral Correction for High-Resolution Photon-Counting Detectors in Micro-CT
Authors:
Juan C. R. Luna,
Mini Das
Abstract:
Photon counting detectors (PCDs) offer promising advancements in computed tomography (CT) imaging by enabling the quantification and 3D imaging of contrast agents and tissue types through multi-energy projections. However, the accuracy of these decomposition methods hinges on precise composite spectral attenuation values that one must reconstruct from spectral micro CT. Factors such as surface defects, local temperature, signal amplification, and impurity levels can cause variations in detector efficiency between pixels, leading to significant quantitative errors. In addition, some inaccuracies, such as the charge-sharing effects in PCDs, are amplified by a high-Z sensor material and by the smaller detector pixels that are preferred for micro CT. In this work, we propose a comprehensive approach that combines practical instrumentation and measurement strategies, leading to the quantitation of multiple materials within an object in spectral micro CT with a photon counting detector. Our Iterative Clustering Material Decomposition (ICMD) includes an empirical method for detector spectral response correction, cluster analysis, and multi-step iterative material decomposition. Utilizing a CdTe-1mm Medipix detector with a 55 $\mu$m pitch, we demonstrate the quantitatively accurate decomposition of several materials in a phantom study, where the sample includes mixtures of materials, soft materials, and K-edge materials. We also show an example of biological sample imaging, separating three distinct types of tissue in a mouse: muscle, fat, and bone. Our experimental results show that the combination of spectral correction and high-dimensional data clustering enhances decomposition accuracy and reduces noise in micro CT. ICMD allows for the quantitative separation of more than three materials, including mixtures, and also effectively separates multiple contrast agents.
Submitted 16 October, 2023;
originally announced October 2023.
-
The orbital period of the nova V1674 Her as observed with TESS
Authors:
G. J. M. Luna,
I. J. Lima,
M. Orio
Abstract:
Nova Her 2021 was observed with TESS 12.62 days after its most recent outburst on 2021 June 12.537. This cataclysmic variable belongs to the intermediate polar class, with a spin period of $\sim$501 s and an orbital period of 0.1529 days. In the TESS Sector 40 observations, the orbital period of 0.1529(1) days is detected significantly 17 days after the onset of the outburst. A modulation of unknown origin, with a period of $\sim$0.537 days, is present in the data from day 13 until day 17.
Submitted 3 October, 2023;
originally announced October 2023.
-
Transient and asymmetric dust structures in the TeV-bright nova RS Oph revealed by spectropolarimetry
Authors:
Y. Nikolov,
G. J. M. Luna,
K. A. Stoyanov,
G. Borisov,
K. Mukai,
J. L. Sokoloski,
A. Avramova-Boncheva
Abstract:
A long-standing question related to nova eruptions is how these eruptions can lead to the formation of dust despite the ostensibly inhospitable environment for dust within the hot, irradiated ejecta. Novae in systems such as the symbiotic binary RS Oph offer a particularly clear view of some nova shocks and any associated dust production. Here we use spectropolarimetric monitoring of RS Oph, starting two days after its eruption in 2021 August, to show that: dust was present in the RS Oph system as early as two days into the 2021 eruption; the spatial distribution of this early dust was asymmetric, with components both aligned with and perpendicular to the orbital plane of the binary; between two and nine days after the start of the eruption, this early dust was gradually destroyed; and dust was again created, aligned roughly with the orbital plane of the binary, more than 80 days after the start of the outburst, most likely as a result of shocks that arose as the ejecta interacted with circumbinary material concentrated in the orbital plane. Modelling of the X-ray and very-high energy GeV and TeV emission from RS Oph days to months into the 2021 eruption suggests that collisions between the ejecta and the circumbinary material may have led to shock formation in two regions: the polar regions, perpendicular to the orbital plane, where collimated outflows have been observed after prior eruptions; and a circumbinary torus in the orbital plane. The observations described here indicate that dust formed in approximately the same two regions, supporting the connection between shocks and dust in novae and revealing a very early onset of asymmetry. The spectropolarimetric signatures of RS Oph in the first week of the 2021 outburst, namely the polarized flux across the Hα emission line and the position-angle orientation relative to the radio axis, are similar to the spectropolarimetric signatures of AGNs.
Submitted 20 September, 2023;
originally announced September 2023.
-
Fermionic quantum computation with Cooper pair splitters
Authors:
Kostas Vilkelis,
Antonio Manesco,
Juan Daniel Torres Luna,
Sebastian Miles,
Michael Wimmer,
Anton Akhmerov
Abstract:
We propose a practical implementation of a universal quantum computer that uses local fermionic modes (LFM) rather than qubits. The device layout consists of quantum dots tunnel-coupled by a hybrid superconducting island, with a tunable capacitive coupling between the dots. We show that coherent control of Cooper pair splitting, elastic cotunneling, and Coulomb interactions allows us to implement the universal set of quantum gates defined by Bravyi and Kitaev. Because of the similarity to charge qubits, we expect charge noise to be the main source of decoherence. For this reason, we also consider an alternative design where the quantum dots have tunable coupling to the superconductor. In this second device design, we show that there is a sweet spot at which the local fermionic modes are charge neutral, making the device insensitive to charge noise effects. Finally, we compare both designs and their experimental limitations and suggest future efforts to overcome them.
Submitted 5 June, 2024; v1 submitted 1 September, 2023;
originally announced September 2023.
-
Multi-Scenario Empirical Assessment of Agile Governance Theory: A Technical Report
Authors:
Alexandre J. H. de O. Luna,
Marcelo L. M. Marinho
Abstract:
Context: Agile Governance Theory (AGT) has emerged as a potential model for organizational chains of responsibility across business units and teams. Objective: This study aims to assess how AGT is reflected in practice. Method: AGT was operationalized into 16 testable hypotheses. All hypotheses were tested by arranging eight theoretical scenarios with 118 practitioners from 86 organizations and 19 countries who completed an in-depth explanatory scenario-based survey. The feedback results were analyzed using Structural Equation Modeling (SEM) and Confirmatory Factor Analysis (CFA). Results: The analyses supported key theory components and hypotheses, such as the mediation between agile capabilities and business operations through governance capabilities. Conclusion: This study supports the theory and suggests that AGT can help teams better understand their organization's governance in an agile context. A better understanding can help remove the delays and misunderstandings that arise from unclear decision-making channels, which can jeopardize the fulfillment of the overall strategy.
Submitted 3 July, 2023;
originally announced July 2023.
-
The RS Oph outburst of 2021 monitored in X-rays with NICER
Authors:
Marina Orio,
Keith Gendreau,
Morgan Giese,
Gerardo Juan M. Luna,
Jozef Magdolen,
Tod E. Strohmayer,
Andy E. Zhang,
Diego Altamirano,
Andrej Dobrotka,
Teruaki Enoto,
Elizabeth C. Ferrara,
Richard Ignace,
Sebastian Heinz,
Craig Markwardt,
Joy S. Nichols,
Michael L. Parker,
Dheeraj R. Pasham,
Songpeng Pei,
Pragati Pradhan,
Ron Remillard,
James F. Steiner,
Francesco Tombesi
Abstract:
The 2021 outburst of the symbiotic recurrent nova RS Oph was monitored with the Neutron Star Interior Composition Explorer Mission (NICER) in the 0.2-12 keV range from day one after the optical maximum until day 88, producing an unprecedented, detailed view of the outburst development. The X-ray flux preceding the supersoft X-ray phase peaked almost 5 days after optical maximum and originated only in shocked ejecta for 21 to 25 days. The emission was thermal; in the first 5 days only a non-collisional-ionization-equilibrium model fits the spectrum, and a transition to equilibrium occurred between days 6 and 12. The ratio of the peak X-ray flux measured in the NICER range to that measured with Fermi in the 60 MeV-500 GeV range was about 0.1, and the ratio to the peak flux measured with H.E.S.S. in the 250 GeV-2.5 TeV range was about 100. The central supersoft X-ray source (SSS), namely the shell-hydrogen-burning white dwarf (WD), became visible in the fourth week, initially with short flares. A huge increase in flux occurred on day 41, but the SSS flux remained variable. A quasi-periodic oscillation every ~35 s was always observed during the SSS phase, with variations in amplitude and a period drift that appeared to decrease towards the end. The SSS has the characteristics of a WD with mass $>1\,M_{\odot}$. Thermonuclear burning switched off shortly after day 75, earlier than in the 2006 outburst. We discuss implications for the nova physics.
Submitted 21 July, 2023;
originally announced July 2023.
-
A Large and Variable Leading Tail of Helium in a Hot Saturn Undergoing Runaway Inflation
Authors:
Michael Gully-Santiago,
Caroline V. Morley,
Jessica Luna,
Morgan MacLeod,
Antonija Oklopčić,
Aishwarya Ganesh,
Quang H. Tran,
Zhoujian Zhang,
Brendan P. Bowler,
William D. Cochran,
Daniel M. Krolikowski,
Suvrath Mahadevan,
Joe P. Ninan,
Guðmundur Stefánsson,
Andrew Vanderburg,
Joseph A. Zalesky,
Gregory R. Zeimann
Abstract:
Atmospheric escape shapes the fate of exoplanets, with statistical evidence for transformative mass loss imprinted across the mass-radius-insolation distribution. Here we present transit spectroscopy of the highly irradiated, low-gravity, inflated hot Saturn HAT-P-67 b. The Habitable Zone Planet Finder (HPF) spectra show a detection of up to 10% absorption depth of the 10833 Angstrom Helium triplet. The 13.8 hours of on-sky integration time over 39 nights sample the entire planet orbit, uncovering excess Helium absorption preceding the transit by up to 130 planetary radii in a large leading tail. This configuration can be understood as the escaping material overflowing its small Roche lobe and advecting most of the gas into the stellar -- and not planetary -- rest frame, consistent with the Doppler velocity structure seen in the Helium line profiles. The prominent leading tail serves as direct evidence for dayside mass loss with a strong day-/night-side asymmetry. We see some transit-to-transit variability in the line profile, consistent with the interplay of stellar and planetary winds. We employ 1D Parker wind models to estimate the mass loss rate, finding values on the order of $2\times10^{13}$ g/s, with large uncertainties owing to the unknown XUV flux of the F host star. The large mass loss in HAT-P-67 b represents a valuable example of an inflated hot Saturn, a class of planets recently identified to be rare as their atmospheres are predicted to evaporate quickly. We contrast two physical mechanisms for runaway evaporation: Ohmic dissipation and XUV irradiation, slightly favoring the latter.
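For orientation, an isothermal Parker wind passes through its sonic point at $r_s = G M_p / (2 c_s^2)$, with $c_s = \sqrt{k_B T / (\mu m_H)}$. A minimal sketch of that textbook relation only; it ignores stellar gravity and the Roche geometry, the paper's 1D models are more involved, and the planet mass, temperature, and mean molecular weight below are illustrative assumptions rather than the values adopted for HAT-P-67 b:

```python
import math

# Physical constants (cgs).
G = 6.674e-8           # gravitational constant, cm^3 g^-1 s^-2
K_B = 1.380649e-16     # Boltzmann constant, erg/K
M_H = 1.6735e-24       # mass of a hydrogen atom, g
M_JUP = 1.898e30       # Jupiter mass, g
R_JUP = 7.1492e9       # Jupiter equatorial radius, cm


def sound_speed(temperature_k: float, mu: float) -> float:
    """Isothermal sound speed c_s = sqrt(k_B T / (mu m_H)) in cm/s."""
    return math.sqrt(K_B * temperature_k / (mu * M_H))


def parker_sonic_radius(planet_mass_g: float, temperature_k: float, mu: float) -> float:
    """Sonic-point radius r_s = G M_p / (2 c_s^2) of an isothermal Parker wind, in cm."""
    cs = sound_speed(temperature_k, mu)
    return G * planet_mass_g / (2.0 * cs**2)


# Hypothetical inputs (not the paper's): a 0.3 M_Jup planet, a 10^4 K wind, mu = 0.6.
r_s = parker_sonic_radius(0.3 * M_JUP, 1.0e4, 0.6)
print(f"sonic point ~ {r_s / R_JUP:.2f} R_Jup")
```

Hotter winds or lower planet masses pull the sonic point inward, which is one intuitive way to see why low-gravity, inflated planets are prone to strong escape.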
Submitted 17 July, 2023;
originally announced July 2023.
-
Design of a Majorana trijunction
Authors:
Juan Daniel Torres Luna,
Sathish R. Kuppuswamy,
Anton R. Akhmerov
Abstract:
Braiding of Majorana states demonstrates their non-Abelian exchange statistics. One implementation of braiding requires control of the pairwise couplings between all Majorana states in a trijunction device. To ensure adiabaticity, a trijunction device requires the desired pair coupling to be sufficiently large and the undesired couplings to vanish. In this work, we design and simulate a trijunction device in a two-dimensional electron gas with a focus on the normal region that connects three Majorana states. We use an optimization approach to find the operational regime of the device in a multi-dimensional voltage space. Using the optimization results, we simulate a braiding experiment by adiabatically coupling different pairs of Majorana states without closing the topological gap. We then evaluate the feasibility of braiding in a trijunction device for different shapes and disorder strengths.
Submitted 8 January, 2024; v1 submitted 6 July, 2023;
originally announced July 2023.
-
K2 & TESS observations of symbiotic X-ray binaries: GX 1+4 and IGR J16194-2810
Authors:
G. J. M. Luna
Abstract:
I analyze the K2 and TESS data of the symbiotic X-ray binaries GX 1+4 and IGR J16194-2810 taken in 2016, 2019, and 2021. GX 1+4 consists of a pulsar accreting from a red giant companion in a 1160-day orbit. Since 1984, the pulsar has shown a continuous spin-down rate of $\dot{\nu}$=-0.1177(3) mHz/yr. I report the detection of the spin period at an average value of 180.426(1) seconds as observed with the K2 mission and confirm that the spin period continues to increase at a rate of $\sim$1.61$\times$10$^{-7}$ s/s. The K2 optical flux and the hard X-rays observed with Swift/BAT varied in tandem, in agreement with other authors who proposed that the optical light arises from reprocessed X-ray emission.
In the case of IGR J16194-2810, the X-ray and optical spectroscopy have been interpreted as arising from a neutron star accreting from an M2 III red giant companion. Its orbital period is unknown; here I report the detection of a modulation with a period of 242.837 min, interpreted as the neutron star spin period. IGR J16194-2810 is thus the second symbiotic X-ray binary in which the spin period is detected at optical wavelengths. This period, however, was only detected during the TESS observations of Sector 12 in 2019. The non-detection of this modulation during the observations of Sector 39 in 2021 is perhaps related to the orbital modulation, i.e., a low inclination of the orbit.
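The long-term GX 1+4 rate is quoted in frequency units (mHz/yr), while the current rate is a period derivative (s/s). Since $\nu = 1/P$, the two are related by $\dot{\nu} = -\dot{P}/P^2$. A minimal sketch of that conversion using the values quoted above and a Julian year; the function and constant names are illustrative:

```python
SECONDS_PER_JULIAN_YEAR = 365.25 * 86400.0


def period_derivative_to_frequency_derivative(period_s: float, pdot: float) -> float:
    """Convert a spin-period derivative (s/s) to a frequency derivative (Hz/s).

    Because nu = 1 / P, d(nu)/dt = -Pdot / P**2, so a spin-down
    (Pdot > 0) appears as a negative frequency derivative.
    """
    return -pdot / period_s**2


# Values quoted in the abstract: P = 180.426 s, Pdot ~ 1.61e-7 s/s.
nudot_hz_per_s = period_derivative_to_frequency_derivative(180.426, 1.61e-7)
nudot_mhz_per_yr = nudot_hz_per_s * SECONDS_PER_JULIAN_YEAR * 1e3
print(f"{nudot_mhz_per_yr:.3f} mHz/yr")  # prints -0.156 mHz/yr
```

On these numbers the current period derivative corresponds to roughly -0.156 mHz/yr, which puts it on the same footing as the long-term value.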
Submitted 4 July, 2023;
originally announced July 2023.
-
Giant Tidal Tails of Helium Escaping the Hot Jupiter HAT-P-32 b
Authors:
Zhoujian Zhang,
Caroline V. Morley,
Michael Gully-Santiago,
Morgan MacLeod,
Antonija Oklopčić,
Jessica Luna,
Quang H. Tran,
Joe P. Ninan,
Suvrath Mahadevan,
Daniel M. Krolikowski,
William D. Cochran,
Brendan P. Bowler,
Michael Endl,
Gudmundur Stefánsson,
Benjamin M. Tofflemire,
Andrew Vanderburg,
Gregory R. Zeimann
Abstract:
Capturing planets in the act of losing their atmospheres provides rare opportunities to probe their evolutionary history. Such analysis has been enabled by observations of the helium triplet at 10833 Å, but past studies have focused on the narrow time window right around the planet's optical transit. We monitored the hot Jupiter HAT-P-32 b using high-resolution spectroscopy from the Hobby-Eberly Telescope covering the planet's full orbit. We detected helium escaping HAT-P-32 b at a $14σ$ significance, with extended leading and trailing tails spanning a projected length of over 53 times the planet's radius. These tails are among the largest known structures associated with an exoplanet. We interpret our observations using three-dimensional hydrodynamic simulations, which predict Roche lobe overflow with extended tails along the planet's orbital path.
Submitted 6 June, 2023;
originally announced June 2023.
-
An Experimental Analysis of RowHammer in HBM2 DRAM Chips
Authors:
Ataberk Olgun,
Majd Osseiran,
Abdullah Giray Yağlıkçı,
Yahya Can Tuğrul,
Haocong Luo,
Steve Rhyner,
Behzad Salami,
Juan Gomez Luna,
Onur Mutlu
Abstract:
RowHammer (RH) is a significant and worsening security, safety, and reliability issue of modern DRAM chips that can be exploited to break memory isolation. Therefore, it is important to understand real DRAM chips' RH characteristics. Unfortunately, no prior work extensively studies the RH vulnerability of modern 3D-stacked high-bandwidth memory (HBM) chips, which are commonly used in modern GPUs.
In this work, we experimentally characterize the RH vulnerability of a real HBM2 DRAM chip. We show that 1) different 3D-stacked channels of HBM2 memory exhibit significantly different levels of RH vulnerability (up to 79% difference in bit error rate), 2) the DRAM rows at the end of a DRAM bank (rows with the highest addresses) exhibit significantly fewer RH bitflips than other rows, and 3) a modern HBM2 DRAM chip implements undisclosed RH defenses that are triggered by periodic refresh operations. We describe the implications of our observations on future RH attacks and defenses and discuss future work for understanding RH in 3D-stacked memories.
Submitted 29 May, 2023;
originally announced May 2023.
-
Venice: Improving Solid-State Drive Parallelism at Low Cost via Conflict-Free Accesses
Authors:
Rakesh Nadig,
Mohammad Sadrosadati,
Haiyu Mao,
Nika Mansouri Ghiasi,
Arash Tavakkol,
Jisung Park,
Hamid Sarbazi-Azad,
Juan Gómez Luna,
Onur Mutlu
Abstract:
The performance and capacity of solid-state drives (SSDs) are continuously improving to meet the increasing demands of modern data-intensive applications. Unfortunately, communication between the SSD controller and memory chips (e.g., 2D/3D NAND flash chips) is a critical performance bottleneck for many applications. SSDs use a multi-channel shared bus architecture where multiple memory chips connected to the same channel communicate with the SSD controller over only one path. As a result, path conflicts often occur during the servicing of multiple I/O requests, which significantly limits SSD parallelism. It is critical to handle path conflicts well to improve SSD parallelism and performance. Our goal is to fundamentally tackle the path conflict problem by increasing the number of paths between the SSD controller and memory chips at low cost. To this end, we build on the idea of using an interconnection network to increase the path diversity between the SSD controller and memory chips. We propose Venice, a new mechanism that introduces a low-cost interconnection network between the SSD controller and memory chips and utilizes the path diversity to intelligently resolve path conflicts. Venice employs three key techniques: 1) a simple router chip added next to each memory chip without modifying the memory chip design, 2) a path reservation technique that reserves a path from the SSD controller to the target memory chip before initiating a transfer, and 3) a fully-adaptive routing algorithm that effectively utilizes the path diversity to resolve path conflicts. Our experimental results show that Venice 1) improves performance by an average of 2.65x/1.67x over a baseline performance-optimized/cost-optimized SSD design across a wide range of workloads, and 2) reduces energy consumption by an average of 61% compared to a baseline performance-optimized SSD design.
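The abstract describes reserving a complete controller-to-chip path before a transfer and routing adaptively around conflicts. A minimal sketch of that idea, assuming a hypothetical 2D mesh of router chips and using breadth-first search over unreserved links as a stand-in for Venice's actual fully-adaptive routing algorithm, which is not specified here; `MeshNetwork`, `reserve`, and `release` are illustrative names only:

```python
from __future__ import annotations

from collections import deque

Node = tuple[int, int]  # (row, column) of a router chip in the hypothetical mesh


class MeshNetwork:
    """Toy rows x cols mesh of router chips whose links can be reserved.

    Hypothetical topology for illustration; not Venice's actual network.
    """

    def __init__(self, rows: int, cols: int) -> None:
        self.rows, self.cols = rows, cols
        self.reserved: set[frozenset[Node]] = set()  # undirected links in use

    def neighbors(self, node: Node):
        r, c = node
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < self.rows and 0 <= nc < self.cols:
                yield (nr, nc)

    def find_free_path(self, src: Node, dst: Node) -> list[Node] | None:
        """Shortest path that uses only unreserved links (breadth-first search)."""
        parent: dict[Node, Node | None] = {src: None}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            if node == dst:
                path: list[Node] = []
                step: Node | None = node
                while step is not None:
                    path.append(step)
                    step = parent[step]
                return path[::-1]
            for nxt in self.neighbors(node):
                if nxt not in parent and frozenset((node, nxt)) not in self.reserved:
                    parent[nxt] = node
                    queue.append(nxt)
        return None  # every route is blocked: the request would have to wait

    def reserve(self, src: Node, dst: Node) -> list[Node] | None:
        """Reserve every link of a free path before the transfer starts."""
        path = self.find_free_path(src, dst)
        if path is not None:
            self.reserved.update(frozenset(link) for link in zip(path, path[1:]))
        return path

    def release(self, path: list[Node]) -> None:
        """Free the path's links once the transfer has completed."""
        self.reserved.difference_update(frozenset(link) for link in zip(path, path[1:]))


net = MeshNetwork(3, 3)
controller = (0, 0)  # hypothetical: the controller attaches at a corner router
first = net.reserve(controller, (0, 2))
second = net.reserve(controller, (0, 2))  # detours around the reserved links
print(first)
print(second)
```

Because the search may use any free link rather than a fixed dimension order, a second request to the same chip takes a detour instead of waiting, which is the path-diversity effect the abstract relies on; on a single shared bus the second request would have to stall.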
Submitted 12 May, 2023;
originally announced May 2023.
-
High-Performance and Scalable Agent-Based Simulation with BioDynaMo
Authors:
Lukas Breitwieser,
Ahmad Hesam,
Fons Rademakers,
Juan Gómez Luna,
Onur Mutlu
Abstract:
Agent-based modeling plays an essential role in gaining insights into biology, sociology, economics, and other fields. However, many existing agent-based simulation platforms are not suitable for large-scale studies due to the low performance of the underlying simulation engines. To overcome this limitation, we present a novel high-performance simulation engine.
We identify three key challenges for which we present the following solutions. First, to maximize parallelization, we present an optimized grid to search for neighbors and parallelize the merging of thread-local results. Second, we reduce the memory access latency with a NUMA-aware agent iterator, agent sorting with a space-filling curve, and a custom heap memory allocator. Third, we present a mechanism to omit the collision force calculation under certain conditions.
Our evaluation shows an order of magnitude improvement over Biocellion, three orders of magnitude speedup over Cortex3D and NetLogo, and the ability to simulate 1.72 billion agents on a single server.
Supplementary Materials, including instructions to reproduce the results, are available at: https://doi.org/10.5281/zenodo.6463816
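Sorting agents along a space-filling curve keeps agents that are close in space close in memory, which is what reduces memory access latency during neighbor searches. The abstract does not name the curve; the sketch below assumes a Morton (Z-order) curve over a uniform grid with non-negative coordinates, and all helper names are hypothetical, not BioDynaMo's API:

```python
def spread_bits_3d(v: int, bits: int = 10) -> int:
    """Place bit i of v at bit position 3*i (two zero bits between neighbours)."""
    out = 0
    for i in range(bits):
        out |= ((v >> i) & 1) << (3 * i)
    return out


def morton3d(x: int, y: int, z: int, bits: int = 10) -> int:
    """Interleave grid coordinates into one Z-order key (x -> lowest bit)."""
    return (spread_bits_3d(x, bits)
            | (spread_bits_3d(y, bits) << 1)
            | (spread_bits_3d(z, bits) << 2))


def sort_agents_by_morton(positions: list[tuple[float, float, float]],
                          box_size: float) -> list[int]:
    """Return agent indices ordered along the Z-order curve of a uniform grid.

    Assumes non-negative coordinates smaller than box_size * 2**10.
    """
    def key(i: int) -> int:
        x, y, z = (int(c // box_size) for c in positions[i])
        return morton3d(x, y, z)

    return sorted(range(len(positions)), key=key)


# Made-up agent positions; after sorting, agents in nearby grid boxes are adjacent.
agents = [(5.0, 5.0, 5.0), (0.1, 0.2, 0.3), (1.5, 0.0, 0.0)]
print(sort_agents_by_morton(agents, box_size=1.0))  # prints [1, 2, 0]
```

In an engine built this way, the agent data itself would then be rearranged in memory in this order so that iterating over neighbors touches nearby cache lines.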
Submitted 17 January, 2023;
originally announced January 2023.
-
ALP: Alleviating CPU-Memory Data Movement Overheads in Memory-Centric Systems
Authors:
Nika Mansouri Ghiasi,
Nandita Vijaykumar,
Geraldo F. Oliveira,
Lois Orosa,
Ivan Fernandez,
Mohammad Sadrosadati,
Konstantinos Kanellopoulos,
Nastaran Hajinazar,
Juan Gómez Luna,
Onur Mutlu
Abstract:
Partitioning applications between NDP and host CPU cores causes inter-segment data movement overhead, i.e., the cost of moving data generated by one segment (e.g., instructions, functions) to the consecutive segments that use it. Prior works take two approaches to this problem. The first class of works maps segments to NDP or host cores based on the properties of each segment, neglecting the inter-segment data movement overhead. The second class of works partitions applications based on the overall memory bandwidth saving of each segment, and does not offload a segment to its best-fitting core if doing so incurs high inter-segment data movement. We show that 1) mapping each segment to its best-fitting core can ideally provide substantial benefits, and 2) the inter-segment data movement reduces this benefit significantly.
To this end, we introduce ALP, a new programmer-transparent technique to leverage the performance benefits of NDP by alleviating the inter-segment data movement overhead between host and memory and enabling efficient partitioning of applications. ALP alleviates the inter-segment data movement overhead by proactively and accurately transferring the required data between the segments. This is based on the key observation that the instructions that generate the inter-segment data stay the same across different executions of a program on different inputs. ALP uses a compiler pass to identify these instructions and uses specialized hardware to transfer data between the host and NDP cores at runtime. ALP efficiently maps application segments to either host or NDP considering 1) the properties of each segment, 2) the inter-segment data movement overhead, and 3) whether this overhead can be alleviated in a timely manner. We evaluate ALP across a wide range of workloads and show an average speedup of 54.3% and 45.4% compared to host-only CPU and NDP-only executions, respectively.
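The trade-off described above (best-fitting core per segment versus the cost of moving data between segments) can be illustrated with a small dynamic program over a linear chain of segments. This is a simplified cost model assumed for illustration, not ALP's mapping algorithm; the per-segment and transfer costs are hypothetical inputs:

```python
from typing import Sequence


def partition_segments(host: Sequence[float], ndp: Sequence[float],
                       transfer: Sequence[float]) -> tuple[float, list[str]]:
    """Minimum-cost host/NDP assignment for a linear chain of segments.

    host[i], ndp[i]: hypothetical execution cost of segment i on each core type.
    transfer[i]: hypothetical cost of moving segment i's output to segment i + 1
    when the two segments run on different core types (len(host) - 1 entries).
    """
    costs = {"host": host, "ndp": ndp}
    # best[side]: cheapest total cost of the segments so far, ending on `side`.
    best = {side: float(costs[side][0]) for side in costs}
    backpointers: list[dict[str, str]] = []
    for i in range(1, len(host)):
        new_best: dict[str, float] = {}
        back: dict[str, str] = {}
        for side in costs:
            # Staying on the same side is free; switching pays the transfer cost.
            candidates = {prev: best[prev] + (0.0 if prev == side else transfer[i - 1])
                          for prev in costs}
            prev = min(candidates, key=candidates.get)
            new_best[side] = candidates[prev] + costs[side][i]
            back[side] = prev
        best = new_best
        backpointers.append(back)
    side = min(best, key=best.get)
    total = best[side]
    plan = [side]
    for back in reversed(backpointers):  # walk the choices back to segment 0
        side = back[side]
        plan.append(side)
    return total, plan[::-1]


# Hypothetical costs: the middle segment is much faster on NDP cores.
print(partition_segments([1, 10, 1], [5, 2, 5], transfer=[3, 3]))
# prints (10.0, ['host', 'ndp', 'host'])
print(partition_segments([1, 10, 1], [5, 2, 5], transfer=[10, 10]))
# prints (12.0, ['host', 'host', 'host'])
```

With cheap transfers the middle segment moves to its best-fitting NDP core; with expensive transfers everything stays on the host, mirroring how inter-segment data movement erodes the benefit of best-fitting mapping. In this toy model, alleviating that movement amounts to lowering the effective `transfer` costs.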
Submitted 12 December, 2022;
originally announced December 2022.
-
Taking a break: paused accretion in the symbiotic binary RT Cru
Authors:
A. Pujol,
G. J. M. Luna,
K. Mukai,
J. L. Sokoloski,
N. P. M. Kuin,
F. M. Walter,
R. Angeloni,
Y. Nikolov,
R. Lopes de Oliveira,
N. E. Nuñez,
M. Jaque Arancibia,
T. Palma,
L. Gramajo
Abstract:
Symbiotic binaries sometimes hide their symbiotic nature for significant periods of time. There is mounting observational evidence that in those symbiotics that are powered solely by accretion of red-giant wind material onto a white dwarf, without any quasi-steady shell burning on the surface of the white dwarf, the characteristic emission lines in the optical spectrum can vanish, leaving the semblance of an isolated red giant spectrum. Here we present compelling evidence that this disappearance of optical emission lines from the spectrum of RT Cru during 2019 was due to a decrease in the accretion rate, which we derive by modeling the X-ray spectrum. This drop in accretion rate leads to a lower flux of ionizing photons and thus to faint/absent photoionization emission lines in the optical spectrum. We observed the white dwarf symbiotic RT Cru with XMM-Newton and Swift in X-rays and UV and collected ground-based optical spectra and photometry over the last 33 years. This long-term coverage shows that during most of 2019, the accretion rate onto the white dwarf was so low, $\dot{M} = (3.2 \pm 0.06) \times 10^{-11}\,M_{\odot}\,\mathrm{yr}^{-1}\,(d/2.52\,\mathrm{kpc})^2$, that the historically detected hard X-ray emission almost vanished, the UV flux faded by roughly 5 magnitudes, the $U$, $B$ and $V$ flickering amplitude decreased, and the Balmer lines virtually disappeared from January through March 2019. Long-lasting low-accretion episodes such as the one reported here may hamper the chances of RT Cru experiencing a nova-type outburst despite the high mass of the accreting white dwarf.
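The quoted accretion rate carries an explicit distance dependence, $(d/2.52\,\mathrm{kpc})^2$, because at fixed observed flux the inferred luminosity, and hence the accretion rate, scales with distance squared. A minimal sketch that rescales the quoted value to another assumed distance and converts it to g/s; the solar-mass and Julian-year constants are standard values, and the function names and the alternative distance are illustrative:

```python
M_SUN_G = 1.989e33             # solar mass in grams
SECONDS_PER_YEAR = 3.15576e7   # Julian year in seconds


def accretion_rate(distance_kpc: float, mdot_ref: float = 3.2e-11,
                   d_ref_kpc: float = 2.52) -> float:
    """Rescale the quoted accretion rate (M_sun/yr) to another assumed distance.

    At fixed observed flux the inferred luminosity, and hence the accretion
    rate, scales as distance squared.
    """
    return mdot_ref * (distance_kpc / d_ref_kpc) ** 2


def msun_per_yr_to_g_per_s(mdot: float) -> float:
    """Convert an accretion rate from M_sun/yr to g/s."""
    return mdot * M_SUN_G / SECONDS_PER_YEAR


# Illustrative: the quoted value at 2.52 kpc, and the same data at a hypothetical 2.0 kpc.
for d in (2.52, 2.0):
    mdot = accretion_rate(d)
    print(f"d = {d} kpc: {mdot:.2e} M_sun/yr = {msun_per_yr_to_g_per_s(mdot):.2e} g/s")
```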
Submitted 23 November, 2022;
originally announced November 2022.
-
RevaMp3D: Architecting the Processor Core and Cache Hierarchy for Systems with Monolithically-Integrated Logic and Memory
Authors:
Nika Mansouri Ghiasi,
Mohammad Sadrosadati,
Geraldo F. Oliveira,
Konstantinos Kanellopoulos,
Rachata Ausavarungnirun,
Juan Gómez Luna,
Aditya Manglik,
João Ferreira,
Jeremie S. Kim,
Christina Giannoula,
Nandita Vijaykumar,
Jisung Park,
Onur Mutlu
Abstract:
Recent nano-technological advances enable the Monolithic 3D (M3D) integration of multiple memory and logic layers in a single chip with fine-grained connections. M3D technology leads to significantly higher main memory bandwidth and shorter latency than existing 3D-stacked systems. We show for a variety of workloads on a state-of-the-art M3D system that the performance and energy bottlenecks shift from the main memory to the core and cache hierarchy. Hence, there is a need to revisit current core and cache designs that have been conventionally tailored to tackle the memory bottleneck.
Our goal is to redesign the core and cache hierarchy, given the fundamentally new trade-offs of M3D, to benefit a wide range of workloads. To this end, we take two steps. First, we perform a design space exploration of the cache and core's key components. We highlight that in M3D systems, (i) removing the shared last-level cache leads to similar or larger performance benefits than increasing its size or reducing its latency; (ii) improving L1 latency has a large impact on improving performance; (iii) wider pipelines are increasingly beneficial; (iv) the performance impact of branch speculation and pipeline frontend increases; (v) the current synchronization schemes limit parallel speedup. Second, we propose an optimized M3D system, RevaMp3D, where (i) using the tight connectivity between logic layers, we efficiently increase pipeline width, reduce L1 latency, and enable fine-grained synchronization; (ii) using the high-bandwidth and energy-efficient main memory, we alleviate the amplified energy and speculation bottlenecks by memoizing the repetitive fetched, decoded, and reordered instructions and turning off the relevant parts of the core pipeline when possible. RevaMp3D provides, on average, 81% speedup, 35% energy reduction, and 12.3% smaller area compared to the baseline M3D system.
Submitted 16 October, 2022;
originally announced October 2022.
-
Shocks in the outflow of the RS Oph 2021 eruption observed with X-ray gratings
Authors:
Marina Orio,
Ehud Behar,
Juan Luna,
Jeremy Drake,
Jay Gallagher,
Joy S. Nichols,
Jan-Uwe Ness,
Andrej Dobrotka,
Joanna Mikolajewska,
Massimo Della Valle,
Rico Ignace,
Roy Rahin
Abstract:
The 2021 outburst of the symbiotic recurrent nova RS Oph was observed with the Chandra High Energy Transmission Gratings (HETG) on day 18 after optical maximum and with XMM-Newton and its Reflection Grating Spectrographs (RGS) on day 21, before the supersoft X-ray source emerged and when the emission was due to shocked ejecta. The absorbed flux in the HETG 1.3-31 Angstrom range was $2.6\times10^{-10}$ erg cm$^{-2}$ s$^{-1}$, three orders of magnitude lower than the gamma-ray flux measured on the same date. The spectra are well fitted with two components of thermal plasma in collisional ionization equilibrium, one at a temperature of ~0.75 keV and the other at a temperature in the 2.5-3.4 keV range. With the RGS we measured an average flux of $1.53\times10^{-10}$ erg cm$^{-2}$ s$^{-1}$ in the 5-35 Angstrom range, but the flux in the continuum, and especially in the lines in the 23-35 Angstrom range, decreased by almost 10% during the 50 ks RGS exposure, indicating short-term variability on a timescale of hours. The RGS spectrum can be fitted with three thermal components, at plasma temperatures of 70-150 eV, 0.64 keV, and 2.4 keV, respectively. The post-maximum epochs of the exposures fall between those of two grating spectra observed in the 2006 eruption on days 14 and 26: they are consistent with a similar spectral evolution, but in 2021 cooling seems to have been more rapid. Iron is depleted in the ejecta with respect to solar values, while nitrogen is enhanced.
Submitted 5 September, 2022;
originally announced September 2022.
-
From Data to Software to Science with the Rubin Observatory LSST
Authors:
Katelyn Breivik,
Andrew J. Connolly,
K. E. Saavik Ford,
Mario Jurić,
Rachel Mandelbaum,
Adam A. Miller,
Dara Norman,
Knut Olsen,
William O'Mullane,
Adrian Price-Whelan,
Timothy Sacco,
J. L. Sokoloski,
Ashley Villar,
Viviana Acquaviva,
Tomas Ahumada,
Yusra AlSayyad,
Catarina S. Alves,
Igor Andreoni,
Timo Anguita,
Henry J. Best,
Federica B. Bianco,
Rosaria Bonito,
Andrew Bradshaw,
Colin J. Burke,
Andresa Rodrigues de Campos
, et al. (75 additional authors not shown)
Abstract:
The Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST) dataset will dramatically alter our understanding of the Universe, from the origins of the Solar System to the nature of dark matter and dark energy. Much of this research will depend on the existence of robust, tested, and scalable algorithms, software, and services. Identifying and developing such tools ahead of time has the potential to significantly accelerate the delivery of early science from LSST. Developing these collaboratively, and making them broadly available, can enable more inclusive and equitable collaboration on LSST science.
To facilitate such opportunities, a community workshop entitled "From Data to Software to Science with the Rubin Observatory LSST" was organized by the LSST Interdisciplinary Network for Collaboration and Computing (LINCC) and partners, and held at the Flatiron Institute in New York, March 28-30th 2022. The workshop included over 50 in-person attendees invited from over 300 applications. It identified seven key software areas of need: (i) scalable cross-matching and distributed joining of catalogs, (ii) robust photometric redshift determination, (iii) software for determination of selection functions, (iv) frameworks for scalable time-series analyses, (v) services for image access and reprocessing at scale, (vi) object image access (cutouts) and analysis at scale, and (vii) scalable job execution systems.
This white paper summarizes the discussions of this workshop. It considers the motivating science use cases, identified cross-cutting algorithms, software, and services, their high-level technical specifications, and the principles of inclusive collaborations needed to develop them. We provide it both as a useful roadmap of needs and as a spur to action and collaboration between groups and individuals looking to develop reusable software for early LSST science.
Submitted 4 August, 2022;
originally announced August 2022.
-
ApHMM: Accelerating Profile Hidden Markov Models for Fast and Energy-Efficient Genome Analysis
Authors:
Can Firtina,
Kamlesh Pillai,
Gurpreet S. Kalsi,
Bharathwaj Suresh,
Damla Senol Cali,
Jeremie Kim,
Taha Shahroodi,
Meryem Banu Cavlak,
Joel Lindegger,
Mohammed Alser,
Juan Gómez Luna,
Sreenivas Subramoney,
Onur Mutlu
Abstract:
Profile hidden Markov models (pHMMs) are widely employed in various bioinformatics applications to identify similarities between biological sequences, such as DNA or protein sequences. In pHMMs, sequences are represented as graph structures whose states and edges are annotated with probabilities. These probabilities are subsequently used to compute the similarity score between a sequence and a pHMM graph. The Baum-Welch algorithm, a prevalent and highly accurate method, utilizes these probabilities to optimize and compute similarity scores. However, the Baum-Welch algorithm is computationally intensive, and existing solutions offer either software-only or hardware-only approaches with fixed pHMM designs. We identify an urgent need for a flexible, high-performance, and energy-efficient HW/SW co-design to address the major inefficiencies in the Baum-Welch algorithm for pHMMs.
We introduce ApHMM, the first flexible acceleration framework designed to significantly reduce both computational and energy overheads associated with the Baum-Welch algorithm for pHMMs. ApHMM tackles the major inefficiencies in the Baum-Welch algorithm by 1) designing flexible hardware to accommodate various pHMM designs, 2) exploiting predictable data dependency patterns through on-chip memory with memoization techniques, 3) rapidly filtering out negligible computations using a hardware-based filter, and 4) minimizing redundant computations.
ApHMM achieves substantial speedups of 15.55x - 260.03x, 1.83x - 5.34x, and 27.97x when compared to CPU, GPU, and FPGA implementations of the Baum-Welch algorithm, respectively. ApHMM outperforms state-of-the-art CPU implementations in three key bioinformatics applications: 1) error correction, 2) protein family search, and 3) multiple sequence alignment, by 1.29x - 59.94x, 1.03x - 1.75x, and 1.03x - 1.95x, respectively, while improving their energy efficiency by 64.24x - 115.46x, 1.75x, and 1.96x, respectively.
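To make the "probabilities used to compute a similarity score" step concrete, here is a minimal sketch of the forward recursion, which is one of the two passes (forward and backward) that Baum-Welch runs in each iteration. It assumes a generic discrete HMM with dense transition and emission tables; it does not model the match/insert/delete state structure of a pHMM, the parameter re-estimation step of Baum-Welch, or any of ApHMM's hardware optimizations. The function name `forward_likelihood` and its parameters are hypothetical, and the plain probabilities used here will underflow on long sequences (practical implementations scale or work in log space).

```python
from typing import Sequence


def forward_likelihood(
    init: Sequence[float],
    trans: Sequence[Sequence[float]],
    emit: Sequence[Sequence[float]],
    obs: Sequence[int],
) -> float:
    """Return P(obs | model) for a discrete HMM using the forward recursion.

    init[i]     -- probability of starting in state i
    trans[i][j] -- probability of a transition from state i to state j
    emit[i][k]  -- probability that state i emits symbol k
    obs         -- observed symbols encoded as integer indices (non-empty)
    """
    n = len(init)
    # alpha[j] = P(obs[0..t], state at step t is j); initialise for t = 0.
    alpha = [init[j] * emit[j][obs[0]] for j in range(n)]
    for symbol in obs[1:]:
        # Sum over every predecessor state, then emit the next symbol.
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbol]
            for j in range(n)
        ]
    # Marginalise over the final state.
    return sum(alpha)


if __name__ == "__main__":
    init = [0.6, 0.4]
    trans = [[0.7, 0.3], [0.4, 0.6]]
    emit = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
    print(forward_likelihood(init, trans, emit, [0, 1, 2]))
```

Each step touches every (predecessor, successor) state pair, which is why the cost grows quickly with model size and sequence length and why filtering out negligible terms, as the abstract describes, can pay off.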
Submitted 21 October, 2023; v1 submitted 20 July, 2022;
originally announced July 2022.
-
Connections and Finsler geometry of the structure group of a JB-algebra
Authors:
Gabriel Larotonda,
José Luna
Abstract:
We endow the Banach-Lie structure group $Str(V)$ of an infinite dimensional JB-algebra $V$ with a left-invariant connection and Finsler metric, and we compute all the quantities of its connection. We show how this connection reduces to $G(Ω)$, the group of transformations that preserve the positive cone $Ω$ of the algebra $V$, and to $Aut(V)$, the group of Jordan automorphisms of the algebra. We present the cone $Ω$ as a homogeneous space for the action of $G(Ω)$, therefore inducing a quotient Finsler metric and distance. With the techniques introduced, we prove the minimality of the one-parameter groups in $Ω$ for any symmetric gauge norm in $V$. We establish that the two presentations of the Finsler metric in $Ω$ give the same distance there, which helps us prove the minimality of certain paths in $G(Ω)$ for its left-invariant Finsler metric.
Submitted 3 September, 2024; v1 submitted 18 June, 2022;
originally announced June 2022.
-
On the structure group of an infinite dimensional JB-algebra
Authors:
Gabriel Larotonda,
José Luna
Abstract:
We extend several results for the structure group of a real Jordan algebra $V$ to the setting of infinite dimensional JB-algebras. We prove that the structure group $Str(V)$, the cone preserving group $G(Ω)$ and the automorphism group $Aut(V)$ of the algebra $V$ are embedded Banach-Lie groups of $GL(V)$, and that each inclusion in $Aut(V)\subset G(Ω)\subset Str(V)$ is an inclusion of embedded Banach-Lie subgroups. We give a full description of the components of $Str(V)$ via cones, isotopes and central projections. We apply these results to $V=B(H)_{sa}$, the special JB-algebra of self-adjoint operators on an infinite dimensional complex Hilbert space, describing the groups $Str(V), G(Ω), Aut(V)$, their Banach-Lie algebras and their connected components. We show that the action of the unitary group of $H$ on $Aut(V)$ has smooth local cross sections, thus $Aut(V)$ is a smooth principal bundle over the unitary group, with circle structure group.
Submitted 18 September, 2022; v1 submitted 10 June, 2022;
originally announced June 2022.
-
Discovery of the most luminous quasar of the last 9 Gyr
Authors:
Christopher A. Onken,
Samuel Lai,
Christian Wolf,
Adrian B. Lucy,
Wei Jeat Hon,
Patrick Tisserand,
Jennifer L. Sokoloski,
Gerardo J. M. Luna,
Rajeev Manick,
Xiaohui Fan,
Fuyan Bian
Abstract:
We report the discovery of a bright (g = 14.5 mag (AB), K = 11.9 mag (Vega)) quasar at redshift z = 0.83 -- the optically brightest (unbeamed) quasar at z > 0.4. SMSS J114447.77-430859.3, at a Galactic latitude of b = +18.1deg, was identified by its optical colours from the SkyMapper Southern Survey (SMSS) during a search for symbiotic binary stars. Optical and near-infrared spectroscopy reveals broad MgII, H-beta, H-alpha, and Pa-beta emission lines, from which we measure a black hole mass of log10(M_BH/M_Sun) = 9.4 +/- 0.5. With its high luminosity, L_bol = (4.7 +/- 1.0) * 10^47 erg/s or M_i(z=2) = -29.74 mag (AB), we estimate an Eddington ratio of ~1.4. As the most luminous quasar known over the last ~9 Gyr of cosmic history, having a luminosity 8 times greater than 3C 273, the source offers a range of potential follow-up opportunities.
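As a consistency check on the quoted Eddington ratio, the arithmetic can be reproduced from the abstract's own black hole mass and bolometric luminosity. This assumes the standard Eddington luminosity for ionized hydrogen, $L_{\rm Edd} \approx 1.26\times10^{38}\,(M/M_\odot)$ erg/s, rounded here to $1.3\times10^{38}$; that coefficient is not stated in the abstract.

```latex
L_{\rm Edd} \simeq 1.3\times10^{38}\left(\frac{M_{\rm BH}}{M_\odot}\right)\ {\rm erg\,s^{-1}}
  = 1.3\times10^{38}\times10^{9.4}\ {\rm erg\,s^{-1}}
  \approx 3.3\times10^{47}\ {\rm erg\,s^{-1}}

\lambda_{\rm Edd} = \frac{L_{\rm bol}}{L_{\rm Edd}}
  \approx \frac{4.7\times10^{47}}{3.3\times10^{47}} \approx 1.4
```

With the unrounded coefficient the ratio comes out near 1.5; either way, the $\pm0.5$ dex uncertainty on the black hole mass (a factor of about 3 in each direction) dominates the error budget.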
Submitted 1 August, 2022; v1 submitted 8 June, 2022;
originally announced June 2022.
-
PiDRAM: An FPGA-based Framework for End-to-end Evaluation of Processing-in-DRAM Techniques
Authors:
Ataberk Olgun,
Juan Gomez Luna,
Konstantinos Kanellopoulos,
Behzad Salami,
Hasan Hassan,
Oguz Ergin,
Onur Mutlu
Abstract:
DRAM-based main memory is used in nearly all computing systems as a major component. One way of overcoming the main memory bottleneck is to move computation near memory, a paradigm known as processing-in-memory (PiM). Recent PiM techniques provide a promising way to improve the performance and energy efficiency of existing and future systems at no additional DRAM hardware cost.
We develop the Processing-in-DRAM (PiDRAM) framework, the first flexible, end-to-end, and open source framework that enables system integration studies and evaluation of real PiM techniques using real DRAM chips. We demonstrate a prototype of PiDRAM on an FPGA-based platform (Xilinx ZC706) that implements an open-source RISC-V system (Rocket Chip). To demonstrate the flexibility and ease of use of PiDRAM, we implement two PiM techniques: (1) RowClone, an in-DRAM copy and initialization mechanism (using command sequences proposed by ComputeDRAM), and (2) D-RaNGe, an in-DRAM true random number generator based on DRAM activation-latency failures.
Our end-to-end evaluation of RowClone shows up to 14.6X speedup for copy and 12.6X speedup for initialization operations over CPU copy (i.e., conventional memcpy) and initialization (i.e., conventional calloc) operations. Our implementation of D-RaNGe provides high-throughput true random numbers, reaching 8.30 Mb/s. Building on the Verilog and C++ code base provided by PiDRAM, implementing the required hardware and software components takes 198 lines of Verilog and 565 lines of C++ for RowClone end-to-end, and 190 lines of Verilog and 78 lines of C++ for D-RaNGe end-to-end. PiDRAM is open sourced on GitHub: https://github.com/CMU-SAFARI/PiDRAM.
Submitted 1 June, 2022;
originally announced June 2022.
-
High-throughput Pairwise Alignment with the Wavefront Algorithm using Processing-in-Memory
Authors:
Safaa Diab,
Amir Nassereldine,
Mohammed Alser,
Juan Gómez Luna,
Onur Mutlu,
Izzat El Hajj
Abstract:
We show that the wavefront algorithm can achieve higher pairwise read alignment throughput on a UPMEM PIM system than on a server-grade multi-threaded CPU system.
Submitted 23 April, 2022; v1 submitted 5 April, 2022;
originally announced April 2022.
-
NICER monitoring of supersoft X-ray sources
Authors:
M. Orio,
K. Gendreau,
M. Giese,
J. G. M. Luna,
J. Magdolen,
S. Pei,
B. Sun,
E. Behar,
A. Dobrotka,
J. Mikolajewska,
D. R. Pasham,
T. E. Strohmayer
Abstract:
We monitored four supersoft X-ray sources (SSS) with NICER: two persistent ones, CAL 83 and MR Vel, and the recent novae YZ Ret (Nova Ret 2020) and V1674 Her (Nova Her 2021). The two persistent SSS were observed with unvaried X-ray flux level and spectrum, respectively, 13 and 20 years after the last observations. Short-period modulations of the SSS appear where the spectrum of the luminous central source was fully visible (in CAL 83 and V1674 Her) and were absent in YZ Ret and MR Vel, in which the flux originated in photoionized or shocked plasma, while the white dwarf (WD) was not observable. We thus suggest that the pulsations occur on, or very close to, the WD surface. The pulsations of CAL 83 were almost unvaried after 15 years, including an irregular drift of the $\simeq$67 s period by 2.1 s. Simulations, including previous XMM-Newton data, indicate actual variations in period length within hours, rather than an artifact of the variable amplitude of the pulsations. Large-amplitude pulsations with a period of 501.53$\pm$0.30 s were always detected in V1674 Her, as long as the SSS was observable. This period seems to be due to rotation of a highly magnetized WD. We cannot confirm the maximum effective temperature ($\simeq$145,000 K) previously inferred for this nova, and discuss the difficulty in interpreting its spectrum. The WD appears to present two surface zones, one of which does not emit SSS flux.
Submitted 4 April, 2022;
originally announced April 2022.
-
A Compiler Framework for Optimizing Dynamic Parallelism on GPUs
Authors:
Mhd Ghaith Olabi,
Juan Gómez Luna,
Onur Mutlu,
Wen-mei Hwu,
Izzat El Hajj
Abstract:
Dynamic parallelism on GPUs allows GPU threads to dynamically launch other GPU threads. It is useful in applications with nested parallelism, particularly where the amount of nested parallelism is irregular and cannot be predicted beforehand. However, prior works have shown that dynamic parallelism may impose a high performance penalty when a large number of small grids are launched. The large number of launches results in high launch latency due to congestion, and the small grid sizes result in hardware underutilization.
To address this issue, we propose a compiler framework for optimizing the use of dynamic parallelism in applications with nested parallelism. The framework features three key optimizations: thresholding, coarsening, and aggregation. Thresholding involves launching a grid dynamically only if the number of child threads exceeds some threshold, and serializing the child threads in the parent thread otherwise. Coarsening involves executing the work of multiple thread blocks by a single coarsened block to amortize the common work across them. Aggregation involves combining multiple child grids into a single aggregated grid.
Our evaluation shows that our compiler framework improves the performance of applications with nested parallelism by a geometric mean of 43.0x over applications that use dynamic parallelism, 8.7x over applications that do not use dynamic parallelism, and 3.6x over applications that use dynamic parallelism with aggregation alone as proposed in prior work.
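The thresholding and aggregation decisions described above can be illustrated with a small host-side toy model. This is only a sketch under stated assumptions: the real framework is a compiler transformation applied to GPU code, whereas here each parent thread is reduced to a count of child threads it would launch. The names `LaunchPlan`, `apply_threshold` and `aggregate` are hypothetical, and coarsening is not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class LaunchPlan:
    # Parents whose child work is executed serially inside the parent thread.
    serialized: List[int] = field(default_factory=list)
    # (parent id, child thread count) for parents that launch a child grid.
    launched: List[Tuple[int, int]] = field(default_factory=list)


def apply_threshold(child_counts: List[int], threshold: int) -> LaunchPlan:
    """Launch a child grid only when a parent has more than `threshold` child threads."""
    plan = LaunchPlan()
    for parent, count in enumerate(child_counts):
        if count > threshold:
            plan.launched.append((parent, count))
        else:
            plan.serialized.append(parent)
    return plan


def aggregate(launched: List[Tuple[int, int]]) -> Tuple[int, List[int]]:
    """Fuse the remaining child grids into a single grid.

    Returns the aggregated grid's total thread count and, for each launching
    parent, the offset of its first child thread within that grid.
    """
    offsets: List[int] = []
    total = 0
    for _, count in launched:
        offsets.append(total)
        total += count
    return total, offsets
```

For example, with child counts `[3, 100, 0, 512, 8]` and a threshold of 32, parents 0, 2 and 4 run their child work inline, and the two remaining launches are fused into one 612-thread grid with offsets `[0, 100]`. In this toy model a child thread of the aggregated grid could recover its parent by searching the offsets list; how the actual framework performs that mapping is not described in the abstract.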
Submitted 8 January, 2022;
originally announced January 2022.
-
PiDRAM: A Holistic End-to-end FPGA-based Framework for Processing-in-DRAM
Authors:
Ataberk Olgun,
Juan Gómez Luna,
Konstantinos Kanellopoulos,
Behzad Salami,
Hasan Hassan,
Oğuz Ergin,
Onur Mutlu
Abstract:
Processing-using-memory (PuM) techniques leverage the analog operation of memory cells to perform computation. Several recent works have demonstrated PuM techniques in off-the-shelf DRAM devices. Since DRAM is the dominant memory technology as main memory in current computing systems, these PuM techniques represent an opportunity for alleviating the data movement bottleneck at very low cost. However, system integration of PuM techniques imposes non-trivial challenges that are yet to be solved. Design space exploration of potential solutions to the PuM integration challenges requires appropriate tools to develop necessary hardware and software components. Unfortunately, neither current specialized DRAM-testing platforms nor system simulators provide the flexibility and/or the holistic system view that is necessary to deal with PuM integration challenges.
We design and develop PiDRAM, the first flexible end-to-end framework that enables system integration studies and evaluation of real PuM techniques. PiDRAM provides software and hardware components to rapidly integrate PuM techniques across the whole system software and hardware stack (e.g., necessary modifications in the operating system, memory controller). We implement PiDRAM on an FPGA-based platform along with an open-source RISC-V system. Using PiDRAM, we implement and evaluate two state-of-the-art PuM techniques: in-DRAM (i) copy and initialization, and (ii) true random number generation. Our results show that the in-memory copy and initialization techniques can improve the performance of bulk copy operations by 12.6x and bulk initialization operations by 14.6x on a real system. Implementing the true random number generator requires only 190 lines of Verilog and 74 lines of C code using PiDRAM's software and hardware components.
Submitted 4 September, 2023; v1 submitted 29 October, 2021;
originally announced November 2021.
-
The Remarkable Spin-down and Ultra-fast Outflows of the Highly-Pulsed Supersoft Source of Nova Hercules 2021
Authors:
Jeremy J. Drake,
Jan-Uwe Ness,
Kim L. Page,
G. J. M. Luna,
Andrew P. Beardmore,
Marina Orio,
Julian P. Osborne,
Przemek Mroz,
Sumner Starrfield,
Dipankar P. K. Banerjee,
Solen Balman,
M. J. Darnley,
Y. Bhargava,
G. C. Dewangan,
K. P. Singh
Abstract:
Nova Her 2021 (V1674 Her), which erupted on 2021 June 12, reached naked-eye brightness and has been detected from radio to $γ$-rays. An extremely fast optical decline of 2 magnitudes in 1.2 days and strong Ne lines imply a high-mass white dwarf. The optical pre-outburst detection of a 501.42s oscillation suggests a magnetic white dwarf. This is the first time that an oscillation of this magnitude has been detected in a classical nova prior to outburst. We report X-ray outburst observations from {\it Swift} and {\it Chandra} which uniquely show: (1) a very strong modulation of super-soft X-rays at a different period from reported optical periods; (2) strong pulse profile variations and the possible presence of period variations of the order of 0.1-0.3s; and (3) rich grating spectra that vary with modulation phase and show P Cygni-type emission lines with two dominant blue-shifted absorption components at $\sim 3000$ and 9000 km s$^{-1}$ indicating expansion velocities up to 11000 km s$^{-1}$. X-ray oscillations most likely arise from inhomogeneous photospheric emission related to the magnetic field. Period differences between reported pre- and post-outburst optical observations, if not due to other period drift mechanisms, suggest a large ejected mass for such a fast nova, in the range $2\times 10^{-5}$-$2\times 10^{-4} M_\odot$. A difference between the period found in the {\it Chandra} data and a reported contemporaneous post-outburst optical period, as well as the presence of period drifts, could be due to weakly non-rigid photospheric rotation.
Submitted 26 October, 2021;
originally announced October 2021.
-
Expanding Bipolar X-ray Structure After the 2006 Eruption of RS Oph
Authors:
R. Montez Jr.,
G. J. M. Luna,
K. Mukai,
J. Sokoloski,
J. H. Kastner
Abstract:
We report on the detection and analysis of extended X-ray emission by the {\it Chandra} X-ray Observatory stemming from the 2006 eruption of the recurrent nova RS Oph. The extended emission was detected 1254 and 1927 days after the start of the 2006 eruption and is consistent with a bipolar flow oriented in the east-west direction of the sky with opening angles of approximately $70^{\circ}$. The length of both lobes appeared to expand from 1.3 arcsec in 2009 to 2.0 arcsec in 2011, suggesting a projected expansion rate of $1.1\pm0.1 {\rm ~mas~day}^{-1}$ and an expansion velocity of $4600\ {\rm km~s}^{-1}\ (D/2.4\ {\rm kpc})$ in the plane of the sky. This expansion rate is consistent with previous estimates from optical and radio observations of material in a similar orientation. The X-ray emission does not show any evidence of cooling between 2009 and 2011, consistent with free expansion of the material. This discovery suggests that some mechanism collimates ejecta away from the equatorial plane, and that after that material passes through the red-giant wind, it expands freely into the cavity left by the 1985 eruption. We expect similar structures to arise from the latest eruption and to expand into the cavity shaped by the 2006 eruption.
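The quoted expansion rate and velocity can be checked against the lobe lengths and epochs given above. The sketch assumes the standard conversion between proper motion and transverse velocity, $v_\perp = 4.74\,\mu\,D$ km/s with $\mu$ in arcsec/yr and $D$ in pc, which the abstract does not spell out.

```latex
\Delta\theta = 2.0'' - 1.3'' = 0.7'', \qquad
\Delta t = 1927 - 1254 = 673\ {\rm days}
\;\Rightarrow\;
\mu \approx \frac{0.7''}{673\ {\rm d}} \approx 1.0\ {\rm mas\,day^{-1}}

v_\perp = 4.74\ {\rm km\,s^{-1}}
  \left(\frac{\mu}{\rm arcsec\,yr^{-1}}\right)\left(\frac{D}{\rm pc}\right)
  = 4.74 \times \left(1.1\times10^{-3}\times365.25\right) \times 2400
  \approx 4.6\times10^{3}\ {\rm km\,s^{-1}}
```

The two-point estimate of about 1.0 mas/day agrees with the quoted $1.1\pm0.1$ mas/day given the rounding of the lobe lengths, and the velocity scales linearly with the assumed distance, hence the $(D/2.4\ {\rm kpc})$ factor.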
Submitted 8 October, 2021;
originally announced October 2021.
-
Empirically Determining Substellar Cloud Compositions in the era of JWST
Authors:
Jessica L. Luna,
Caroline V. Morley
Abstract:
Most brown dwarfs have atmospheres with temperatures cold enough to form clouds. A variety of materials likely condense, including refractory metal oxides and silicates; the precise compositions and crystal structures of predicted cloud particles depend on the modeling framework used and have not yet been empirically constrained. Spitzer has shown tentative evidence of the silicate feature in L dwarf spectra and JWST can measure these features in many L dwarfs. Here, we present new models to predict the signatures of the strongest cloud absorption features. We investigate different cloud mineral species and determine how particle size, mineralogy, and crystalline structure change spectral features. We find that silicate and refractory clouds have a strong cloud absorption feature for small particle sizes ($\leq$ 1 $μ$m). Model spectra are compared to five brown dwarfs that show evidence of the silicate feature; models that include small particles in the upper layers of the atmosphere produce a broad cloud mineral feature and better match the observed spectra than the Ackerman & Marley (2001) cloud model. We simulate observations with the MIRI instrument on JWST for a range of nearby, cloudy brown dwarfs, demonstrating that these features could be readily detectable if small particles are present. Furthermore, for photometrically variable brown dwarfs, our predictions suggest that with JWST, by measuring spectroscopic variability inside and outside a mineral feature, we can establish silicate (or other) clouds as the cause of variability. Mid-infrared spectroscopy is a promising tool to empirically constrain the complex cloud condensation sequence in brown dwarf atmospheres.
Submitted 6 August, 2021;
originally announced August 2021.
-
Breaking the degeneracy in magnetic cataclysmic variable X-ray spectral modeling using X-ray light curves
Authors:
Diogo Belloni,
Claudia V. Rodrigues,
Matthias R. Schreiber,
Manuel Castro,
Joaquim E. R. Costa,
Takayuki Hayashi,
Isabel J. Lima,
Gerardo J. M. Luna,
Murilo Martins,
Alexandre S. Oliveira,
Steven G. Parsons,
Karleyne M. G. Silva,
Paulo E. Stecchini,
Teresa J. Stuchi,
Monica Zorotovic
Abstract:
We present an analysis of mock X-ray spectra and light curves of magnetic cataclysmic variables using an upgraded version of the 3D CYCLOPS code. This 3D representation of the accretion flow allows us to properly model total and partial occultation of the post-shock region by the white dwarf as well as the modulation of the X-ray light curves due to the phase-dependent extinction of the pre-shock region. We carried out detailed post-shock region modeling in a four-dimensional parameter space by varying the white dwarf mass and magnetic field strength as well as the magnetosphere radius and the specific accretion rate. To calculate the post-shock region temperature and density profiles, we assumed equipartition between ions and electrons, took into account the white dwarf gravitational potential, the finite size of the magnetosphere and a dipole-like magnetic field geometry, and considered cooling by both bremsstrahlung and cyclotron radiative processes. By investigating the impact of the parameters on the resulting X-ray continuum spectra, we show that there is an inevitable degeneracy in the four-dimensional parameter space investigated here, which compromises X-ray continuum spectral fitting strategies and can lead to incorrect parameter estimates. However, the inclusion of X-ray light curves in different energy ranges can break this degeneracy, and it therefore remains, in principle, possible to use X-ray data to derive fundamental parameters of magnetic cataclysmic variables, which represents an essential step toward understanding their formation and evolution.
Submitted 22 July, 2021;
originally announced July 2021.
-
IChannels: Exploiting Current Management Mechanisms to Create Covert Channels in Modern Processors
Authors:
Jawad Haj-Yahya,
Jeremie S. Kim,
A. Giray Yaglikci,
Ivan Puddu,
Lois Orosa,
Juan Gómez Luna,
Mohammed Alser,
Onur Mutlu
Abstract:
To operate efficiently across a wide range of workloads with varying power requirements, a modern processor applies different current management mechanisms, which briefly throttle instruction execution while they adjust voltage and frequency to accommodate power-hungry instructions (PHIs) in the instruction stream. Doing so 1) reduces the power consumption of non-PHI instructions in typical workloads and 2) optimizes system voltage regulators' cost and area for the common use case while limiting current consumption when executing PHIs.
However, these mechanisms may compromise a system's confidentiality guarantees. In particular, we observe that multilevel side-effects of throttling mechanisms, due to PHI-related current management mechanisms, can be detected by two different software contexts (i.e., sender and receiver) running on 1) the same hardware thread, 2) co-located Simultaneous Multi-Threading (SMT) threads, and 3) different physical cores.
Based on these new observations on current management mechanisms, we develop a new set of covert channels, IChannels, and demonstrate them in real modern Intel processors (which span more than 70% of the entire client and server processor market). Our analysis shows that IChannels provides more than 24x the channel capacity of state-of-the-art power management covert channels. We propose practical and effective mitigations to each covert channel in IChannels by leveraging the insights we gain through a rigorous characterization of real systems.
Submitted 10 June, 2021; v1 submitted 9 June, 2021;
originally announced June 2021.
-
SIMDRAM: An End-to-End Framework for Bit-Serial SIMD Computing in DRAM
Authors:
Nastaran Hajinazar,
Geraldo F. Oliveira,
Sven Gregorio,
João Ferreira,
Nika Mansouri Ghiasi,
Minesh Patel,
Mohammed Alser,
Saugata Ghose,
Juan Gómez Luna,
Onur Mutlu
Abstract:
Processing-using-DRAM has been proposed for a limited set of basic operations (i.e., logic operations, addition). However, in order to enable full adoption of processing-using-DRAM, it is necessary to provide support for more complex operations. In this paper, we propose SIMDRAM, a flexible general-purpose processing-using-DRAM framework that (1) enables the efficient implementation of complex operations, and (2) provides a flexible mechanism to support the implementation of arbitrary user-defined operations. The SIMDRAM framework comprises three key steps. The first step builds an efficient MAJ/NOT representation of a given desired operation. The second step allocates DRAM rows that are reserved for computation to the operation's input and output operands, and generates the required sequence of DRAM commands to perform the MAJ/NOT implementation of the desired operation in DRAM. The third step uses the SIMDRAM control unit located inside the memory controller to manage the computation of the operation from start to end, by executing the DRAM commands generated in the second step of the framework. We design the hardware and ISA support for the SIMDRAM framework to (1) address key system integration challenges, and (2) allow programmers to employ new SIMDRAM operations without hardware changes.
We evaluate SIMDRAM for reliability, area overhead, throughput, and energy efficiency using a wide range of operations and seven real-world applications to demonstrate SIMDRAM's generality. Using 16 DRAM banks, SIMDRAM provides (1) 88x and 5.8x the throughput, and 257x and 31x the energy efficiency, of a CPU and a high-end GPU, respectively, over 16 operations; (2) 21x and 2.1x the performance of the CPU and GPU, over seven real-world applications. SIMDRAM incurs an area overhead of only 0.2% in a high-end CPU.
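The first SIMDRAM step, building a MAJ/NOT representation of an operation and applying it bit-serially across many elements at once, can be sketched in software for the simplest case, addition. This is a minimal illustration, not SIMDRAM's synthesized circuit: Python integers stand in for DRAM rows (bit position = SIMD lane), data is stored vertically (one "row" per bit position), and the full adder uses the well-known identities Cout = MAJ(A, B, Cin) and Sum = MAJ(NOT Cout, Cin, MAJ(A, B, NOT Cin)). All function names are hypothetical.

```python
from typing import List


def maj(a: int, b: int, c: int) -> int:
    """Bitwise 3-input majority; each bit position is one SIMD lane."""
    return (a & b) | (b & c) | (a & c)


def to_slices(values: List[int], width: int) -> List[int]:
    """Vertical layout: slice i holds bit i (LSB first) of every element."""
    return [
        sum(((v >> i) & 1) << lane for lane, v in enumerate(values))
        for i in range(width)
    ]


def from_slices(slices: List[int], lanes: int) -> List[int]:
    """Inverse of to_slices: reassemble each lane's value from the bit slices."""
    return [
        sum(((s >> lane) & 1) << i for i, s in enumerate(slices))
        for lane in range(lanes)
    ]


def bit_serial_add(xs: List[int], ys: List[int], lanes: int) -> List[int]:
    """Add two vertically laid-out vectors using only MAJ and NOT.

    Processes one bit position per step for all lanes at once; the result
    wraps modulo 2**width because the final carry-out is dropped.
    """
    mask = (1 << lanes) - 1

    def bit_not(v: int) -> int:
        # Complement restricted to the active lanes.
        return ~v & mask

    carry = 0
    out: List[int] = []
    for a, b in zip(xs, ys):
        cout = maj(a, b, carry)
        out.append(maj(bit_not(cout), carry, maj(a, b, bit_not(carry))))
        carry = cout
    return out


if __name__ == "__main__":
    xs = to_slices([3, 5, 7, 12], width=5)
    ys = to_slices([1, 9, 8, 3], width=5)
    print(from_slices(bit_serial_add(xs, ys, lanes=4), lanes=4))
```

The number of MAJ/NOT steps depends only on the bit width, not on the number of lanes, which is the property that lets a row-wide in-memory majority operation process every element in a row simultaneously.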
Submitted 30 June, 2021; v1 submitted 26 May, 2021;
originally announced May 2021.
-
SISA: Set-Centric Instruction Set Architecture for Graph Mining on Processing-in-Memory Systems
Authors:
Maciej Besta,
Raghavendra Kanakagiri,
Grzegorz Kwasniewski,
Rachata Ausavarungnirun,
Jakub Beránek,
Konstantinos Kanellopoulos,
Kacper Janda,
Zur Vonarburg-Shmaria,
Lukas Gianinazzi,
Ioana Stefan,
Juan Gómez Luna,
Marcin Copik,
Lukas Kapp-Schwoerer,
Salvatore Di Girolamo,
Marek Konieczny,
Nils Blach,
Onur Mutlu,
Torsten Hoefler
Abstract:
Simple graph algorithms such as PageRank have been the target of numerous hardware accelerators. Yet, there also exist much more complex graph mining algorithms for problems such as clustering or maximal clique listing. These algorithms are memory-bound and thus could be accelerated by hardware techniques such as Processing-in-Memory (PIM). However, they also come with nonstraightforward parallelism and complicated memory access patterns. In this work, we address this problem with a simple yet surprisingly powerful observation: operations on sets of vertices, such as intersection or union, form a large part of many complex graph mining algorithms, and can offer rich and simple parallelism at multiple levels. This observation drives our cross-layer design, in which we (1) expose set operations using a novel programming paradigm, (2) express and execute these operations efficiently with carefully designed set-centric ISA extensions called SISA, and (3) use PIM to accelerate SISA instructions. The key design idea is to alleviate the bandwidth needs of SISA instructions by mapping set operations to two types of PIM: in-DRAM bulk bitwise computing for bitvectors representing high-degree vertices, and near-memory logic layers for integer arrays representing low-degree vertices. Set-centric SISA-enhanced algorithms are efficient and outperform hand-tuned baselines, offering more than 10x speedup over the established Bron-Kerbosch algorithm for listing maximal cliques. We deliver more than 10 SISA set-centric algorithm formulations, illustrating SISA's wide applicability.
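The set-centric idea can be sketched in Python with the two set representations named above: sorted integer arrays, whose intersection is a linear merge, and bitvectors, whose intersection, union, and difference are single bulk bitwise operations. This sketch assumes Python ints as bitvectors and plain lists as arrays. It uses arrays for triangle counting and bitvectors for Bron-Kerbosch maximal clique listing with pivoting, rather than choosing a representation per vertex by degree as SISA does, and it models no PIM hardware or SISA instructions. All function names are hypothetical.

```python
from typing import Dict, List, Set

def build_adj(edges) -> Dict[int, Set[int]]:
    """Undirected adjacency sets from an edge list (no self-loops assumed)."""
    adj: Dict[int, Set[int]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

# Sorted-array representation: intersection is a linear merge.
def intersect_sorted(a: List[int], b: List[int]) -> List[int]:
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def count_triangles(adj: Dict[int, Set[int]]) -> int:
    """Each triangle u < v < w is counted once, via |N+(u) ∩ N+(v)| for edge (u, v)."""
    fwd = {v: sorted(u for u in nbrs if u > v) for v, nbrs in adj.items()}
    return sum(len(intersect_sorted(fwd[u], fwd[v])) for u in adj for v in fwd[u])

# Bitvector representation: Python ints, one bit per vertex.
def members(bits: int) -> List[int]:
    return [v for v in range(bits.bit_length()) if (bits >> v) & 1]

def bron_kerbosch(N: Dict[int, int], R: int, P: int, X: int, out: List[int]) -> None:
    """Maximal clique listing with pivoting, written purely as set operations."""
    if P == 0 and X == 0:
        out.append(R)                     # R cannot be extended: maximal clique
        return
    pivot = max(members(P | X), key=lambda u: bin(P & N[u]).count("1"))
    for v in members(P & ~N[pivot]):      # candidates: P \ N(pivot)
        bit = 1 << v
        bron_kerbosch(N, R | bit, P & N[v], X & N[v], out)
        P &= ~bit                         # P := P \ {v}
        X |= bit                          # X := X ∪ {v}

def maximal_cliques(adj: Dict[int, Set[int]]) -> List[List[int]]:
    N = {v: sum(1 << u for u in nbrs) for v, nbrs in adj.items()}
    out: List[int] = []
    bron_kerbosch(N, 0, sum(1 << v for v in adj), 0, out)
    return sorted(members(c) for c in out)

adj = build_adj([(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)])
print(maximal_cliques(adj))   # [[0, 1, 2], [1, 2, 3], [3, 4], [4, 5]]
print(count_triangles(adj))   # 2
```

Every step of the clique search is an intersection, union, or difference of vertex sets, which is the property the cross-layer design exploits: with bitvectors each of those steps is one wide bitwise operation, while sparse neighborhoods stay cheap as short sorted arrays.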
Submitted 25 October, 2021; v1 submitted 15 April, 2021;
originally announced April 2021.
-
BurstLink: Techniques for Energy-Efficient Conventional and Virtual Reality Video Display
Authors:
Jawad Haj-Yahya,
Jisung Park,
Rahul Bera,
Juan Gómez Luna,
Efraim Rotem,
Taha Shahroodi,
Jeremie Kim,
Onur Mutlu
Abstract:
Conventional planar video streaming is the most popular application in mobile systems, and the rapid growth of 360° video content and virtual reality (VR) devices is accelerating the adoption of VR video streaming. Unfortunately, video streaming consumes significant system energy due to the high power consumption of the system components involved in this process (e.g., DRAM, display interfaces, and display panel).
We propose BurstLink, a novel system-level technique that improves the energy efficiency of planar and VR video streaming. BurstLink is based on two key ideas. First, BurstLink transfers a decoded video frame directly from the host system to the display panel, bypassing the host DRAM. To this end, we extend the display panel with a double remote frame buffer (DRFB), replacing the double frame buffer in host DRAM, so that the system can write a new frame directly into the DRFB while the panel's pixels are updated with the current frame stored in the DRFB. Second, BurstLink transfers a complete decoded frame to the display panel in a single burst, using the maximum bandwidth of modern display interfaces. Unlike conventional systems, where the frame transfer rate is limited by the pixel-update throughput of the display panel, BurstLink can always take full advantage of the high bandwidth of modern display interfaces because the DRFB decouples frame transfer from pixel update. Together, direct and burst frame transfer significantly reduce the energy consumed by video display by reducing accesses to host DRAM and increasing the system's residency in idle power states.
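A back-of-the-envelope model shows why decoupling frame transfer from pixel update frees up idle time. The Python sketch below assumes an uncompressed 24-bit 3840×2160 frame, a 60 Hz panel, and an effective link bandwidth of 25.92 Gbit/s (roughly a four-lane DisplayPort HBR3 link after 8b/10b coding). These numbers are illustrative assumptions, not values from the paper, and blanking intervals, protocol overhead, and power-state transition latencies are ignored. The class name `LinkModel` is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class LinkModel:
    width: int = 3840            # assumed 4K UHD frame
    height: int = 2160
    bits_per_pixel: int = 24     # assumed uncompressed 8-bit RGB
    refresh_hz: float = 60.0     # assumed panel refresh rate
    link_gbps: float = 25.92     # assumed effective display-link bandwidth

    def frame_bits(self) -> int:
        return self.width * self.height * self.bits_per_pixel

    def frame_period_ms(self) -> float:
        return 1000.0 / self.refresh_hz

    def burst_ms(self) -> float:
        """Time to push one whole frame at full link bandwidth."""
        return self.frame_bits() / (self.link_gbps * 1e9) * 1000.0

    def paced_link_rate_gbps(self) -> float:
        """Average rate when the transfer is stretched over the whole frame
        period, as when it is paced by the panel's pixel updates."""
        return self.frame_bits() * self.refresh_hz / 1e9

    def idle_fraction(self) -> float:
        """Share of each frame period the link could stay idle after the burst."""
        return 1.0 - self.burst_ms() / self.frame_period_ms()

m = LinkModel()
print(f"frame size      : {m.frame_bits() / 8 / 1e6:.1f} MB")
print(f"frame period    : {m.frame_period_ms():.2f} ms")
print(f"burst transfer  : {m.burst_ms():.2f} ms")
print(f"paced link rate : {m.paced_link_rate_gbps():.2f} Gbit/s")
print(f"idle fraction   : {m.idle_fraction():.1%}")
```

Under these assumptions a full frame moves in about 7.7 ms of the 16.7 ms frame period, leaving roughly half of each period in which the link could idle, whereas a transfer paced by pixel updates keeps the link busy for most of the period. The reported energy savings come from the authors' validated power model, not from this arithmetic.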
We evaluate BurstLink using an analytical power model that we rigorously validate on a real modern mobile system. Our evaluation shows that BurstLink reduces system energy consumption for 4K planar and VR video streaming by 41% and 33%, respectively.
Submitted 1 November, 2021; v1 submitted 11 April, 2021;
originally announced April 2021.
-
IGRINS RV: A Precision RV Pipeline for IGRINS Using Modified Forward Modeling in the Near-Infrared
Authors:
Asa G. Stahl,
Shih-Yun Tang,
Christopher M. Johns-Krull,
L. Prato,
Joe Llama,
Gregory N. Mace,
Jae Joon Lee,
Heeyoung Oh,
Jessica Luna,
Daniel T. Jaffe
Abstract:
Application of the radial velocity (RV) technique in the near infrared is valuable because of the diminished impact of stellar activity at longer wavelengths, making it particularly advantageous for the study of late-type stars, but also of solar-type objects. In this paper, we present the IGRINS RV open-source Python pipeline for computing infrared RV measurements from reduced spectra taken with IGRINS, an R ~ 45,000 spectrograph with simultaneous coverage of the H band (1.49--1.80 $\mu$m) and K band (1.96--2.46 $\mu$m). Using a modified forward modeling technique, we construct high-resolution telluric templates from A0 standard observations on a nightly basis to provide a source of common-path wavelength calibration while mitigating the need to mask or correct for telluric absorption. Telluric standard observations are also used to model the variations in instrumental resolution across the detector, including a yearlong period when the K band was defocused. Without any additional instrument hardware, such as a gas cell or laser frequency comb, we achieve precisions of 26.8 $\rm m\,s^{-1}$ in the K band and 31.1 $\rm m\,s^{-1}$ in the H band for narrow-line hosts. These precisions are determined empirically from a monitoring campaign of two RV standard stars, as well as from the successful retrieval of planet-induced RV signals for both HD 189733 and $\tau$ Boo A; furthermore, our results affirm the presence of the Rossiter-McLaughlin effect for HD 189733. The IGRINS RV pipeline extends another important science capability to IGRINS, with publicly available software designed for widespread use.
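The forward-modeling idea can be illustrated with a toy Python fit: Doppler-shift a template by a trial velocity using the non-relativistic relation λ_obs = λ_rest(1 + v/c), compare it with the observed spectrum, and take the velocity that minimizes χ², refined with a parabola through the three best grid points. This is a simplified stand-in, not the IGRINS RV pipeline: the line list, pixel sampling, and injected velocity are hypothetical, the "observation" is noise-free, and telluric templates, instrumental-profile variations, and continuum fitting are all omitted.

```python
import math

C_KMS = 299_792.458   # speed of light, km/s
R = 45_000            # resolving power quoted for IGRINS
# Hypothetical K-band absorption lines: (rest wavelength in µm, depth).
LINES = [(2.10, 0.4), (2.15, 0.6), (2.20, 0.3), (2.30, 0.5)]

def template(wl: float) -> float:
    """Continuum-normalised template with Gaussian lines of FWHM = wl / R."""
    flux = 1.0
    for centre, depth in LINES:
        sigma = centre / R / 2.3548   # convert FWHM to Gaussian sigma
        flux -= depth * math.exp(-0.5 * ((wl - centre) / sigma) ** 2)
    return flux

def shifted(wl: float, v_kms: float) -> float:
    """Template Doppler-shifted by v: sample the rest frame at wl / (1 + v/c)."""
    return template(wl / (1.0 + v_kms / C_KMS))

def pixel_grid(samples_per_resel: int = 3, half_width: int = 30):
    """Pixels only in windows around each line, to keep the toy fit fast."""
    grid = []
    for centre, _ in LINES:
        step = centre / R / samples_per_resel
        grid.extend(centre + k * step for k in range(-half_width, half_width + 1))
    return grid

def chi2(obs, wl, v_kms: float) -> float:
    return sum((o - shifted(w, v_kms)) ** 2 for o, w in zip(obs, wl))

def fit_rv(obs, wl, v_min=-20.0, v_max=20.0, dv=0.05) -> float:
    """Grid search on chi^2, then a parabola through the three best points."""
    n = int(round((v_max - v_min) / dv))
    vs = [v_min + i * dv for i in range(n + 1)]
    cs = [chi2(obs, wl, v) for v in vs]
    i = min(range(len(cs)), key=cs.__getitem__)
    if 0 < i < len(cs) - 1:
        denom = cs[i - 1] - 2 * cs[i] + cs[i + 1]
        if denom > 0:
            return vs[i] + dv * (cs[i - 1] - cs[i + 1]) / (2 * denom)
    return vs[i]

wl = pixel_grid()
v_true = 3.21                               # injected RV, km/s
obs = [shifted(w, v_true) for w in wl]      # noise-free "observation"
print(f"recovered RV: {fit_rv(obs, wl):.3f} km/s")
print(f"velocity resolution c/R: {C_KMS / R:.3f} km/s")
```

The same numbers put the quoted precision in context: at R ~ 45,000 a resolution element spans c/R ≈ 6.66 km/s, so the 26.8 m/s K-band precision corresponds to roughly 1/250 of a resolution element.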
Submitted 7 April, 2021; v1 submitted 5 April, 2021;
originally announced April 2021.