
OLLA: Optimizing the Lifetime and Location of Arrays to Reduce the Memory Usage of Neural Networks

Authors: Benoit Steiner, Mostafa Elhoushi, Jacob Kahn, James Hegarty

Abstract: The size of deep neural networks has grown exponentially in recent years. Unfortunately, hardware devices have not kept pace with the rapidly increasing memory requirements. To cope with this, researchers have turned to techniques such as spilling and recomputation, which increase training time, or reduced precision and model pruning, which can affect model accuracy. We present OLLA, an algorithm that optimizes the lifetime and memory location of the tensors used to train neural networks. Our method reduces the memory usage of existing neural networks, without needing any modification to the models or their training procedures. We formulate the problem as a joint integer linear program (ILP). We present several techniques to simplify the encoding of the problem, and enable our approach to scale to the size of state-of-the-art neural networks using an off-the-shelf ILP solver. We experimentally demonstrate that OLLA only takes minutes if not seconds to allow the training of neural networks using one-third less memory on average.

Submitted 2 November, 2022; v1 submitted 23 October, 2022; originally announced October 2022.

arXiv:2110.12106 [pdf, other]

HWTool: Fully Automatic Mapping of an Extensible C++ Image Processing Language to Hardware

Authors: James Hegarty, Omar Eldash, Amr Suleiman, Armin Alaghi

Abstract: Implementing image processing algorithms using FPGAs or ASICs can improve energy efficiency by orders of magnitude over optimized CPU, DSP, or GPU code. These efficiency improvements are crucial for enabling new applications on mobile power-constrained devices, such as cell phones or AR/VR headsets. Unfortunately, custom hardware is commonly implemented using a waterfall process with time-intensive manual mapping and optimization phases. Thus, it can take years for a new algorithm to make it all the way from an algorithm design to shipping silicon. Recent improvements in hardware design tools, such as C-to-gates High-Level Synthesis (HLS), can reduce design time, but still require manual tuning from hardware experts. In this paper, we present HWTool, a novel system for automatically mapping image processing and computer vision algorithms to hardware. Our system maps between two domains: HWImg, an extensible C++ image processing library containing common image processing and parallel computing operators, and Rigel2, a library of optimized hardware implementations of HWImg's operators and backend Verilog compiler. We show how to automatically compile HWImg to Rigel2, by solving for interfaces, hardware sizing, and FIFO buffer allocation. Finally, we map full-scale image processing applications like convolution, optical flow, depth from stereo, and feature descriptors to FPGA using our system. On these examples, HWTool requires on average only 11% more FPGA area than hand-optimized designs (with manual FIFO allocation), and 33% more FPGA area than hand-optimized designs with automatic FIFO allocation, and performs similarly to HLS.

Submitted 22 October, 2021; originally announced October 2021.

arXiv:2108.12489 [pdf, ps, other]

Using Graph Neural Networks to model the performance of Deep Neural Networks

Authors: Shikhar Singh, Benoit Steiner, James Hegarty, Hugh Leather

Abstract: With the unprecedented proliferation of machine learning software, there is an ever-increasing need to generate efficient code for such applications. State-of-the-art deep-learning compilers like TVM and Halide incorporate a learning-based performance model to search the space of valid implementations of a given deep learning algorithm. For a given application, the model generates a performance metric such as the run time without executing the application on hardware. Such models speed up the compilation process by obviating the need to benchmark an enormous number of candidate implementations, referred to as schedules, on hardware. Existing performance models employ feed-forward networks, recurrent networks, or decision tree ensembles to estimate the performance of different implementations of a neural network. Graphs present a natural and intuitive way to model deep-learning networks where each node represents a computational stage or operation. Incorporating the inherent graph structure of these workloads in the performance model can enable a better representation and learning of inter-stage interactions. The accuracy of a performance model has direct implications on the efficiency of the search strategy, making it a crucial component of this class of deep-learning compilers. In this work, we develop a novel performance model that adopts a graph representation. In our model, each stage of computation represents a node characterized by features that capture the operations performed by the stage. The interaction between nodes is achieved using graph convolutions. Experimental evaluation shows a 7.75x and 12x reduction in prediction error compared to the Halide and TVM models, respectively.

Submitted 27 August, 2021; originally announced August 2021.

arXiv:2101.08300 [pdf, other]

doi 10.3847/1538-4357/abdeb7

Massive White Dwarfs in Young Star Clusters

Authors: Harvey B. Richer, Ilaria Caiazzo, Helen Du, Steffani Grondin, James Hegarty, Jeremy Heyl, Ronan Kerr, David R. Miller, Sarah Thiele

Abstract: We have carried out a search for massive white dwarfs (WDs) in the direction of young open star clusters using the Gaia DR2 database. The aim of this survey was to provide robust data for new and previously known high-mass WDs regarding cluster membership, to highlight WDs previously included in the Initial Final Mass Relation (IFMR) that are unlikely members of their respective clusters according to Gaia astrometry and to select an unequivocal WD sample that could then be compared with the host clusters' turnoff masses. All promising WD candidates in each cluster CMD were followed up with spectroscopy from Gemini in order to determine whether they were indeed WDs and derive their masses, temperatures and ages. In order to be considered cluster members, white dwarfs were required to have proper motions and parallaxes within 2, 3, or 4-$σ$ of that of their potential parent cluster based on how contaminated the field was in their region of the sky, have a cooling age that was less than the cluster age and a mass that was broadly consistent with the IFMR. A number of WDs included in current versions of the IFMR turned out to be non-members and a number of apparent members, based on Gaia's astrometric data alone, were rejected as their mass and/or cooling times were incompatible with cluster membership. In this way, we developed a highly selected IFMR sample for high mass WDs that, surprisingly, contained no precursor masses significantly in excess of ${\sim}$6 $M_{\odot}$.

Submitted 20 January, 2021; originally announced January 2021.

Comments: 39 pages, 17 figures, Accepted for Publication in the Astrophysical Journal

arXiv:2009.03374 [pdf, other]

doi 10.3847/2041-8213/abb5f7

Intermediate-Mass Stars Become Magnetic White Dwarfs

Authors: Ilaria Caiazzo, Jeremy Heyl, Harvey Richer, Jeffrey Cummings, Leesa Fleury, James Hegarty, Jason Kalirai, Ronan Kerr, Sarah Thiele, Pier-Emmanuel Tremblay, Michael Villanueva

Abstract: When a star exhausts its nuclear fuel, it either explodes as a supernova or more quiescently becomes a white dwarf, an object about half the mass of our Sun with a radius of about that of the Earth. About one fifth of white dwarfs exhibit the presence of magnetic fields, whose origin has long been debated as either the product of previous stages of evolution or of binary interactions. We here report the discovery of two massive and magnetic white dwarf members of young star clusters in the Gaia DR2 database, while a third massive and magnetic cluster white dwarf was already reported in a previous paper. These stars are most likely the product of single-star evolution and therefore challenge the merger scenario as the only way to produce magnetic white dwarfs. The progenitor masses of these stars are all above 5 solar masses, and there are only two other cluster white dwarfs whose distances have been unambiguously measured with Gaia and whose progenitors' masses fall in this range. This high incidence of magnetic white dwarfs indicates that intermediate-mass progenitors are more likely to produce magnetic remnants and that a fraction of magnetic white dwarfs forms from intermediate-mass stars.

Submitted 7 September, 2020; originally announced September 2020.

Comments: 11 pages, 7 figures. Accepted by ApJ Letters

Showing 1–5 of 5 results for author: Hegarty, J