-
OLLA: Optimizing the Lifetime and Location of Arrays to Reduce the Memory Usage of Neural Networks
Authors:
Benoit Steiner,
Mostafa Elhoushi,
Jacob Kahn,
James Hegarty
Abstract:
The size of deep neural networks has grown exponentially in recent years. Unfortunately, hardware devices have not kept pace with the rapidly increasing memory requirements. To cope with this, researchers have turned to techniques such as spilling and recomputation, which increase training time, or reduced precision and model pruning, which can affect model accuracy. We present OLLA, an algorithm that optimizes the lifetime and memory location of the tensors used to train neural networks. Our method reduces the memory usage of existing neural networks, without needing any modification to the models or their training procedures. We formulate the problem as a joint integer linear program (ILP). We present several techniques to simplify the encoding of the problem, and enable our approach to scale to the size of state-of-the-art neural networks using an off-the-shelf ILP solver. We experimentally demonstrate that OLLA only takes minutes if not seconds to allow the training of neural networks using one-third less memory on average.
Submitted 2 November, 2022; v1 submitted 23 October, 2022;
originally announced October 2022.
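To make the idea of tensor lifetimes in the OLLA abstract concrete, the following is a minimal sketch, not the paper's ILP formulation. It brute-forces every valid execution order of a tiny, hypothetical five-operator graph and reports the peak memory of each. The graph, the sizes, and the function names are all assumptions made for illustration, and memory location (fragmentation) is not modeled at all.

```python
from itertools import permutations

# Hypothetical toy graph: each op produces one tensor of the given size
# and consumes the tensors produced by the ops listed in INPUTS.
SIZES = {"a": 4, "b": 4, "c": 1, "d": 1, "e": 1}
INPUTS = {"a": [], "b": [], "c": ["a"], "d": ["b"], "e": ["c", "d"]}


def is_valid(order):
    """Return True if every op runs after all of its producers."""
    pos = {op: i for i, op in enumerate(order)}
    return all(pos[p] < pos[op] for op in order for p in INPUTS[op])


def peak_memory(order):
    """Peak total size of live tensors when ops run in `order`.

    A tensor is live from the step that produces it until the step of
    its last consumer (or only its own step if nothing consumes it).
    """
    pos = {op: i for i, op in enumerate(order)}
    last_use = dict(pos)
    for op in order:
        for p in INPUTS[op]:
            last_use[p] = max(last_use[p], pos[op])
    peak = 0
    for step in range(len(order)):
        live = sum(SIZES[t] for t in order if pos[t] <= step <= last_use[t])
        peak = max(peak, live)
    return peak


def best_order():
    """Exhaustive search; only feasible for toy graphs, unlike an ILP solver."""
    valid = [o for o in permutations(SIZES) if is_valid(o)]
    return min(valid, key=peak_memory)
```

On this toy graph, running both large producers first keeps both large tensors alive at once, whereas interleaving each producer with its consumer frees one large tensor before the other is allocated. Exhaustive search stops being practical after a handful of operators, which is presumably why the abstract turns to an ILP encoding and an off-the-shelf solver.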
-
HWTool: Fully Automatic Mapping of an Extensible C++ Image Processing Language to Hardware
Authors:
James Hegarty,
Omar Eldash,
Amr Suleiman,
Armin Alaghi
Abstract:
Implementing image processing algorithms using FPGAs or ASICs can improve energy efficiency by orders of magnitude over optimized CPU, DSP, or GPU code. These efficiency improvements are crucial for enabling new applications on mobile power-constrained devices, such as cell phones or AR/VR headsets. Unfortunately, custom hardware is commonly implemented using a waterfall process with time-intensive manual mapping and optimization phases. Thus, it can take years for a new algorithm to make it all the way from an algorithm design to shipping silicon. Recent improvements in hardware design tools, such as C-to-gates High-Level Synthesis (HLS), can reduce design time, but still require manual tuning from hardware experts.
In this paper, we present HWTool, a novel system for automatically mapping image processing and computer vision algorithms to hardware. Our system maps between two domains: HWImg, an extensible C++ image processing library containing common image processing and parallel computing operators, and Rigel2, a library of optimized hardware implementations of HWImg's operators and backend Verilog compiler. We show how to automatically compile HWImg to Rigel2, by solving for interfaces, hardware sizing, and FIFO buffer allocation. Finally, we map full-scale image processing applications like convolution, optical flow, depth from stereo, and feature descriptors to FPGA using our system. On these examples, HWTool requires on average only 11% more FPGA area than hand-optimized designs (with manual FIFO allocation), and 33% more FPGA area than hand-optimized designs with automatic FIFO allocation, and performs similarly to HLS.
Submitted 22 October, 2021;
originally announced October 2021.
-
Using Graph Neural Networks to model the performance of Deep Neural Networks
Authors:
Shikhar Singh,
Benoit Steiner,
James Hegarty,
Hugh Leather
Abstract:
With the unprecedented proliferation of machine learning software, there is an ever-increasing need to generate efficient code for such applications. State-of-the-art deep-learning compilers like TVM and Halide incorporate a learning-based performance model to search the space of valid implementations of a given deep learning algorithm. For a given application, the model generates a performance metric such as the run time without executing the application on hardware. Such models speed up the compilation process by obviating the need to benchmark an enormous number of candidate implementations, referred to as schedules, on hardware. Existing performance models employ feed-forward networks, recurrent networks, or decision tree ensembles to estimate the performance of different implementations of a neural network. Graphs present a natural and intuitive way to model deep-learning networks where each node represents a computational stage or operation. Incorporating the inherent graph structure of these workloads in the performance model can enable a better representation and learning of inter-stage interactions. The accuracy of a performance model has direct implications for the efficiency of the search strategy, making it a crucial component of this class of deep-learning compilers. In this work, we develop a novel performance model that adopts a graph representation. In our model, each stage of computation is represented by a node characterized by features that capture the operations performed by the stage. The interaction between nodes is achieved using graph convolutions. Experimental evaluation shows a 7.75x and 12x reduction in prediction error compared to the Halide and TVM models, respectively.
Submitted 27 August, 2021;
originally announced August 2021.
-
Massive White Dwarfs in Young Star Clusters
Authors:
Harvey B. Richer,
Ilaria Caiazzo,
Helen Du,
Steffani Grondin,
James Hegarty,
Jeremy Heyl,
Ronan Kerr,
David R. Miller,
Sarah Thiele
Abstract:
We have carried out a search for massive white dwarfs (WDs) in the direction of young open star clusters using the Gaia DR2 database. The aim of this survey was to provide robust data for new and previously known high-mass WDs regarding cluster membership, to highlight WDs previously included in the Initial Final Mass Relation (IFMR) that are unlikely members of their respective clusters according to Gaia astrometry and to select an unequivocal WD sample that could then be compared with the host clusters' turnoff masses.
All promising WD candidates in each cluster CMD were followed up with spectroscopy from Gemini in order to determine whether they were indeed WDs and derive their masses, temperatures and ages. In order to be considered cluster members, white dwarfs were required to have proper motions and parallaxes within 2, 3, or 4$\sigma$ of those of their potential parent cluster, based on how contaminated the field was in their region of the sky, and to have a cooling age that was less than the cluster age and a mass that was broadly consistent with the IFMR. A number of WDs included in current versions of the IFMR turned out to be non-members, and a number of apparent members, based on Gaia's astrometric data alone, were rejected as their mass and/or cooling times were incompatible with cluster membership. In this way, we developed a highly selected IFMR sample for high-mass WDs that, surprisingly, contained no precursor masses significantly in excess of ${\sim}$6 $M_{\odot}$.
Submitted 20 January, 2021;
originally announced January 2021.
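The membership cuts described in the abstract above can be sketched as a simple filter. This is an illustrative outline only, not the authors' pipeline. The `Measurement` type, the function names, the dictionary keys, and the choice to combine the star and cluster uncertainties in quadrature are all assumptions, since the abstract does not say how the $\sigma$ threshold was defined. The IFMR consistency check is omitted because the abstract gives no quantitative criterion for it.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    value: float
    error: float  # 1-sigma uncertainty (assumed Gaussian)


def within_n_sigma(star: Measurement, cluster: Measurement, n_sigma: float) -> bool:
    """True if the star's value lies within n_sigma of the cluster's value.

    Assumption: sigma is the quadrature sum of the two uncertainties.
    """
    combined = (star.error ** 2 + cluster.error ** 2) ** 0.5
    return abs(star.value - cluster.value) <= n_sigma * combined


def is_candidate_member(star, cluster, n_sigma, cooling_age, cluster_age):
    """Apply the astrometric and cooling-age cuts.

    `star` and `cluster` are dicts with hypothetical keys "parallax",
    "pm_ra", and "pm_dec", each mapping to a Measurement. `n_sigma`
    would be 2, 3, or 4 depending on field contamination.
    """
    astrometric_ok = all(
        within_n_sigma(star[key], cluster[key], n_sigma)
        for key in ("parallax", "pm_ra", "pm_dec")
    )
    return astrometric_ok and cooling_age < cluster_age
```

A stricter threshold (2$\sigma$) would suit crowded fields where chance alignments are likely, and a looser one (4$\sigma$) would suit sparse fields, which matches the abstract's statement that the cut depended on how contaminated each region was.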
-
Intermediate-Mass Stars Become Magnetic White Dwarfs
Authors:
Ilaria Caiazzo,
Jeremy Heyl,
Harvey Richer,
Jeffrey Cummings,
Leesa Fleury,
James Hegarty,
Jason Kalirai,
Ronan Kerr,
Sarah Thiele,
Pier-Emmanuel Tremblay,
Michael Villanueva
Abstract:
When a star exhausts its nuclear fuel, it either explodes as a supernova or more quiescently becomes a white dwarf, an object about half the mass of our Sun with a radius of about that of the Earth. About one fifth of white dwarfs exhibit the presence of magnetic fields, whose origin has long been debated as either the product of previous stages of evolution or of binary interactions. We here report the discovery of two massive and magnetic white dwarf members of young star clusters in the Gaia DR2 database, while a third massive and magnetic cluster white dwarf was already reported in a previous paper. These stars are most likely the product of single-star evolution and therefore challenge the merger scenario as the only way to produce magnetic white dwarfs. The progenitor masses of these stars are all above 5 solar masses, and there are only two other cluster white dwarfs whose distances have been unambiguously measured with Gaia and whose progenitors' masses fall in this range. This high incidence of magnetic white dwarfs indicates that intermediate-mass progenitors are more likely to produce magnetic remnants and that a fraction of magnetic white dwarfs forms from intermediate-mass stars.
Submitted 7 September, 2020;
originally announced September 2020.