
Showing 1–9 of 9 results for author: Tulloch, A

  1. arXiv:2305.01515  [pdf, other]

    cs.IR cs.LG cs.PF

    MTrainS: Improving DLRM training efficiency using heterogeneous memories

    Authors: Hiwot Tadese Kassa, Paul Johnson, Jason Akers, Mrinmoy Ghosh, Andrew Tulloch, Dheevatsa Mudigere, Jongsoo Park, Xing Liu, Ronald Dreslinski, Ehsan K. Ardestani

    Abstract: Recommendation models are very large, requiring terabytes (TB) of memory during training. In pursuit of better quality, the model size and complexity grow over time, which requires additional training data to avoid overfitting. This model growth demands a large number of resources in data centers. Hence, training efficiency is becoming considerably more important to keep the data center power dema…

    Submitted 19 April, 2023; originally announced May 2023.

  2. arXiv:2104.05158  [pdf, other]

    cs.DC cs.AI cs.LG cs.PF

    Software-Hardware Co-design for Fast and Scalable Training of Deep Learning Recommendation Models

    Authors: Dheevatsa Mudigere, Yuchen Hao, Jianyu Huang, Zhihao Jia, Andrew Tulloch, Srinivas Sridharan, Xing Liu, Mustafa Ozdal, Jade Nie, Jongsoo Park, Liang Luo, Jie Amy Yang, Leon Gao, Dmytro Ivchenko, Aarti Basant, Yuxi Hu, Jiyan Yang, Ehsan K. Ardestani, Xiaodong Wang, Rakesh Komuravelli, Ching-Hsiang Chu, Serhat Yilmaz, Huayu Li, Jiyuan Qian, Zhuobo Feng, et al. (28 additional authors not shown)

    Abstract: Deep learning recommendation models (DLRMs) are used across many business-critical services at Facebook and are the single largest AI application in terms of infrastructure demand in its data centers. In this paper we discuss the SW/HW co-designed solution for high-performance distributed training of large-scale DLRMs. We introduce a high-performance scalable software stack based on PyTorch and pa…

    Submitted 26 February, 2023; v1 submitted 11 April, 2021; originally announced April 2021.

  3. arXiv:2010.11305  [pdf, other]

    cs.LG cs.AI cs.DC

    Mixed-Precision Embedding Using a Cache

    Authors: Jie Amy Yang, Jianyu Huang, Jongsoo Park, Ping Tak Peter Tang, Andrew Tulloch

    Abstract: In recommendation systems, practitioners have observed that an increase in the number of embedding tables and their sizes often leads to significant improvements in model performance. Given this and the business importance of these models to major internet companies, embedding tables for personalization tasks have grown to terabyte scale and continue to grow at a significant rate. Meanwhile, these large-s…

    Submitted 22 October, 2020; v1 submitted 21 October, 2020; originally announced October 2020.

  4. arXiv:1911.08609  [pdf, other]

    cs.CV

    Hybrid Composition with IdleBlock: More Efficient Networks for Image Recognition

    Authors: Bing Xu, Andrew Tulloch, Yunpeng Chen, Xiaomeng Yang, Lin Qiao

    Abstract: We propose a new building block, IdleBlock, which naturally prunes connections within the block. To fully utilize the IdleBlock we break the tradition of monotonic design in state-of-the-art networks, and introduce hybrid composition with IdleBlock. We study hybrid composition on MobileNet v3 and EfficientNet-B0, two of the most efficient networks. Without any neural architecture search, the dee…

    Submitted 19 November, 2019; originally announced November 2019.

  5. arXiv:1811.09886  [pdf, other]

    cs.LG stat.ML

    Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications

    Authors: Jongsoo Park, Maxim Naumov, Protonu Basu, Summer Deng, Aravind Kalaiah, Daya Khudia, James Law, Parth Malani, Andrey Malevich, Satish Nadathur, Juan Pino, Martin Schatz, Alexander Sidorov, Viswanath Sivakumar, Andrew Tulloch, Xiaodong Wang, Yiming Wu, Hector Yuen, Utku Diril, Dmytro Dzhulgakov, Kim Hazelwood, Bill Jia, Yangqing Jia, Lin Qiao, Vijay Rao, et al. (3 additional authors not shown)

    Abstract: The application of deep learning techniques has resulted in remarkable improvements in machine learning models. This paper provides detailed characterizations of deep learning models used in many Facebook social network services. We present computational characteristics of our models, describe high performance optimizations targeting existing systems, point out their limitations and make suggestions…

    Submitted 29 November, 2018; v1 submitted 24 November, 2018; originally announced November 2018.

  6. arXiv:1811.09862  [pdf, other]

    cs.LG cs.CV stat.ML

    On Periodic Functions as Regularizers for Quantization of Neural Networks

    Authors: Maxim Naumov, Utku Diril, Jongsoo Park, Benjamin Ray, Jedrzej Jablonski, Andrew Tulloch

    Abstract: Deep learning models have been successfully used in computer vision and many other fields. We propose an unorthodox algorithm for performing quantization of the model parameters. In contrast with popular quantization schemes based on thresholds, we use a novel technique based on periodic functions, such as continuous trigonometric sine or cosine as well as non-continuous hat functions. We apply th…

    Submitted 24 November, 2018; originally announced November 2018.

    Comments: 11 pages, 7 figures

    MSC Class: 68T05

    ACM Class: I.2.6; I.5.0
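    The abstract above names sine and cosine as periodic regularizers but is cut off before giving a formula. As a minimal sketch, assume a penalty R(w) = sum sin^2(pi * w / step), which is zero exactly on the uniform quantization grid {k * step} and largest at grid midpoints; the functional form, the `step` parameter, and the helper names are illustrative assumptions, not the paper's definition. The demo descends on the penalty alone (in training it would be added to the task loss with some weight) to show weights being pulled to their nearest grid points.

```python
import math

def sine_regularizer(weights, step):
    """Assumed periodic penalty: zero when every weight lies on the grid k * step."""
    return sum(math.sin(math.pi * w / step) ** 2 for w in weights)

def sine_regularizer_grad(weights, step):
    """Elementwise derivative: d/dw sin^2(pi w / s) = (pi / s) * sin(2 pi w / s)."""
    return [(math.pi / step) * math.sin(2.0 * math.pi * w / step) for w in weights]

def nudge_toward_grid(weights, step, lr, iters):
    """Plain gradient descent on the penalty only (no task loss), for illustration."""
    w = list(weights)
    for _ in range(iters):
        g = sine_regularizer_grad(w, step)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

if __name__ == "__main__":
    # Each weight drifts to its nearest multiple of 0.5: roughly [0.0, 0.5, -0.5].
    print(nudge_toward_grid([0.1, 0.45, -0.7], step=0.5, lr=0.001, iters=2000))
```

    Because the penalty is smooth, it can be minimized by ordinary gradient descent alongside the task loss; note that a weight sitting exactly on a grid midpoint receives zero gradient, since that point is a maximum of the penalty.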

  7. arXiv:1712.02427  [pdf, other]

    cs.LG

    High performance ultra-low-precision convolutions on mobile devices

    Authors: Andrew Tulloch, Yangqing Jia

    Abstract: Many applications of mobile deep learning, especially real-time computer vision workloads, are constrained by computation power. This is particularly true for workloads running on older consumer phones, where a typical device might be powered by a single- or dual-core ARMv7 CPU. We provide an open-source implementation and a comprehensive analysis of (to our knowledge) the state of the art ultra-l…

    Submitted 6 December, 2017; originally announced December 2017.

    Comments: Presented at NIPS 2017, Machine Learning on the Phone and other Consumer Devices workshop
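    The abstract is truncated before describing the kernels. A common building block of 1-bit (binary) convolutions is the XNOR/popcount dot product: pack +1/-1 values into bits, and the dot product equals n minus twice the number of mismatching bits. The sketch below shows only that identity in plain Python; it is an assumed illustration of the general technique, not the paper's optimized mobile implementation, and the helper names and bit encoding are hypothetical.

```python
def pack_signs(values):
    """Hypothetical encoding: bit i is set when values[i] == -1, clear when +1."""
    mask = 0
    for i, v in enumerate(values):
        if v not in (1, -1):
            raise ValueError("binary dot product expects +1/-1 entries")
        if v == -1:
            mask |= 1 << i
    return mask

def binary_dot(a_mask, b_mask, n):
    """Dot product of two length-n +1/-1 vectors from their packed bit masks."""
    # XOR marks positions where the signs differ; each mismatch contributes -1
    # and each match contributes +1, so the sum is n - 2 * mismatches.
    mismatches = bin(a_mask ^ b_mask).count("1")
    return n - 2 * mismatches

if __name__ == "__main__":
    a = [1, -1, 1, 1, -1]
    b = [1, 1, -1, 1, -1]
    print(binary_dot(pack_signs(a), pack_signs(b), len(a)))  # same as sum(x * y): 1
```

    In a real kernel the masks would be machine words and the popcount a single instruction, which is where the speedup over floating-point multiply-accumulate comes from.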

  8. arXiv:1706.02677  [pdf, other]

    cs.CV cs.DC cs.LG

    Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

    Authors: Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He

    Abstract: Deep learning thrives with large neural networks and large datasets. However, larger networks and larger datasets result in longer training times that impede research and development progress. Distributed synchronous SGD offers a potential solution to this problem by dividing SGD minibatches over a pool of parallel workers. Yet to make this scheme efficient, the per-worker workload must be large,…

    Submitted 30 April, 2018; v1 submitted 8 June, 2017; originally announced June 2017.

    Comments: Tech report (v2: correct typos)
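    The abstract describes distributed synchronous SGD as dividing each minibatch over a pool of parallel workers. A minimal sketch of one such step, assuming a toy 1-D linear regression model instead of the paper's image networks and simulating the cross-worker all-reduce with a Python average: each worker computes the gradient on its own equal-sized shard, and averaging those gradients reproduces the full-minibatch gradient, so the update matches single-worker SGD on the same minibatch. The function names are hypothetical.

```python
def grad_mse(w, batch):
    """Gradient of mean squared error for the 1-D linear model y ~ w * x over (x, y) pairs."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def sync_sgd_step(w, minibatch, num_workers, lr):
    """One synchronous data-parallel SGD step over an evenly divided minibatch."""
    if len(minibatch) % num_workers:
        raise ValueError("minibatch size must be divisible by num_workers")
    per_worker = len(minibatch) // num_workers
    shards = [minibatch[k * per_worker:(k + 1) * per_worker] for k in range(num_workers)]
    # Each worker sees only its own shard.
    worker_grads = [grad_mse(w, shard) for shard in shards]
    # Averaging equal-sized shard gradients (an all-reduce on a real cluster)
    # equals the gradient of the whole minibatch.
    avg_grad = sum(worker_grads) / num_workers
    return w - lr * avg_grad

if __name__ == "__main__":
    data = [(float(x), 3.0 * x) for x in range(1, 9)]  # true slope is 3
    w = 0.0
    for _ in range(60):
        w = sync_sgd_step(w, data, num_workers=4, lr=0.01)
    print(w)  # converges to approximately 3.0
```

    The equivalence holds only because the shards are equal in size; with unequal shards the per-worker gradients would need to be weighted by shard size.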

  9. arXiv:1603.01765  [pdf, ps, other]

    math.NA stat.CO

    Accurate principal component analysis via a few iterations of alternating least squares

    Authors: Arthur Szlam, Andrew Tulloch, Mark Tygert

    Abstract: A few iterations of alternating least squares with a random starting point provably suffice to produce nearly optimal spectral- and Frobenius-norm accuracies of low-rank approximations to a matrix; iterating to convergence is unnecessary. Thus, software implementing alternating least squares can be retrofitted via appropriate setting of parameters to calculate nearly optimally accurate low-rank ap…

    Submitted 5 March, 2016; originally announced March 2016.

    Comments: 9 pages, 3 tables

    Journal ref: SIAM Journal on Matrix Analysis and Applications, 38 (2): 425-433, 2017
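    To illustrate the alternating least squares scheme the abstract refers to, here is a minimal sketch of the rank-1 case in plain Python, assuming a dense matrix stored as a list of rows: starting from a random v, solve for u with v fixed, then for v with u fixed, and repeat a few times. It approximates the matrix as given (no mean-centering, which PCA proper would apply first) and is not the paper's general rank-k procedure; helper names are hypothetical.

```python
import random

def matvec(A, x):
    """Return A x for a matrix stored as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def rmatvec(A, y):
    """Return A^T y."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def rank1_als(A, iters, seed=0):
    """Rank-1 alternating least squares: fit A ~ u v^T from a random starting v."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(len(A[0]))]  # random starting point
    for _ in range(iters):
        vv = dot(v, v)
        u = [x / vv for x in matvec(A, v)]   # argmin_u ||A - u v^T||_F with v fixed
        uu = dot(u, u)
        v = [x / uu for x in rmatvec(A, u)]  # argmin_v ||A - u v^T||_F with u fixed
    return u, v

def frobenius_residual(A, u, v):
    """Return ||A - u v^T||_F."""
    return sum((A[i][j] - u[i] * v[j]) ** 2
               for i in range(len(A)) for j in range(len(A[0]))) ** 0.5

if __name__ == "__main__":
    A = [[3.0, 0.0], [0.0, 1.0]]
    u, v = rank1_als(A, iters=30)
    # The best possible rank-1 error here is the second singular value, 1.0.
    print(frobenius_residual(A, u, v))
```

    Each pair of updates multiplies v by A^T A up to scaling, so in this rank-1 case ALS behaves like power iteration, and the error approaches the optimum at a rate governed by the ratio of the top two singular values.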