
TensorQuant: A Simulation Toolbox for Deep Neural Network Quantization

Published: 12 November 2017

Abstract

Recent research suggests that training and inference of deep neural networks (DNNs) can be carried out with low-precision numerical representations of the training/test data, weights, and gradients without a general loss in accuracy. The benefit of such compact representations is twofold: they significantly reduce the communication bottleneck in distributed DNN training, and they enable faster neural network implementations on hardware accelerators such as FPGAs. Several quantization methods have been proposed to map the original 32-bit floating-point problem to low-bit representations. While most related publications validate the proposed approach on a single DNN topology, it appears evident that the optimal choice of quantization method and number of coding bits is topology dependent. To date, there is no general theory that would allow users to derive the optimal quantization during the design of a DNN topology.
In this paper, we present a quantization toolbox for the TensorFlow framework. TensorQuant allows transparent quantization simulation of existing DNN topologies during training and inference. It supports generic quantization methods and enables experimental evaluation of the impact of quantization on single layers as well as on the full topology. In a first series of experiments with TensorQuant, we present an analysis of fixed-point quantization of popular CNN topologies.
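The fixed-point format analysed in the experiments rounds every tensor entry to a grid determined by the number of integer and fractional bits and saturates on overflow. A minimal NumPy sketch of this rounding step is given below; the helper name and parameters are illustrative only and are not the TensorQuant API.

    import numpy as np

    def fixed_point(x, integer_bits=4, fractional_bits=8):
        # Illustrative helper, not part of TensorQuant: round x to a signed
        # fixed-point grid with 'integer_bits' bits before and
        # 'fractional_bits' bits after the binary point, saturating on overflow.
        step = 2.0 ** -fractional_bits
        max_val = 2.0 ** integer_bits - step
        return np.clip(np.round(x / step) * step, -max_val, max_val)

    # Example: simulate a 12-bit (4.8) fixed-point word for a layer's weights.
    weights = np.random.randn(3, 3).astype(np.float32)
    print(fixed_point(weights, integer_bits=4, fractional_bits=8))

Applied transparently to the weights and intermediate tensors of an existing TensorFlow topology, such a rounding operator lets the effect of a given word length be measured per layer or for the full network, which is the kind of evaluation the toolbox provides.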



    Published In

    MLHPC'17: Proceedings of the Machine Learning on HPC Environments
    November 2017
    81 pages
    ISBN:9781450351379
    DOI:10.1145/3146347

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Deep Learning
    2. FPGA
    3. Parallelization
    4. Quantization

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    SC '17

    Acceptance Rates

    Overall Acceptance Rate 5 of 7 submissions, 71%

