
TensorQuant: A Simulation Toolbox for Deep Neural Network Quantization

Published: 12 November 2017

Abstract

Recent research suggests that training and inference of deep neural networks (DNNs) can be carried out with low-precision numerical representations of the training/test data, weights, and gradients without a general loss in accuracy. The benefit of such compact representations is twofold: they significantly reduce the communication bottleneck in distributed DNN training, and they enable faster neural network implementations on hardware accelerators such as FPGAs. Several quantization methods have been proposed to map the original 32-bit floating-point problem to low-bit representations. While most related publications validate the proposed approach on a single DNN topology, it appears evident that the optimal choice of quantization method and number of coding bits is topology dependent. To date, there is no general theory that would allow users to derive the optimal quantization during the design of a DNN topology.
In this paper, we present a quantization toolbox for the TensorFlow framework. TensorQuant allows transparent quantization simulation of existing DNN topologies during training and inference. It supports generic quantization methods and enables experimental evaluation of the impact of quantization on single layers as well as on the full topology. In a first series of experiments with TensorQuant, we present an analysis of fixed-point quantization of popular CNN topologies.
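The fixed-point format analysed in the experiments rounds every tensor entry to a grid determined by the number of integer and fractional bits and saturates on overflow. A minimal NumPy sketch of this rounding step is given below; the helper name and parameters are illustrative only and are not the TensorQuant API.

    import numpy as np

    def fixed_point(x, integer_bits=4, fractional_bits=8):
        # Illustrative helper, not part of TensorQuant: round x to a signed
        # fixed-point grid with 'integer_bits' bits before and
        # 'fractional_bits' bits after the binary point, saturating on overflow.
        step = 2.0 ** -fractional_bits
        max_val = 2.0 ** integer_bits - step
        return np.clip(np.round(x / step) * step, -max_val, max_val)

    # Example: simulate a 12-bit (4.8) fixed-point word for a layer's weights.
    weights = np.random.randn(3, 3).astype(np.float32)
    print(fixed_point(weights, integer_bits=4, fractional_bits=8))

Applied transparently to the weights and intermediate tensors of an existing TensorFlow topology, such a rounding operator lets the effect of a given word length be measured per layer or for the full network, which is the kind of evaluation the toolbox provides.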



    Published In

    MLHPC'17: Proceedings of the Machine Learning on HPC Environments
    November 2017
    81 pages
    ISBN:9781450351379
    DOI:10.1145/3146347

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Deep Learning
    2. FPGA
    3. Parallelization
    4. Quantization

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    SC '17

    Acceptance Rates

    Overall Acceptance Rate 5 of 7 submissions, 71%

