Boosting deep neural network efficiency with dual-module inference

L Liu, L Deng, Z Chen, Y Wang, S Li, J Zhang, Y Yang, Z Gu, Y Ding, Y Xie
International Conference on Machine Learning, 2020. proceedings.mlr.press
Abstract
Using deep neural networks (DNNs) in machine learning tasks is promising for delivering high-quality results but challenging for meeting stringent latency requirements and energy constraints, because of the memory-bound and compute-bound execution patterns of DNNs. We propose a big-little dual-module inference scheme that dynamically skips unnecessary memory accesses and computations to accelerate DNN inference. Leveraging the noise-resilient property of nonlinear activation functions, we use a lightweight little module that approximates the original DNN layer, termed the big module, to compute activations in the insensitive region, where outputs are more noise-resilient. The expensive memory accesses and computations of the big module can thus be reduced, as its results are only calculated in the sensitive region. For memory-bound models such as recurrent neural networks (RNNs), our method reduces overall memory accesses by 40% on average and achieves a 1.54x to 1.75x speedup on a commodity CPU-based server platform with negligible impact on model quality. In addition, our method reduces the operations of compute-bound models such as convolutional neural networks (CNNs) by 3.02x, with only a 0.5% accuracy drop.
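
To make the skip logic concrete, here is a minimal NumPy sketch for a single fully connected layer with ReLU, where negative pre-activations are insensitive (ReLU zeroes them regardless of their exact value). The function name dual_module_layer, the thresholding rule, and the 1-bit little module are illustrative assumptions, not the authors' implementation; the paper evaluates RNN and CNN workloads on real hardware.

    import numpy as np

    def dual_module_layer(x, W_big, W_little, threshold=0.0):
        """One fully connected ReLU layer with big-little dual-module inference.

        The little module (W_little) cheaply approximates the big module
        (W_big). Outputs whose approximate pre-activation falls in ReLU's
        insensitive region (below `threshold`) keep the little module's
        result; only the remaining "sensitive" outputs touch the big
        module's weights, which is where the memory-access and compute
        savings come from.
        """
        y_little = x @ W_little                  # cheap approximate pre-activations
        sensitive = np.where(y_little > threshold)[0]
        y = y_little.copy()
        y[sensitive] = x @ W_big[:, sensitive]   # exact recompute, sensitive outputs only
        return np.maximum(y, 0.0)                # ReLU: negatives are zeroed anyway

    rng = np.random.default_rng(0)
    d_in, d_out = 256, 128
    W_big = rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)
    # One possible little module: 1-bit quantized weights with a shared scale
    # (an illustrative choice, not necessarily the paper's construction).
    W_little = np.sign(W_big) * np.abs(W_big).mean()
    x = rng.standard_normal(d_in)
    print(dual_module_layer(x, W_big, W_little)[:5])

In a real kernel the savings come from never fetching the skipped columns of W_big from memory, not from the masking itself; the sketch only shows the decision rule that determines which outputs are recomputed exactly.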