
Communication Minimized Model-Architecture Co-design for Efficient Convolution Acceleration

Published: 12 June 2024

Abstract

Convolutional Neural Networks (CNNs) are indispensable for today’s Artificial Intelligence (AI) applications, but their data communication dominates accelerator overhead. Prior work focuses either on off-chip access or on intuitive/heuristic on-chip access optimization; with the development of Near-Memory Processing (NMP), however, DRAM access cost has dropped sharply, and on- and off-chip access optimization needs rethinking as a whole. This paper therefore proposes a holistic, on- and off-chip communication-minimized model-architecture acceleration scheme for CNNs. First, we derive the layer-wise off-chip communication Lower Bound (LB) under different data-reuse strategies. Second, we derive the on-chip LB and present an overall on- and off-chip communication analysis model that provides solid guidance for on-chip storage allocation, dataflow, and architecture design. Finally, we design a Window-Primitive (WP) dataflow and a Systolic-Cross-Line (SCL) CNN accelerator based on the proposed theoretical model. SCL achieves a 3.8× pJ/MAC energy reduction with 1.4× less on-chip storage area than Eyeriss, and a 1.3–1.8× reduction with 3–4× less area than CLB. For NMP, we reduce access energy by about 2× compared with a previous systolic NMP architecture.
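The abstract's first step — comparing off-chip traffic under different data-reuse strategies — can be illustrated with a minimal sketch. This is not the paper's actual lower-bound model (which derives true layer-wise LBs); it is a hypothetical first-order estimator with made-up parameters, showing why the best reuse strategy depends on layer shape and on-chip buffer size:

```python
# Illustrative sketch only: first-order off-chip traffic estimates for one
# conv layer under three simple data-reuse strategies. All names and the
# cost model are hypothetical, not the paper's communication LB.

def conv_traffic(H, W, C, K, R, S, buf_bytes, elem=2):
    """Estimate off-chip bytes for an HxWxC input, K filters of RxSxC,
    stride 1, 'same'-sized output, elem bytes per element."""
    inp = H * W * C * elem        # input feature map footprint
    wts = K * R * S * C * elem    # weight footprint
    out = H * W * K * elem        # output feature map footprint

    est = {}
    # Weight-stationary: weights loaded once; inputs re-fetched once per
    # pass when the full weight set does not fit in the buffer.
    w_passes = max(1, -(-wts // buf_bytes))   # ceil division
    est["weight_stationary"] = wts + w_passes * inp + out
    # Input-stationary: inputs loaded once; weights re-streamed per tile.
    i_passes = max(1, -(-inp // buf_bytes))
    est["input_stationary"] = inp + i_passes * wts + out
    # Output-stationary: partial sums stay on chip; inputs and weights
    # each streamed once per output tile.
    o_passes = max(1, -(-out // buf_bytes))
    est["output_stationary"] = out + o_passes * (inp + wts)
    return est

# Example: a mid-network VGG-like layer with a 128 KiB on-chip buffer.
traffic = conv_traffic(H=56, W=56, C=64, K=64, R=3, S=3, buf_bytes=128 * 1024)
best = min(traffic, key=traffic.get)
```

Here the small weight set fits on chip, so weight-stationary wins; for early layers with large feature maps and few channels, the ranking flips, which is why the paper ties storage allocation and dataflow choice to a per-layer analysis.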

References

[1]
Tianshi Chen, Zidong Du, et al. 2014. DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning. In ASPLOS ’14. Association for Computing Machinery, 269–284.
[2]
Xiaoming Chen, Yinhe Han, and Yu Wang. 2020. Communication lower bound in convolution accelerators. In 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 529–541.
[3]
Yu-Hsin Chen, Joel Emer, and Vivienne Sze. 2016. Eyeriss: A spatial architecture for energy-efficient dataflow for convolutional neural networks. ACM SIGARCH Computer Architecture News 44, 3 (2016), 367–379.
[4]
Yu-Hsin Chen, Tushar Krishna, et al. 2016. Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks. IEEE Journal of Solid-State Circuits 52, 1 (2016), 127–138.
[5]
Fujun Bai, et al. 2020. A Stacked Embedded DRAM Array for LPDDR4/4X using Hybrid Bonding 3D Integration with 34GB/s/1Gb 0.88pJ/b Logic-to-Memory Interface. In 2020 IEEE International Electron Devices Meeting (IEDM).
[6]
Norman P. Jouppi, Cliff Young, et al. 2017. In-datacenter performance analysis of a tensor processing unit. In Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA). 1–12.
[7]
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems, F. Pereira, C.J. Burges, L. Bottou, and K.Q. Weinberger (Eds.). Vol. 25. Curran Associates, Inc.
[8]
Yann LeCun, et al. 2010. Convolutional Networks and Applications in Vision. In Proceedings of 2010 IEEE International Symposium on Circuits and Systems (ISCAS).
[9]
Dimin Niu, Shuangchen Li, et al. 2022. 184QPS/W 64Mb/mm² 3D Logic-to-DRAM Hybrid Bonding with Process-Near-Memory Engine for Recommendation System. In 2022 IEEE International Solid-State Circuits Conference (ISSCC), Vol. 65. 1–3.
[10]
Prasanth Prabu Ravichandiran and Paul D. Franzon. 2021. A review of 3D-dynamic random-access memory based near-memory computation. In 2021 IEEE International 3D Systems Integration Conference (3DIC). IEEE, 1–6.
[11]
Karen Simonyan and Andrew Zisserman. 2014. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv preprint arXiv:1409.1556 (2014).
[12]
Haiping Wu, et al. 2021. CvT: Introducing Convolutions to Vision Transformers. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV). 22–31.
[13]
Xin Chen, et al. 2017. COSY: An Energy-Efficient Hardware Architecture for Deep Convolutional Neural Networks Based on Systolic Array. In 2017 IEEE 23rd International Conference on Parallel and Distributed Systems (ICPADS). 180–189.
[14]
Shijin Zhang, Zidong Du, et al. 2016. Cambricon-X: An accelerator for sparse neural networks. In 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 1–12.


Published In

GLSVLSI '24: Proceedings of the Great Lakes Symposium on VLSI 2024
June 2024
797 pages
ISBN: 9798400706059
DOI: 10.1145/3649476

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. CNN accelerator
  2. communication analysis model
  3. convolutional neural network (CNN)
  4. dataflow

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • University Synergy Innovation Program of Anhui Province
  • Strategic Priority Research Program of Chinese Academy of Sciences
  • CAS Project for Young Scientists in Basic Research

Conference

GLSVLSI '24: Great Lakes Symposium on VLSI 2024
June 12–14, 2024
Clearwater, FL, USA

Acceptance Rates

Overall Acceptance Rate 312 of 1,156 submissions, 27%
