
FullSparse: A Sparse-Aware GEMM Accelerator with Online Sparsity Prediction

Published: 02 July 2024

Abstract

Exploiting sparsity reduces both storage and computation for deep neural networks (DNNs) on resource-constrained devices. Although neural networks naturally produce sparsity through operations such as ReLU and quantization, the wide range of sparsity levels they exhibit (0.2% to 99%) complicates the design of computational units. In this paper, we present FullSparse, an energy-efficient GEMM accelerator designed for diverse applications that accommodates this full range of sparsity in matrix multiplication. FullSparse introduces three features for fine-grained sparsity support: multi-sparsity control, predictive result sparsity, and a multi-sparsity-compatible PE array. Experimental evaluations confirm that, while remaining adaptable to varying sparsity levels, our implementation delivers computational performance comparable to existing designs.
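The abstract does not spell out how the online sparsity prediction or the zero-skipping PE array work internally. As a purely illustrative sketch (all function names are ours, not the paper's), the following Python models two of the named ideas: a probabilistic estimate of the result's sparsity from the operand densities, and a GEMM loop that skips multiply-accumulates on zero operands, as a sparsity-aware PE array would. The estimate assumes zeros are independently and uniformly distributed, which real activations generally are not.

```python
import numpy as np

def sparsity(m):
    """Fraction of zero entries in a matrix."""
    return 1.0 - np.count_nonzero(m) / m.size

def predict_result_sparsity(a, b):
    """Probabilistic estimate of the zero fraction of C = A @ B.

    Under an independence assumption, c_ij is zero (absent
    cancellation) only if every product a_ik * b_kj is zero,
    which happens with probability (1 - d_a * d_b) ** K,
    where d_a and d_b are the operand densities.
    """
    d_a = 1.0 - sparsity(a)
    d_b = 1.0 - sparsity(b)
    k = a.shape[1]
    return (1.0 - d_a * d_b) ** k

def sparse_gemm(a, b):
    """GEMM that skips multiply-accumulates on zero operands,
    mimicking the zero-skipping of a sparsity-aware PE array."""
    m, k = a.shape
    _, n = b.shape
    c = np.zeros((m, n))
    for i in range(m):
        for kk in range(k):
            if a[i, kk] == 0.0:   # skip the whole inner row of work
                continue
            for j in range(n):
                if b[kk, j] != 0.0:
                    c[i, j] += a[i, kk] * b[kk, j]
    return c
```

Such a predictor could, in principle, let an accelerator size output buffers or choose a dataflow before computing; the hardware mechanism in FullSparse is not described at this level of detail in the abstract.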


    Published In

    CF '24: Proceedings of the 21st ACM International Conference on Computing Frontiers
    May 2024
    345 pages
    ISBN:9798400705977
    DOI:10.1145/3649153

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

1. Deep learning accelerator
    2. Neural network
    3. Sparsity

    Qualifiers

    • Extended-abstract
    • Research
    • Refereed limited

    Conference

CF '24

    Acceptance Rates

CF '24 Paper Acceptance Rate: 33 of 105 submissions, 31%
Overall Acceptance Rate: 273 of 785 submissions, 35%
