Sense: Model-hardware codesign for accelerating sparse CNNs on systolic arrays

W Sun, D Liu, Z Zou, W Sun, S Chen, Y Kang
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2023. ieeexplore.ieee.org
Sparsity is an intrinsic property of convolutional neural networks (CNNs) and is worth exploiting in CNN accelerators. However, the extra processing it requires comes with hardware overhead, yielding only marginal gains for most architectures. Meanwhile, systolic arrays have become increasingly competitive for CNN acceleration thanks to their high spatiotemporal locality and low hardware overhead. However, the irregularity of sparsity induces imbalanced workloads under the rigid systolic dataflow, degrading performance. This article therefore proposes Sense, a systolic-array-based architecture for sparse CNN acceleration through model-hardware codesign, enabling large performance gains. To balance input feature map (IFM) and weight loads across the processing element (PE) array, channel clustering gathers IFMs with approximately equal sparsity for array computation, and a codesigned load-balancing weight pruning method keeps the sparsity ratio of each kernel at a fixed value with little accuracy loss, improving PE utilization and overall performance. In addition, adaptive dataflow configuration selects the computing strategy based on the storage ratio of IFMs and weights, lowering dynamic random access memory (DRAM) access compared with Swallow and further reducing system energy consumption. The whole design was implemented on a Xilinx Zynq ZCU102 at 200 MHz and runs at 471, 34, 53, and 191 images/s for AlexNet, VGG-16, ResNet-50, and GoogleNet, respectively. Compared with the sparse systolic-array-based accelerators Swallow, fusion-enabled systolic architecture (FESA), and SPOTS, Sense achieves higher energy efficiency (image/J) on these CNNs.
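The two codesign steps described in the abstract lend themselves to a short illustration. Below is a minimal NumPy sketch (not the paper's implementation; the function names, the sparsity-sorting heuristic for clustering, and the magnitude criterion for pruning are assumptions) of how IFM channels could be grouped by similar sparsity and how every weight kernel could be pruned to an identical sparsity ratio so that the PE array sees a balanced non-zero workload.

```python
import numpy as np

def cluster_channels_by_sparsity(ifm: np.ndarray, num_clusters: int):
    """Group IFM channels with approximately equal sparsity.

    ifm: activation tensor of shape (C, H, W).
    Returns a list of channel-index arrays, one per cluster; channels in
    the same cluster carry similar non-zero workloads when fed to the array.
    (Illustrative heuristic, not the paper's exact clustering rule.)
    """
    per_channel_sparsity = (ifm == 0).mean(axis=(1, 2))  # zero fraction per channel
    order = np.argsort(per_channel_sparsity)             # sort channels by sparsity
    return np.array_split(order, num_clusters)           # adjacent channels are similar

def load_balanced_prune(weights: np.ndarray, sparsity: float = 0.5) -> np.ndarray:
    """Magnitude-prune every kernel to the same fixed sparsity ratio.

    weights: (out_channels, in_channels, kH, kW) convolution tensor.
    Each output-channel kernel keeps the same count of non-zeros, so no
    PE is stuck with more work than its neighbors.
    """
    pruned = weights.copy()
    for oc in range(weights.shape[0]):
        kernel = pruned[oc].ravel()                # view into the copied tensor
        k = int(round(sparsity * kernel.size))     # weights to zero in this kernel
        if k > 0:
            drop = np.argpartition(np.abs(kernel), k - 1)[:k]
            kernel[drop] = 0.0                     # zero the k smallest magnitudes
    return pruned
```

In this sketch the load balancing comes purely from the invariant that every kernel ends up with the same non-zero count; the paper's actual pruning method additionally constrains accuracy loss, which a one-shot magnitude criterion like this does not capture.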