Abstract
Sparse training is a promising technique for reducing the computational cost of deep neural networks (DNNs) while retaining high accuracy. In particular, N:M fine-grained structured sparsity, where at most N out of every M consecutive elements can be nonzero, has attracted attention because its pattern is hardware-friendly and it can reach a high sparsity ratio. However, the potential to accelerate N:M sparse DNN training has not been fully exploited, and efficient hardware supporting N:M sparse training is lacking. To tackle these challenges, this article presents a computation-efficient training scheme for N:M sparse DNNs based on algorithm, architecture, and dataflow co-design. At the algorithm level, a bidirectional weight pruning method, dubbed BDWP, is proposed to leverage the N:M sparsity of weights during both the forward and backward passes of DNN training, which significantly reduces the computational cost while maintaining model accuracy. At the architecture level, a sparse accelerator for DNN training, named SAT, is developed to support both regular dense operations and computation-efficient N:M sparse operations. At the dataflow level, multiple optimization methods, including interleave mapping, pregeneration of N:M sparse weights, and offline scheduling, are proposed to boost the computational efficiency of SAT. Finally, the effectiveness of the training scheme is evaluated on a Xilinx VCU1525 FPGA card using various DNN models (ResNet9, ViT, VGG19, ResNet18, and ResNet50) and datasets (CIFAR-10, CIFAR-100, Tiny ImageNet, and ImageNet). Experimental results show that the SAT accelerator with the BDWP sparse training method at a 2:8 sparsity ratio achieves an average speedup of 1.75× over dense training on the same accelerator, with a negligible average accuracy loss of 0.56%. Furthermore, the proposed training scheme improves training throughput by 2.97× to 25.22× and energy efficiency by 1.36× to 3.58× over prior FPGA-based accelerators.
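To make the N:M constraint concrete, the following is a minimal Python sketch of pruning a weight row to a 2:8 pattern. It is not the BDWP method itself: the function names `nm_prune` and `satisfies_nm` are hypothetical, and keeping the largest-magnitude weights in each group is an assumed selection criterion that the abstract does not specify.

```python
def nm_prune(weights, n, m):
    """Zero all but n values in each group of m consecutive weights.

    Assumption: the n largest-magnitude values per group are kept.
    The abstract does not state which selection rule BDWP uses.
    """
    if not 0 < n <= m:
        raise ValueError("n must satisfy 0 < n <= m")
    if len(weights) % m != 0:
        raise ValueError("number of weights must be a multiple of m")

    pruned = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        # Positions of the n largest-magnitude entries within this group.
        order = sorted(range(m), key=lambda i: abs(group[i]), reverse=True)
        keep = set(order[:n])
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


def satisfies_nm(weights, n, m):
    """Check that every group of m consecutive weights has at most n nonzeros."""
    return all(
        sum(1 for w in weights[start:start + m] if w != 0) <= n
        for start in range(0, len(weights), m)
    )


# Illustrative values only: one row of 16 weights, i.e. two groups of 8.
row = [0.9, -0.1, 0.05, 0.4, -0.7, 0.2, 0.0, 0.3,
       0.1, 0.2, -0.3, 0.25, 0.6, -0.05, 0.15, -0.8]
sparse_row = nm_prune(row, n=2, m=8)
print(sparse_row)
print(satisfies_nm(sparse_row, 2, 8))
```

With a 2:8 pattern at most 2 of every 8 weights survive, so a multiplier array only has to process a quarter of the weight operands; this fixed per-group budget is what makes the pattern regular enough for hardware to exploit.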
Original language | English |
---|---|
Article number | 10256041 |
Pages (from-to) | 506-519 |
Number of pages | 14 |
Journal | IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems |
Volume | 43 |
Issue number | 2 |
Publication status | Published - Feb 2024 |
Keywords
- Algorithm-hardware codesign
- deep neural networks (DNNs)
- DNN training
- neural network compression
- pruning
- sparse training