TY - GEN
T1 - ESCEPE: Early-Exit Network Section-Wise Model Compression Using Self-distillation and Weight Clustering
AU - Khalilian Gourtani, Saeed
AU - Meratnia, Nirvana
PY - 2023/5/8
Y1 - 2023/5/8
N2 - Deploying deep learning models on resource-constrained (edge) devices is challenging due to their high computational demands and large model sizes. Early-exit neural networks are one of the approaches to make deep learning models more efficient for resource-constrained devices by reducing computational cost and latency. However, even with early-exit neural networks, the model size may remain a problem when deploying them on edge devices. To address this problem, we propose a section-wise model compression technique for compressing an early-exit neural network with intermediate classifiers. Our approach divides the model into a few sections and uses different compression settings in the weight clustering-based compression for each section to prevent accuracy loss in the intermediate sections. We demonstrate that knowledge distillation can be used in the retraining phase to transfer knowledge from uncompressed to compressed sections and to accelerate the recovery of performance reduction after the weight clustering stages. The performance evaluation of our proposed method on CIFAR10 and CIFAR100 datasets using ResNet and WideResNet architectures demonstrates that the proposed technique can compress an early-exit neural network with a high compression ratio with minimal impact on the accuracy of intermediate classifiers. The proposed method achieves compression ratios of more than 36 and 22 times for ResNet18 with three shallow classifiers on CIFAR10 and CIFAR100, respectively, with an ensemble accuracy loss of less than 1%. By eliminating shallow classifiers from the early-exit model, the static model can achieve compression ratios of up to 64 and 52 times for ResNet18 and WideResNet50, respectively, on the CIFAR10 dataset with an accuracy loss of less than 2.5%.
KW - Early-exit neural networks
KW - model compression
KW - weight clustering
KW - self-distillation
UR - http://www.scopus.com/inward/record.url?scp=85159308248&partnerID=8YFLogxK
U2 - 10.1145/3578354.3592872
DO - 10.1145/3578354.3592872
M3 - Conference contribution
SP - 48
EP - 53
BT - EdgeSys '23: Proceedings of the 6th International Workshop on Edge Systems, Analytics and Networking
PB - Association for Computing Machinery, Inc
ER -