Do We Actually Need Dense Over-Parameterization? In-Time Over-Parameterization in Sparse Training

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Academic; peer-reviewed

67 Citations (Scopus)

Abstract

In this paper, we introduce a new perspective on training deep neural networks capable of state-of-the-art performance without the need for the expensive over-parameterization by proposing the concept of In-Time Over-Parameterization (ITOP) in sparse training. By starting from a random sparse network and continuously exploring sparse connectivities during training, we can perform an Over-Parameterization over the course of training, closing the gap in the expressibility between sparse training and dense training. We further use ITOP to understand the underlying mechanism of Dynamic Sparse Training (DST) and discover that the benefits of DST come from its ability to consider across time all possible parameters when searching for the optimal sparse connectivity. As long as sufficient parameters have been reliably explored, DST can outperform the dense neural network by a large margin. We present a series of experiments to support our conjecture and achieve the state-of-the-art sparse training performance with ResNet-50 on ImageNet. More impressively, ITOP achieves dominant performance over the overparameterization-based sparse methods at extreme sparsities. When trained with ResNet-34 on CIFAR-100, ITOP can match the performance of the dense model at an extreme sparsity of 98%.
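The idea of exploring parameters "across time" can be illustrated with a toy simulation. The sketch below is not the authors' method: it assumes purely random pruning and regrowth (real DST methods use criteria such as weight magnitude or gradients), trains nothing, and uses a hypothetical function name and arguments. It only shows how a fixed-size sparse mask can, over many updates, touch a growing fraction of all parameter positions.

```python
import random


def explore_sparse_masks(n_params, density, n_updates, regrow_fraction, seed=0):
    """Toy prune-and-regrow loop (hypothetical helper, not from the paper).

    Returns the fraction of parameter positions that have been active at
    least once, recorded before the first update and after every update.
    """
    rng = random.Random(seed)
    n_active = max(1, int(n_params * density))

    # Start from a random sparse connectivity pattern.
    active = set(rng.sample(range(n_params), n_active))
    explored = set(active)
    history = [len(explored) / n_params]

    # Number of connections swapped per update (assumes regrow_fraction <= 1).
    n_swap = max(1, int(n_active * regrow_fraction))

    for _ in range(n_updates):
        # Assumption: drop connections at random instead of by a learned criterion.
        dropped = rng.sample(sorted(active), n_swap)
        active.difference_update(dropped)

        # Regrow the same number of connections among currently inactive positions,
        # so the sparsity level stays constant.
        inactive = [i for i in range(n_params) if i not in active]
        grown = rng.sample(inactive, n_swap)
        active.update(grown)
        explored.update(grown)

        history.append(len(explored) / n_params)

    return history


if __name__ == "__main__":
    rates = explore_sparse_masks(n_params=1000, density=0.1,
                                 n_updates=50, regrow_fraction=0.3)
    print(f"explored at start: {rates[0]:.2f}, after 50 updates: {rates[-1]:.2f}")
```

At any single moment only 10% of positions are active in this example, yet the set of positions visited at some point keeps growing, which is the intuition behind over-parameterizing "in time" rather than "in space".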

Original language: English
Title of host publication: Proceedings of the 38th International Conference on Machine Learning (ICML2021)
Editors: Marina Meila, Tong Zhang
Pages: 6989-7000
Number of pages: 12
ISBN (electronic): 9781713845065
Status: Published - 18 Jul 2021

Publication series

Name: Proceedings of Machine Learning Research
Volume: 139
ISSN (electronic): 2640-3498
