Do We Actually Need Dense Over-Parameterization? In-Time Over-Parameterization in Sparse Training

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

In this paper, we develop a new perspective on training deep neural networks that achieves state-of-the-art performance without the need for expensive dense over-parameterization, by proposing the concept of In-Time Over-Parameterization (ITOP) in sparse training. By starting from a random sparse network and continuously exploring sparse connectivities during training, we can perform over-parameterization in the space-time manifold, closing the gap in expressibility between sparse training and dense training. We further use ITOP to understand the underlying mechanism of Dynamic Sparse Training (DST) and show that the benefits of DST come from its ability to consider, across time, all possible parameters when searching for the optimal sparse connectivity. As long as a sufficient number of parameters have been reliably explored during training, DST can outperform the dense neural network by a large margin. We present a series of experiments to support our conjecture and achieve state-of-the-art sparse training performance with ResNet-50 on ImageNet. More impressively, our method achieves dominant performance over the over-parameterization-based sparse methods at extreme sparsity levels. When trained on CIFAR-100, our method can match the performance of the dense model even at an extreme sparsity level of 98%.
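The mechanism the abstract describes (start from a random sparse topology, then repeatedly prune and regrow connections so that, over time, a far larger fraction of the parameter space is visited than the instantaneous density suggests) can be illustrated with a short sketch. This is a minimal toy example and not the paper's implementation: the SET-style magnitude-prune/random-grow update, the `prune_and_grow` helper, and the `explored_mask` bookkeeping used to measure how much of the parameter space has been explored in time are all illustrative assumptions.

```python
# Minimal sketch (assumed, not the authors' code) of a dynamic-sparse-training
# topology update, plus a counter for the fraction of parameters that have
# been activated at least once during training ("explored in time").
import numpy as np

rng = np.random.default_rng(0)

def prune_and_grow(weights, mask, explored_mask, prune_frac=0.3):
    """One SET-style update: drop the smallest-magnitude active weights,
    then regrow the same number of connections at random inactive positions."""
    active = np.flatnonzero(mask)
    n_drop = int(prune_frac * active.size)

    # Prune: remove the n_drop active weights with the smallest magnitude.
    drop = active[np.argsort(np.abs(weights[active]))[:n_drop]]
    mask[drop] = 0
    weights[drop] = 0.0

    # Grow: activate n_drop randomly chosen, currently inactive connections.
    inactive = np.flatnonzero(mask == 0)
    grow = rng.choice(inactive, size=n_drop, replace=False)
    mask[grow] = 1
    weights[grow] = 0.0  # newly grown connections start at zero

    # Record that these positions have now been explored at least once.
    explored_mask[grow] = True
    return weights, mask, explored_mask

# Toy layer: 10,000 parameters at 95% sparsity (5% density).
n_params, density = 10_000, 0.05
weights = rng.normal(size=n_params)
mask = (rng.random(n_params) < density).astype(np.int8)
weights *= mask
explored_mask = mask.astype(bool)  # the initial connections count as explored

for step in range(50):                                        # topology-update intervals
    weights += mask * rng.normal(scale=0.01, size=n_params)   # stand-in for weight updates
    weights, mask, explored_mask = prune_and_grow(weights, mask, explored_mask)

print(f"instantaneous density: {mask.mean():.3f}, "
      f"parameter space explored in time: {explored_mask.mean():.3f}")
```

Even though the network stays at 5% density at every single step, the explored fraction keeps growing across updates; in ITOP's terms, it is this reliably explored parameter space, not the momentary density, that determines whether sparse training can match or exceed the dense baseline.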
Original language: English
Title of host publication: Proceedings of the 38th International Conference on Machine Learning (ICML2021)
Editors: Marina Meila, Tong Zhang
Pages: 6989-7000
Publication status: Published - 18 Jul 2021

Publication series

Name: Proceedings of Machine Learning Research
Volume: 139

Keywords

  • sparse training
  • overparameterization
  • In-Time Over-Parameterization
  • dynamic sparse training
