TY - JOUR
T1 - Efficient and effective training of sparse recurrent neural networks
AU - Liu, Shiwei
AU - Ni'mah, Iftitahu
AU - Menkovski, Vlado
AU - Mocanu, Decebal C.
AU - Pechenizkiy, Mykola
PY - 2021/8
Y1 - 2021/8
AB - Recurrent neural networks (RNNs) have achieved state-of-the-art performance in various applications. However, RNNs tend to be memory-bandwidth bound in practice and require long training and inference times. These problems are at odds with training and deploying RNNs on resource-limited devices, where the memory and floating-point operations (FLOPs) budgets are strictly constrained. Conventional model compression techniques address this mainly by reducing inference costs and operate on a costly pre-trained model. Recently, dynamic sparse training has been proposed to accelerate training by learning sparse neural networks directly from scratch. However, previous sparse training techniques are mainly designed for convolutional neural networks and multi-layer perceptrons. In this paper, we introduce a method to train intrinsically sparse RNN models with a fixed number of parameters and FLOPs throughout training. We demonstrate state-of-the-art sparse performance with long short-term memory and recurrent highway networks on widely used tasks: language modeling and text classification. These results advocate that, contrary to the general belief that training a sparse neural network from scratch yields worse performance than a dense network, sparse training with adaptive connectivity can usually achieve better performance than dense models for RNNs.
KW - Dynamic sparse training
KW - Long short-term memory
KW - Recurrent highway networks
KW - Sparse recurrent neural networks
UR - https://rdcu.be/cegM5
UR - http://www.scopus.com/inward/record.url?scp=85099987053&partnerID=8YFLogxK
U2 - 10.1007/s00521-021-05727-y
DO - 10.1007/s00521-021-05727-y
M3 - Article
SN - 0941-0643
VL - 33
SP - 9625
EP - 9636
JO - Neural Computing and Applications
JF - Neural Computing and Applications
IS - 15
ER -