Almost sure convergence of dropout algorithms for neural networks

Research output: Contribution to journal › Article › Academic


Abstract

We investigate the convergence and the rate of convergence of stochastic training algorithms for Neural Networks (NNs) that, over the years, have spawned from Dropout (Hinton et al., 2012). Modeling the fact that neurons in the brain may fail to fire, dropout algorithms in practice consist of multiplying the weight matrices of a NN component-wise by independently drawn random matrices with $\{0,1\}$-valued entries during each iteration of the Feedforward-Backpropagation algorithm. This paper presents a probability-theoretic proof that, for any NN topology and differentiable, polynomially bounded activation functions, if we project the NN's weights onto a compact set and use a dropout algorithm, then the weights converge to a unique stationary set of a projected system of Ordinary Differential Equations (ODEs). We also establish an upper bound on the rate of convergence of Gradient Descent (GD) on the limiting ODEs of dropout algorithms for arborescences (a class of trees) of arbitrary depth and with linear activation functions.
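
To make the masking mechanism concrete, the following is a minimal sketch (not the authors' implementation) of one projected stochastic gradient step with dropout masks, for an assumed toy two-layer linear network with squared-error loss. The function name dropout_sgd_step, the keep-probability p, and the box projection onto [-H, H] are illustrative choices, not taken from the paper.

# Minimal sketch (assumption: toy 2-layer linear network, squared-error loss)
# of one dropout-style projected stochastic gradient step, as described in the
# abstract: each weight matrix is multiplied component-wise by an independently
# drawn {0,1}-valued mask, gradients are taken through the masked weights, and
# the updated weights are projected back onto a compact set (here a box [-H, H],
# chosen purely for illustration).
import numpy as np

rng = np.random.default_rng(0)

def dropout_sgd_step(weights, x, y, p=0.5, lr=1e-2, H=10.0):
    """One projected SGD step with dropout masks for a 2-layer linear network.

    weights: list [W1, W2]; p: probability an entry is kept; H: projection box."""
    W1, W2 = weights
    # Independently drawn {0,1}-valued masks, one entry per weight.
    M1 = rng.binomial(1, p, size=W1.shape)
    M2 = rng.binomial(1, p, size=W2.shape)
    # Feedforward pass through the component-wise masked weights.
    h = (M1 * W1) @ x
    out = (M2 * W2) @ h
    err = out - y                      # gradient of 0.5 * squared error at the output
    # Backpropagation with respect to the unmasked weights; the masks enter
    # through the chain rule as component-wise factors.
    grad_W2 = M2 * np.outer(err, h)
    grad_W1 = M1 * np.outer((M2 * W2).T @ err, x)
    # Gradient step followed by projection onto the compact set [-H, H]^d.
    W1 = np.clip(W1 - lr * grad_W1, -H, H)
    W2 = np.clip(W2 - lr * grad_W2, -H, H)
    return [W1, W2]

# Toy usage: d-dimensional input, k hidden units, scalar output.
d, k = 4, 3
W = [rng.standard_normal((k, d)), rng.standard_normal((1, k))]
x, y = rng.standard_normal(d), np.array([1.0])
for _ in range(1000):
    W = dropout_sgd_step(W, x, y)
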
Original language: English
Article number: 2002.02247v1
Number of pages: 20
Journal: arXiv
Publication status: Published - 6 Feb 2020

Bibliographical note

20 pages, 2 figures

Keywords

  • math.OC
  • cs.LG
  • math.PR

Cite this

@article{65533292ab604c7aa50874681c029341,
title = "Almost sure convergence of dropout algorithms for neural networks",
abstract = "We investigate the convergence and convergence rate of stochastic training algorithms for Neural Networks (NNs) that, over the years, have spawned from Dropout (Hinton et al., 2012). Modeling that neurons in the brain may not fire, dropout algorithms consist in practice of multiplying the weight matrices of a NN component-wise by independently drawn random matrices with $\{0,1\}$-valued entries during each iteration of the Feedforward-Backpropagation algorithm. This paper presents a probability theoretical proof that for any NN topology and differentiable polynomially bounded activation functions, if we project the NN's weights into a compact set and use a dropout algorithm, then the weights converge to a unique stationary set of a projected system of Ordinary Differential Equations (ODEs). We also establish an upper bound on the rate of convergence of Gradient Descent (GD) on the limiting ODEs of dropout algorithms for arborescences (a class of trees) of arbitrary depth and with linear activation functions.",
keywords = "math.OC, cs.LG, math.PR",
author = "Albert Senen-Cerda and Jaron Sanders",
note = "20 pages, 2 figures",
year = "2020",
month = "2",
day = "6",
language = "English",
journal = "arXiv",
publisher = "Cornell University Library",

}

Almost sure convergence of dropout algorithms for neural networks. / Senen-Cerda, Albert; Sanders, Jaron.

In: arXiv, 06.02.2020.

Research output: Contribution to journal › Article › Academic
