Title: The Benefits of Over-parameterization at Initialization in Deep ReLU Networks

Jan 11, 2019

Devansh Arpit, Yoshua Bengio



Abstract: It has been noted in existing literature that over-parameterization in ReLU networks generally leads to better performance. While there could be several reasons for this, we investigate desirable network properties at initialization which may be enjoyed by ReLU networks. Without making any assumptions, we derive a lower bound on the layer width of deep ReLU networks whose weights are initialized from a certain distribution, such that with high probability, i) the norms of the hidden activations of all layers are roughly equal to the norm of the input, and ii) the norms of the parameter gradients of all layers are roughly the same. In this way, sufficiently wide deep ReLU nets with appropriate initialization can inherently preserve the forward flow of information and also avoid the exploding/vanishing gradient problem. We further show that these results hold for an infinite number of data samples, in which case the finite lower bound depends on the input dimensionality and the depth of the network. In the case of deep ReLU networks with weight vectors normalized by their norm, we derive the initialization required to obtain the aforementioned benefits of over-parameterization, without which the network fails to learn at large depth.
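The forward-norm claim in the abstract can be checked numerically with a minimal sketch. The abstract does not name the weight distribution, so this sketch assumes He-style Gaussian weights with variance 2/fan_in; the function `relu_forward_norms` and its parameter `weight_var_scale` are hypothetical names introduced here for illustration, not taken from the paper. It reports the ratio of each hidden layer's activation norm to the input norm for a randomly initialized ReLU network.

```python
import math
import random


def relu_forward_norms(x, width, depth, weight_var_scale=2.0, seed=0):
    """Return ||h_l|| / ||x|| for each hidden layer l of a random ReLU MLP.

    Weights are drawn i.i.d. from N(0, weight_var_scale / fan_in).
    weight_var_scale=2.0 is He-style scaling; this choice is an
    assumption of the sketch, not something stated in the abstract.
    """
    rng = random.Random(seed)
    input_norm = math.sqrt(sum(v * v for v in x))
    h = list(x)
    ratios = []
    for _ in range(depth):
        fan_in = len(h)
        std = math.sqrt(weight_var_scale / fan_in)
        next_h = []
        for _ in range(width):
            # Sample one row of the weight matrix on the fly and apply it.
            pre = sum(rng.gauss(0.0, std) * v for v in h)
            next_h.append(max(pre, 0.0))  # ReLU
        h = next_h
        ratios.append(math.sqrt(sum(v * v for v in h)) / input_norm)
    return ratios


if __name__ == "__main__":
    data_rng = random.Random(1)
    x = [data_rng.gauss(0.0, 1.0) for _ in range(64)]
    for scale in (2.0, 1.0):
        ratios = relu_forward_norms(x, width=256, depth=6, weight_var_scale=scale)
        print(f"variance scale {scale}:", [round(r, 3) for r in ratios])
```

Under this assumed scaling, the ratios should stay close to 1 at every layer when the width is large. Halving the variance (`weight_var_scale=1.0`) shrinks the norm by a factor of about 1/sqrt(2) per layer, which illustrates why the choice of initialization matters as depth grows.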

View paper on OpenReview
