Yanhui Su

Dying ReLU and Initialization: Theory and Numerical Examples

Mar 15, 2019

Lu Lu, Yeonjong Shin, Yanhui Su, George Em Karniadakis

Abstract: The dying ReLU refers to the problem in which ReLU neurons become inactive and output only 0 for any input. There are many empirical and heuristic explanations of why ReLU neurons die, but little is known about the problem theoretically. In this paper, we rigorously prove that a deep ReLU network will eventually die in probability as the depth goes to infinity. Several methods have been proposed to alleviate the dying ReLU. Perhaps one of the simplest treatments is to modify the initialization procedure. One common way of initializing weights and biases uses symmetric probability distributions, which suffer from the dying ReLU. We thus propose a new initialization procedure, namely, a randomized asymmetric initialization. We prove that the new initialization can effectively prevent the dying ReLU. All parameters required for the new initialization are theoretically designed. Numerical examples are provided to demonstrate the effectiveness of the new initialization procedure.
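The depth effect described in the abstract can be illustrated with a small Monte Carlo sketch. This is not the paper's experiment: the function name `born_dead_fraction`, the width of 2, the zero biases, the He-normal weights (one example of a symmetric distribution), and the 1-D inputs on [-1, 1] are all assumptions chosen for illustration. The sketch counts how often a randomly initialized narrow ReLU network outputs exactly zero for every sampled input.

```python
import numpy as np

def born_dead_fraction(depth, width=2, n_trials=500, n_inputs=64, seed=0):
    """Estimate how often a random narrow ReLU net is dead at initialization.

    Assumptions (illustrative, not taken from the paper): zero biases,
    He-normal weights, scalar inputs drawn uniformly from [-1, 1].
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=(n_inputs, 1))
    dead = 0
    for _ in range(n_trials):
        h = x
        for _ in range(depth):
            fan_in = h.shape[1]
            # Symmetric (zero-mean Gaussian) weight initialization.
            W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, width))
            h = np.maximum(h @ W, 0.0)  # ReLU with zero bias
        # The net counts as dead if the last hidden layer is zero for all inputs.
        if not h.any():
            dead += 1
    return dead / n_trials

if __name__ == "__main__":
    for d in (1, 5, 10, 25, 50):
        print(d, born_dead_fraction(d))
```

With zero biases, a layer that outputs zero for every input forces all later layers to zero as well, so under these assumptions the estimated fraction should tend to grow with depth.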

Collapse of Deep and Narrow Neural Nets

Aug 15, 2018

Lu Lu, Yanhui Su, George Em Karniadakis

Abstract: Recent theoretical work has demonstrated that deep neural networks have superior performance over shallow networks, but their training is more difficult, e.g., they suffer from the vanishing gradient problem. This problem can typically be resolved by the rectified linear unit (ReLU) activation. However, here we show that even for such activation, deep and narrow neural networks will, with high probability, converge to erroneous mean or median states of the target function, depending on the loss. We demonstrate this collapse of deep and narrow neural networks both numerically and theoretically, and provide estimates of the probability of collapse. We also construct a diagram of a safe region for designing neural networks that avoid the collapse to erroneous states. Finally, we examine different ways of initialization and normalization that may avoid the collapse problem.
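Why the collapsed state depends on the loss can be seen from a standard fact: if a model can only output a constant, the constant that minimizes the mean squared error is the mean of the targets, and the constant that minimizes the mean absolute error is their median. The sketch below checks this by grid search. The helper `best_constant` and the target function x² on [-1, 1] are hypothetical choices for illustration, not taken from the paper.

```python
import numpy as np

def best_constant(y, loss, n_candidates=1001):
    """Grid-search the constant prediction c that minimizes the given loss.

    `loss` is "l2" (mean squared error) or "l1" (mean absolute error).
    """
    candidates = np.linspace(y.min(), y.max(), n_candidates)
    residual = y[None, :] - candidates[:, None]
    if loss == "l2":
        cost = (residual ** 2).mean(axis=1)
    elif loss == "l1":
        cost = np.abs(residual).mean(axis=1)
    else:
        raise ValueError("loss must be 'l1' or 'l2'")
    return candidates[cost.argmin()]

# Hypothetical target: a skewed function, so mean and median differ.
x = np.linspace(-1.0, 1.0, 201)
y = x ** 2

if __name__ == "__main__":
    print("L2 constant:", best_constant(y, "l2"), "mean:", y.mean())
    print("L1 constant:", best_constant(y, "l1"), "median:", np.median(y))
```

A skewed target is used because mean and median coincide for symmetric value distributions, which would hide the difference between the two losses.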
