Get our free extension to see links to code for papers anywhere online!

 Add to Chrome

 Add to Firefox

CatalyzeX Code Finder - Browser extension linking code for ML papers across the web! | Product Hunt Embed

Small nonlinearities in activation functions create bad local minima in neural networks

Sep 28, 2018
Chulhee Yun, Suvrit Sra, Ali Jadbabaie



We investigate the loss surface of neural networks. We prove that even for one-hidden-layer networks with "slightest" nonlinearity, the empirical risks have spurious local minima in most cases. Our results thus indicate that in general "no spurious local minima" is a property limited to deep linear networks, and insights obtained from linear networks are not robust. Specifically, for ReLU(-like) networks we constructively prove that for almost all (in contrast to previous results) practical datasets there exist infinitely many local minima. We also present a counterexample for more general activations (sigmoid, tanh, arctan, ReLU, etc.), for which there exists a bad local minimum. Our results make the least restrictive assumptions relative to existing results on local optimality in neural networks. We complete our discussion by presenting a comprehensive characterization of global optimality for deep linear networks, which unifies other results on this topic.

* 36 pages 


Share this with someone who'll enjoy it:

   Access Paper Source



Share this with someone who'll enjoy it: