A Simple Deep Equilibrium Model Converges to Global Optima with Weight Tying

Feb 15, 2021
Kenji Kawaguchi

A deep equilibrium linear model is implicitly defined through an equilibrium point of an infinite sequence of computations. It avoids any explicit computation of the infinite sequence by finding an equilibrium point directly via root-finding and by computing gradients via implicit differentiation. The model is a simple deep equilibrium model with nonlinear activations applied to the weight matrices. In this paper, we analyze the gradient dynamics of this simple deep equilibrium model with non-convex objective functions for a general class of losses used in regression and classification. Despite the non-convexity, convergence to a global optimum at a linear rate is guaranteed without any assumption on the width of the model, which allows the width to be smaller than the output dimension and the number of data points. Moreover, we prove a relation between the gradient dynamics of the simple deep equilibrium model and the dynamics of the trust region Newton method for a shallow model. This mathematically proven relation, along with our numerical observation, suggests the importance of understanding implicit bias and points to a possible open problem on the topic. Our proofs deal with nonlinearity and weight tying, and differ from those in the related literature.
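To make the idea of an equilibrium point concrete, here is a minimal sketch in plain Python. It is an illustration only, not the paper's code. It assumes a weight-tied layer of the form z = gamma * softmax(A) z + x, with a column-wise softmax as the nonlinearity on the weight matrix and 0 < gamma < 1; this specific form, the matrix `A`, the input `x`, and all function names are assumptions chosen for the example. The sketch compares unrolling the iteration for many steps with solving the linear system (I - gamma * softmax(A)) z = x directly. It does not cover implicit differentiation.

```python
import math

def column_softmax(A):
    """Apply softmax to each column, so every column sums to 1."""
    n_rows, n_cols = len(A), len(A[0])
    S = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        m = max(A[i][j] for i in range(n_rows))  # subtract max for stability
        exps = [math.exp(A[i][j] - m) for i in range(n_rows)]
        total = sum(exps)
        for i in range(n_rows):
            S[i][j] = exps[i] / total
    return S

def matvec(M, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def iterate(S, x, gamma, steps):
    """Unroll the weight-tied update z <- gamma * S z + x for a finite number of steps."""
    z = [0.0] * len(x)
    for _ in range(steps):
        z = [gamma * s + xi for s, xi in zip(matvec(S, z), x)]
    return z

def solve_equilibrium(S, x, gamma):
    """Solve (I - gamma * S) z = x by Gaussian elimination with partial pivoting."""
    n = len(x)
    # Augmented matrix [I - gamma * S | x]
    M = [[(1.0 if i == j else 0.0) - gamma * S[i][j] for j in range(n)] + [x[i]]
         for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z

# Hypothetical example values, chosen only for illustration.
A = [[0.2, -1.0, 0.5], [1.3, 0.0, -0.7], [-0.4, 0.8, 0.1]]
x = [1.0, -2.0, 0.5]
gamma = 0.5

S = column_softmax(A)
z_iter = iterate(S, x, gamma, steps=60)
z_direct = solve_equilibrium(S, x, gamma)
gap = max(abs(a - b) for a, b in zip(z_iter, z_direct))
```

In this sketch the unrolled iteration and the direct solve agree up to floating-point error, because gamma times a column-stochastic matrix is a contraction in the 1-norm. The direct solve reaches the same point without computing the sequence itself.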

* ICLR 2021. Selected for ICLR Spotlight (top 6% submissions) 
