Title: Dissecting Hessian: Understanding Common Structure of Hessian in Neural Networks

Oct 08, 2020

Yikai Wu, Xingyu Zhu, Chenwei Wu, Annie Wang, Rong Ge



Abstract: The Hessian captures important properties of the deep neural network loss landscape. We observe that the eigenvectors and eigenspaces of the layer-wise Hessian of the neural network objective have several interesting structures: top eigenspaces for different models have high overlap, and top eigenvectors form low-rank matrices when they are reshaped into the same shape as the corresponding weight matrix. These structures, as well as the low-rank structure of the Hessian observed in previous studies, can be explained by approximating the Hessian using Kronecker factorization. This understanding can also explain why some of these structures become weaker when the network is trained with batch normalization. Finally, we show that the Kronecker factorization can be combined with PAC-Bayes techniques to obtain better explicit generalization bounds.
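The link between a Kronecker-factored Hessian and low-rank reshaped eigenvectors can be illustrated with a small numerical sketch. This is not the authors' code: the factors `M` and `A`, the layer sizes, and the helper names are hypothetical stand-ins, and the toy "Hessian" is built to be exactly a Kronecker product, whereas the abstract only claims an approximation. Under that assumption, each eigenvector of `kron(M, A)` is a Kronecker product of an eigenvector of `M` with one of `A`, so reshaping it to the weight-matrix shape should give a rank-one matrix.

```python
import numpy as np


def random_psd(n, rng):
    """Random symmetric positive semi-definite matrix (illustrative stand-in)."""
    g = rng.standard_normal((n, n))
    return g @ g.T


def top_eigvec_as_matrix(n_out=3, n_in=4, seed=0):
    """Build H = kron(M, A) and reshape its top eigenvector to (n_out, n_in)."""
    rng = np.random.default_rng(seed)
    M = random_psd(n_out, rng)  # hypothetical output-side factor
    A = random_psd(n_in, rng)   # hypothetical input-side factor
    H = np.kron(M, A)           # toy Hessian that is exactly Kronecker-factored

    # eigh returns eigenvalues in ascending order, so the last column is the top eigenvector.
    eigvals, eigvecs = np.linalg.eigh(H)
    top = eigvecs[:, -1]

    # Row-major reshape matches the index order of kron(M, A).
    V = top.reshape(n_out, n_in)
    return H, M, A, eigvals[-1], V


H, M, A, top_val, V = top_eigvec_as_matrix()
singular_values = np.linalg.svd(V, compute_uv=False)
print("singular values of reshaped top eigenvector:", singular_values)
print("top eigenvalue of H:", top_val)
print("product of top eigenvalues of M and A:",
      np.linalg.eigvalsh(M)[-1] * np.linalg.eigvalsh(A)[-1])
```

In this idealized setting, all singular values after the first should be numerically zero, and the top eigenvalue of `H` should match the product of the top eigenvalues of the two factors. For a real layer-wise Hessian the factorization holds only approximately, so the reshaped eigenvectors would be expected to be approximately low rank rather than exactly rank one.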

* 29 pages, 26 figures. Main text: 8 pages, 6 figures. First two authors have equal contribution and are in alphabetical order

View paper on OpenReview

