Title:Iterate Averaging Helps: An Alternative Perspective in Deep Learning

Mar 02, 2020

Diego Granziol, Xingchen Wan, Stephen Roberts

Figures 1-4 of the paper (images not included)

Abstract: Iterate averaging has a rich history in optimisation but has only very recently been popularised in deep learning. We investigate its effects in a deep learning context and argue that previous explanations of its efficacy, which place high importance on the local geometry (flatness vs. sharpness) of final solutions, are not necessarily relevant. We instead argue that the key reasons for the performance gain are the robustness of iterate averaging to the typically very high estimation noise in deep learning and the various regularisation effects that averaging exerts; indeed, this effect is made even more prominent by the over-parameterisation of modern networks. Inspired by this, we propose Gadam, which combines Adam with iterate averaging to address one of the key problems of adaptive optimisers: that they often generalise worse. Without compromising adaptivity and with minimal additional computational burden, we show that Gadam (and its variant GadamX) achieves generalisation performance that is consistently superior to tuned SGD and is on par with or even better than SGD with iterate averaging on various image classification (CIFAR-10/100 and ImageNet 32×32) and language (PTB) tasks.
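The core idea in the abstract, running Adam while keeping an average of its iterates, can be illustrated on a toy problem. This is a minimal sketch under stated assumptions, not the authors' Gadam or GadamX implementation: the function name `adam_with_iterate_averaging` is hypothetical, a noisy quadratic stands in for minibatch gradient noise, the Adam update is the standard one, and the average is a plain arithmetic (Polyak-style) mean that starts after an assumed burn-in of `avg_start` steps and is not fed back into the optimiser.

```python
import math
import random


def adam_with_iterate_averaging(curvatures, noise_std=1.0, lr=0.01,
                                beta1=0.9, beta2=0.999, eps=1e-8,
                                steps=6000, avg_start=1000, seed=0):
    """Run Adam on a noisy quadratic and keep a running average of its iterates.

    Hypothetical toy setup: f(w) = 0.5 * sum_i curvatures[i] * w[i]**2, whose
    minimum is at w = 0. Each gradient evaluation adds Gaussian noise to mimic
    minibatch estimation noise. Returns (last_iterate, averaged_iterate).
    """
    rng = random.Random(seed)
    dim = len(curvatures)
    w = [1.0] * dim        # starting point
    m = [0.0] * dim        # Adam first-moment (momentum) estimate
    v = [0.0] * dim        # Adam second-moment estimate
    w_avg = [0.0] * dim    # running arithmetic mean of the iterates
    n_avg = 0              # number of iterates averaged so far

    for t in range(1, steps + 1):
        for i in range(dim):
            # Noisy stochastic gradient of the quadratic.
            g = curvatures[i] * w[i] + rng.gauss(0.0, noise_std)
            m[i] = beta1 * m[i] + (1 - beta1) * g
            v[i] = beta2 * v[i] + (1 - beta2) * g * g
            m_hat = m[i] / (1 - beta1 ** t)   # bias-corrected moments
            v_hat = v[i] / (1 - beta2 ** t)
            w[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
        # Iterate averaging: after the burn-in, fold each new iterate into the mean.
        # The average is tracked on the side and does not change the Adam trajectory.
        if t > avg_start:
            n_avg += 1
            for i in range(dim):
                w_avg[i] += (w[i] - w_avg[i]) / n_avg
    return w, w_avg


if __name__ == "__main__":
    curvatures = [0.5 + 1.5 * i / 19 for i in range(20)]
    last, avg = adam_with_iterate_averaging(curvatures)

    def sq_dist_to_opt(x):
        # Squared distance to the minimiser w = 0.
        return sum(xi * xi for xi in x)

    print(f"last iterate error:     {sq_dist_to_opt(last):.4f}")
    print(f"averaged iterate error: {sq_dist_to_opt(avg):.4f}")
```

On this toy problem the averaged iterate typically lands much closer to the minimum than the last Adam iterate, because averaging cancels much of the noise-driven wandering around the optimum. Details such as the averaging schedule, weight decay, and the GadamX variant are not modelled here.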

* 9 pages (21 including references and appendix), 8 figures