Abstract:Flat regions of the neural network loss landscape have long been hypothesized to correlate with better generalization properties. A closely related but distinct problem is training models that are robust to internal perturbations to their weights, which may be an important need for future low-power hardware platforms. In this paper, we explore the usage of two methods, sharpness-aware minimization (SAM) and random-weight perturbation (RWP), to find minima robust to a variety of random corruptions to weights. We consider the problem from two angles: generalization (how do we reduce the noise-robust generalization gap) and optimization (how do we maximize performance from optimizers when subject to strong perturbations). First, we establish, both theoretically and empirically, that an over-regularized RWP training objective is optimal for noise-robust generalization. For small-magnitude noise, we find that SAM's adversarial objective further improves performance over any RWP configuration, but performs poorly for large-magnitude noise. We link the cause of this to a vanishing-gradient effect, caused by unevenness in the loss landscape, affecting both SAM and RWP. Lastly, we demonstrate that dynamically adjusting the perturbation strength to match the evolution of the loss landscape improves optimizing for these perturbed objectives.




Abstract:Energy-based learning algorithms have recently gained a surge of interest due to their compatibility with analog (post-digital) hardware. Existing algorithms include contrastive learning (CL), equilibrium propagation (EP) and coupled learning (CpL), all consisting in contrasting two states, and differing in the type of perturbation used to obtain the second state from the first one. However, these algorithms have never been explicitly compared on equal footing with same models and datasets, making it difficult to assess their scalability and decide which one to select in practice. In this work, we carry out a comparison of seven learning algorithms, namely CL and different variants of EP and CpL depending on the signs of the perturbations. Specifically, using these learning algorithms, we train deep convolutional Hopfield networks (DCHNs) on five vision tasks (MNIST, F-MNIST, SVHN, CIFAR-10 and CIFAR-100). We find that, while all algorithms yield comparable performance on MNIST, important differences in performance arise as the difficulty of the task increases. Our key findings reveal that negative perturbations are better than positive ones, and highlight the centered variant of EP (which uses two perturbations of opposite sign) as the best-performing algorithm. We also endorse these findings with theoretical arguments. Additionally, we establish new SOTA results with DCHNs on all five datasets, both in performance and speed. In particular, our DCHN simulations are 13.5 times faster with respect to Laborieux et al. (2021), which we achieve thanks to the use of a novel energy minimisation algorithm based on asynchronous updates, combined with reduced precision (16 bits).