Benjamin F. Grewe

Institute of Neuroinformatics, ETH Zürich and University of Zürich, Zürich, Switzerland

Minimizing Control for Credit Assignment with Strong Feedback

Apr 14, 2022

Alexander Meulemans, Matilde Tristany Farinha, Maria R. Cervera, João Sacramento, Benjamin F. Grewe

Abstract: The success of deep learning attracted interest in whether the brain learns hierarchical representations using gradient-based learning. However, current biologically plausible methods for gradient-based credit assignment in deep neural networks need infinitesimally small feedback signals, which is problematic in biologically realistic noisy environments and at odds with experimental evidence in neuroscience showing that top-down feedback can significantly influence neural activity. Building upon deep feedback control (DFC), a recently proposed credit assignment method, we combine strong feedback influences on neural activity with gradient-based learning and show that this naturally leads to a novel view on neural network optimization. Instead of gradually changing the network weights towards configurations with low output loss, weight updates gradually minimize the amount of feedback required from a controller that drives the network to the supervised output label. Moreover, we show that the use of strong feedback in DFC allows learning forward and feedback connections simultaneously, using a learning rule fully local in space and time. We complement our theoretical results with experiments on standard computer-vision benchmarks, showing performance competitive with backpropagation as well as robustness to noise. Overall, our work presents a fundamentally novel view of learning as control minimization, while sidestepping biologically unrealistic assumptions.

* 25 pages, 3 figures

Uncertainty estimation under model misspecification in neural network regression

Nov 23, 2021

Maria R. Cervera, Rafael Dätwyler, Francesco D'Angelo, Hamza Keurti, Benjamin F. Grewe, Christian Henning

Abstract: Although neural networks are powerful function approximators, the underlying modelling assumptions ultimately define the likelihood and thus the hypothesis class they are parameterizing. In classification, these assumptions are minimal as the commonly employed softmax is capable of representing any categorical distribution. In regression, however, restrictive assumptions on the type of continuous distribution to be realized are typically placed, like the dominant choice of training via mean-squared error and its underlying Gaussianity assumption. Recent modelling advances make it possible to be agnostic to the type of continuous distribution to be modelled, granting regression the flexibility of classification models. While past studies stress the benefit of such flexible regression models in terms of performance, here we study the effect of the model choice on uncertainty estimation. We highlight that under model misspecification, aleatoric uncertainty is not properly captured, and that a Bayesian treatment of a misspecified model leads to unreliable epistemic uncertainty estimates. Overall, our study provides an overview of how modelling choices in regression may influence uncertainty estimation and thus any downstream decision-making process.
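
The link between mean-squared error and the Gaussianity assumption mentioned in the abstract can be made explicit with a short, standard derivation. The symbols here ($f_\theta$ for the network, $\sigma^2$ for a fixed noise variance) are notation assumed for illustration, not taken from the paper:

```latex
% Negative log-likelihood of one target y under a Gaussian with mean f_\theta(x)
% and fixed variance \sigma^2 (assumed notation):
-\log \mathcal{N}\bigl(y \mid f_\theta(x), \sigma^2\bigr)
  = \frac{\bigl(y - f_\theta(x)\bigr)^2}{2\sigma^2}
  + \frac{1}{2}\log\bigl(2\pi\sigma^2\bigr)

% With \sigma^2 held constant, the second term does not depend on \theta, so
\arg\min_\theta \sum_{i=1}^{N} -\log \mathcal{N}\bigl(y_i \mid f_\theta(x_i), \sigma^2\bigr)
  = \arg\min_\theta \frac{1}{N}\sum_{i=1}^{N} \bigl(y_i - f_\theta(x_i)\bigr)^2
```

Under this reading, training with mean-squared error amounts to maximum likelihood for a Gaussian model with constant variance, which is the restrictive assumption the abstract refers to.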

* Published at the NeurIPS 2021 workshop "Your Model Is Wrong: Robustness and Misspecification in Probabilistic Modeling"

Are Bayesian neural networks intrinsically good at out-of-distribution detection?

Jul 26, 2021

Christian Henning, Francesco D'Angelo, Benjamin F. Grewe

Abstract: The need to avoid confident predictions on unfamiliar data has sparked interest in out-of-distribution (OOD) detection. It is widely assumed that Bayesian neural networks (BNNs) are well suited for this task, as the endowed epistemic uncertainty should lead to disagreement in predictions on outliers. In this paper, we question this assumption and provide empirical evidence that proper Bayesian inference with common neural network architectures does not necessarily lead to good OOD detection. To circumvent the use of approximate inference, we start by studying the infinite-width case, where Bayesian inference can be exact considering the corresponding Gaussian process. Strikingly, the kernels induced under common architectural choices lead to uncertainties that do not reflect the underlying data generating process and are therefore unsuited for OOD detection. Finally, we study finite-width networks using HMC, and observe OOD behavior that is consistent with the infinite-width case. Overall, our study discloses fundamental problems when naively using BNNs for OOD detection and opens interesting avenues for future research.

* Published at UDL Workshop, ICML 2021

Credit Assignment in Neural Networks through Deep Feedback Control

Jun 15, 2021

Alexander Meulemans, Matilde Tristany Farinha, Javier García Ordóñez, Pau Vilimelis Aceituno, João Sacramento, Benjamin F. Grewe

Abstract: The success of deep learning sparked interest in whether the brain learns by using similar techniques for assigning credit to each synaptic weight for its contribution to the network output. However, the majority of current attempts at biologically-plausible learning methods are either non-local in time, require highly specific connectivity motifs, or have no clear link to any known mathematical optimization method. Here, we introduce Deep Feedback Control (DFC), a new learning method that uses a feedback controller to drive a deep neural network to match a desired output target and whose control signal can be used for credit assignment. The resulting learning rule is fully local in space and time and approximates Gauss-Newton optimization for a wide range of feedback connectivity patterns. To further underline its biological plausibility, we relate DFC to a multi-compartment model of cortical pyramidal neurons with a local voltage-dependent synaptic plasticity rule, consistent with recent theories of dendritic processing. By combining dynamical system theory with mathematical optimization theory, we provide a strong theoretical foundation for DFC that we corroborate with detailed results on toy experiments and standard computer-vision benchmarks.

* 14 pages and 3 figures in the main manuscript; 45 pages and 14 figures in the supplementary materials

Posterior Meta-Replay for Continual Learning

Mar 01, 2021

Christian Henning, Maria R. Cervera, Francesco D'Angelo, Johannes von Oswald, Regina Traber, Benjamin Ehret, Seijin Kobayashi, João Sacramento, Benjamin F. Grewe

Abstract: Continual Learning (CL) algorithms have recently received a lot of attention as they attempt to overcome the need to train with an i.i.d. sample from some unknown target data distribution. Building on prior work, we study principled ways to tackle the CL problem by adopting a Bayesian perspective and focus on continually learning a task-specific posterior distribution via a shared meta-model, a task-conditioned hypernetwork. This approach, which we term Posterior-replay CL, is in sharp contrast to most Bayesian CL approaches that focus on the recursive update of a single posterior distribution. The benefits of our approach are (1) an increased flexibility to model solutions in weight space and thus less susceptibility to task dissimilarity, (2) access to principled task-specific predictive uncertainty estimates that can be used to infer task identity during test time and to detect task boundaries during training, and (3) the ability to revisit and update task-specific posteriors in a principled manner without requiring access to past data. The proposed framework is versatile, which we demonstrate using simple posterior approximations (such as Gaussians) as well as powerful, implicit distributions modelled via a neural network. We illustrate the conceptual advance of our framework on low-dimensional problems and show performance gains on computer vision benchmarks.

Economical ensembles with hypernetworks

Jul 25, 2020

João Sacramento, Johannes von Oswald, Seijin Kobayashi, Christian Henning, Benjamin F. Grewe

Abstract: Averaging the predictions of many independently trained neural networks is a simple and effective way of improving generalization in deep learning. However, this strategy rapidly becomes costly, as the number of trainable parameters grows linearly with the size of the ensemble. Here, we propose a new method to learn economical ensembles, where the number of trainable parameters and iterations over the data is comparable to that of a single model. Our neural networks are parameterized by hypernetworks, which learn to embed weights in low-dimensional spaces. In a late training phase, we generate an ensemble by randomly initializing an additional number of weight embeddings in the vicinity of each other. We then exploit the inherent randomness in stochastic gradient descent to induce ensemble diversity. Experiments with wide residual networks on the CIFAR and Fashion-MNIST datasets show that our algorithm yields models that are more accurate and less overconfident on unseen data, while learning as efficiently as a single network.

* 25 pages, 5 figures

A Theoretical Framework for Target Propagation

Jun 25, 2020

Alexander Meulemans, Francesco S. Carzaniga, Johan A. K. Suykens, João Sacramento, Benjamin F. Grewe

Abstract: The success of deep learning, a brain-inspired form of AI, has sparked interest in understanding how the brain could similarly learn across multiple layers of neurons. However, the majority of biologically-plausible learning algorithms have not yet reached the performance of backpropagation (BP), nor are they built on strong theoretical foundations. Here, we analyze target propagation (TP), a popular but not yet fully understood alternative to BP, from the standpoint of mathematical optimization. Our theory shows that TP is closely related to Gauss-Newton optimization and thus substantially differs from BP. Furthermore, our analysis reveals a fundamental limitation of difference target propagation (DTP), a well-known variant of TP, in the realistic scenario of non-invertible neural networks. We provide a first solution to this problem through a novel reconstruction loss that improves feedback weight training, while simultaneously introducing architectural flexibility by allowing for direct feedback connections from the output to each hidden layer. Our theory is corroborated by experimental results that show significant improvements in performance and in the alignment of forward weight updates with loss gradients, compared to DTP.

* 12 pages and 4 figures in main manuscript; 38 pages and 6 figures in supplementary material

Continual Learning in Recurrent Neural Networks with Hypernetworks

Jun 22, 2020

Benjamin Ehret, Christian Henning, Maria R. Cervera, Alexander Meulemans, Johannes von Oswald, Benjamin F. Grewe

Abstract: The last decade has seen a surge of interest in continual learning (CL), and a variety of methods have been developed to alleviate catastrophic forgetting. However, most prior work has focused on tasks with static data, while CL on sequential data has remained largely unexplored. Here we address this gap in two ways. First, we evaluate the performance of established CL methods when applied to recurrent neural networks (RNNs). We primarily focus on elastic weight consolidation, which is limited by a stability-plasticity trade-off, and explore the particularities of this trade-off when using sequential data. We show that high working memory requirements, but not necessarily sequence length, lead to an increased need for stability at the cost of decreased performance on subsequent tasks. Second, to overcome this limitation we employ a recent method based on hypernetworks and apply it to RNNs to address catastrophic forgetting on sequential data. By generating the weights of a main RNN in a task-dependent manner, our approach disentangles stability and plasticity, and outperforms alternative methods in a range of experiments. Overall, our work provides several key insights on the differences between CL in feedforward networks and in RNNs, while offering a novel solution to effectively tackle CL on sequential data.

* 13 pages and 4 figures in the main text; 20 pages and 2 figures in the supplementary materials

Continual learning with hypernetworks

Jun 03, 2019

Johannes von Oswald, Christian Henning, João Sacramento, Benjamin F. Grewe

Abstract: Artificial neural networks suffer from catastrophic forgetting when they are sequentially trained on multiple tasks. To overcome this problem, we present a novel approach based on task-conditioned hypernetworks, i.e., networks that generate the weights of a target model based on task identity. Continual learning (CL) is less difficult for this class of models thanks to a simple key observation: instead of relying on recalling the input-output relations of all previously seen data, task-conditioned hypernetworks only require rehearsing previous weight realizations, which can be maintained in memory using a simple regularizer. Besides achieving good performance on standard CL benchmarks, additional experiments on long task sequences reveal that task-conditioned hypernetworks display an unprecedented capacity to retain previous memories. Notably, such long memory lifetimes are achieved in a compressive regime, when the number of trainable weights is comparable to or smaller than the target network size. We provide insight into the structure of low-dimensional task embedding spaces (the input space of the hypernetwork) and show that task-conditioned hypernetworks demonstrate transfer learning properties. Finally, forward information transfer is further supported by empirical results on a challenging CL benchmark based on the CIFAR-10/100 image datasets.
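
The idea of rehearsing previous weight realizations with a simple regularizer can be illustrated with a minimal sketch. It assumes a toy linear hypernetwork and a squared-error penalty; the function names, shapes, and the exact form of the penalty are hypothetical choices made for illustration and are not taken from the authors' implementation.

```python
import numpy as np

def hypernet(weights, task_embedding):
    """Toy linear hypernetwork (assumption): maps a task embedding
    to a flat vector of target-network weights."""
    return weights @ task_embedding

def output_regularizer(weights, old_weights, past_embeddings):
    """Mean squared change in the generated target weights for past tasks.

    `old_weights` is a snapshot of the hypernetwork taken before training
    on the new task; no past input-output data is needed.
    """
    penalties = [
        np.sum((hypernet(weights, e) - hypernet(old_weights, e)) ** 2)
        for e in past_embeddings
    ]
    return float(np.mean(penalties)) if penalties else 0.0

rng = np.random.default_rng(0)
old_weights = rng.normal(size=(6, 3))                      # snapshot before the new task
past_embeddings = [rng.normal(size=3) for _ in range(2)]   # one embedding per past task

# No penalty while the hypernetwork still reproduces the old target weights.
unchanged = output_regularizer(old_weights, old_weights, past_embeddings)
# A penalty appears once the hypernetwork parameters drift.
drifted = output_regularizer(old_weights + 0.1, old_weights, past_embeddings)
```

In a training loop, a term like this would be added to the loss of the current task, so that learning a new task is discouraged from changing the weights generated for earlier task embeddings.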
