
Johannes von Oswald

Institute of Neuroinformatics, ETH Zürich and University of Zürich, Zürich, Switzerland

Discovering modular solutions that generalize compositionally

Dec 22, 2023

Uncovering mesa-optimization algorithms in Transformers

Sep 11, 2023

Gated recurrent neural networks discover attention

Sep 04, 2023

Transformers learn in-context by gradient descent

Dec 15, 2022

Disentangling the Predictive Variance of Deep Ensembles through the Neural Tangent Kernel

Oct 18, 2022

Random initialisations performing above chance and how to find them

Sep 15, 2022

The least-control principle for learning at equilibrium

Jul 04, 2022

Learning where to learn: Gradient sparsity in meta and continual learning

Oct 27, 2021

A contrastive rule for meta-learning

Apr 19, 2021

Posterior Meta-Replay for Continual Learning

Mar 01, 2021