Simon Schug

Sparsely gated tiny linear experts

Jun 05, 2026
Via arXiv

Scale leads to compositional generalization

Jul 09, 2025
Via arXiv

When can transformers compositionally generalize in-context?

Jul 17, 2024
Via arXiv

Attention as a Hypernetwork

Jun 09, 2024
Via arXiv

Discovering modular solutions that generalize compositionally

Dec 22, 2023
Via arXiv

Would I have gotten that reward? Long-term credit assignment by counterfactual contribution analysis

Jun 29, 2023
Via arXiv

Online learning of long-range dependencies

May 25, 2023
Via arXiv

Random initialisations performing above chance and how to find them

Sep 15, 2022
Via arXiv

Learning where to learn: Gradient sparsity in meta and continual learning

Oct 27, 2021
Via arXiv

A contrastive rule for meta-learning

Apr 19, 2021
Via arXiv