
Nicolas Zucchet

The emergence of sparse attention: impact of data distribution and benefits of repetition

May 23, 2025

How do language models learn facts? Dynamics, curricula and hallucinations

Mar 27, 2025

Recurrent neural networks: vanishing and exploding gradients are not the end of the story

May 31, 2024

Uncovering mesa-optimization algorithms in Transformers

Sep 11, 2023

Gated recurrent neural networks discover attention

Sep 04, 2023

Online learning of long-range dependencies

May 25, 2023

Random initialisations performing above chance and how to find them

Sep 15, 2022

The least-control principle for learning at equilibrium

Jul 04, 2022

Beyond backpropagation: implicit gradients for bilevel optimization

May 06, 2022

Learning where to learn: Gradient sparsity in meta and continual learning

Oct 27, 2021