Andrew Saxe

Optimal Representation Size: High-Dimensional Analysis of Pretraining and Linear Probing

May 19, 2026
Via arXiv

Optimal Learning Rate Schedule for Balancing Effort and Performance

Jan 12, 2026
Via arXiv

Saddle-to-Saddle Dynamics Explains A Simplicity Bias Across Neural Network Architectures

Dec 23, 2025
Via arXiv

Revisiting the Role of Relearning in Semantic Dementia

Mar 05, 2025
Via arXiv

Training Dynamics of In-Context Learning in Linear Attention

Jan 27, 2025
Via arXiv

Early learning of the optimal constant solution in neural networks and humans

Jun 25, 2024
Via arXiv

When Are Bias-Free ReLU Networks Like Linear Networks?

Jun 18, 2024
Via arXiv

Get rich quick: exact solutions reveal how unbalanced initializations promote rapid feature learning

Jun 10, 2024
Via arXiv

Tilting the Odds at the Lottery: the Interplay of Overparameterisation and Curricula in Neural Networks

Jun 03, 2024
Via arXiv

A Theory of Unimodal Bias in Multimodal Learning

Dec 01, 2023
Via arXiv