Enric Boix-Adsera

Towards a theory of model distillation
Mar 14, 2024

PROPANE: Prompt design as an inverse problem
Nov 13, 2023

When can transformers reason with abstract symbols?
Oct 15, 2023

Transformers learn through gradual rank increase
Jun 12, 2023

The NTK approximation is valid for longer than you think
May 22, 2023

SGD learning on neural networks: leap complexity and saddle-to-saddle dynamics
Feb 21, 2023

GULP: a prediction-based metric between representations
Oct 12, 2022

On the non-universality of deep learning: quantifying the cost of symmetry
Aug 05, 2022

The merged-staircase property: a necessary and nearly sufficient condition for SGD learning of sparse functions on two-layer neural networks
Feb 17, 2022

The staircase property: How hierarchical structure can guide deep learning
Aug 24, 2021