Etai Littwin

LiDAR: Sensing Linear Probing Performance in Joint Embedding SSL Architectures

Dec 07, 2023
Via arXiv

Vanishing Gradients in Reinforcement Finetuning of Language Models

Oct 31, 2023
Via arXiv

What Algorithms can Transformers Learn? A Study in Length Generalization

Oct 24, 2023
Via arXiv

When can transformers reason with abstract symbols?

Oct 15, 2023
Via arXiv

Adaptivity and Modularity for Efficient Generalization Over Task Complexity

Oct 13, 2023
Via arXiv

Tensor Programs IVb: Adaptive Optimization in the Infinite-Width Limit

Aug 07, 2023
Via arXiv

Transformers learn through gradual rank increase

Jun 12, 2023
Via arXiv

The NTK approximation is valid for longer than you think

May 22, 2023
Via arXiv

Stabilizing Transformer Training by Preventing Attention Entropy Collapse

Mar 11, 2023
Via arXiv

The Slingshot Mechanism: An Empirical Study of Adaptive Optimizers and the Grokking Phenomenon

Jun 13, 2022
Via arXiv