Etai Littwin

Transformers learn through gradual rank increase

Jun 12, 2023

The NTK approximation is valid for longer than you think

May 22, 2023

Stabilizing Transformer Training by Preventing Attention Entropy Collapse

Mar 11, 2023

The Slingshot Mechanism: An Empirical Study of Adaptive Optimizers and the Grokking Phenomenon

Jun 13, 2022

Learning Representation from Neural Fisher Kernel with Low-rank Approximation

Feb 04, 2022

Implicit Greedy Rank Learning in Autoencoders via Overparameterized Linear Networks

Jul 02, 2021

Implicit Acceleration and Feature Learning in Infinitely Wide Neural Networks with Bottlenecks

Jul 02, 2021

Tensor Programs IIb: Architectural Universality of Neural Tangent Kernel Training Dynamics

May 08, 2021

Collegial Ensembles

Jun 17, 2020

On the Optimization Dynamics of Wide Hypernetworks

Apr 05, 2020