Etai Littwin

LiDAR: Sensing Linear Probing Performance in Joint Embedding SSL Architectures

Dec 07, 2023
Vimal Thilak, Chen Huang, Omid Saremi, Laurent Dinh, Hanlin Goh, Preetum Nakkiran, Joshua M. Susskind, Etai Littwin

Via arXiv

Vanishing Gradients in Reinforcement Finetuning of Language Models

Oct 31, 2023
Noam Razin, Hattie Zhou, Omid Saremi, Vimal Thilak, Arwen Bradley, Preetum Nakkiran, Joshua Susskind, Etai Littwin

Via arXiv

What Algorithms can Transformers Learn? A Study in Length Generalization

Oct 24, 2023
Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi, Josh Susskind, Samy Bengio, Preetum Nakkiran

Via arXiv

When can transformers reason with abstract symbols?

Oct 15, 2023
Enric Boix-Adsera, Omid Saremi, Emmanuel Abbe, Samy Bengio, Etai Littwin, Joshua Susskind

Via arXiv

Adaptivity and Modularity for Efficient Generalization Over Task Complexity

Oct 13, 2023
Samira Abnar, Omid Saremi, Laurent Dinh, Shantel Wilson, Miguel Angel Bautista, Chen Huang, Vimal Thilak, Etai Littwin, Jiatao Gu, Josh Susskind, Samy Bengio

Via arXiv

Tensor Programs IVb: Adaptive Optimization in the Infinite-Width Limit

Aug 07, 2023
Greg Yang, Etai Littwin

Via arXiv

Transformers learn through gradual rank increase

Jun 12, 2023
Enric Boix-Adsera, Etai Littwin, Emmanuel Abbe, Samy Bengio, Joshua Susskind

Via arXiv

The NTK approximation is valid for longer than you think

May 22, 2023
Enric Boix-Adsera, Etai Littwin

Via arXiv

Stabilizing Transformer Training by Preventing Attention Entropy Collapse

Mar 11, 2023
Shuangfei Zhai, Tatiana Likhomanenko, Etai Littwin, Dan Busbridge, Jason Ramapuram, Yizhe Zhang, Jiatao Gu, Josh Susskind

Via arXiv