Omid Saremi

Annotations Mitigate Post-Training Mode Collapse

May 11, 2026
Via arXiv

How JEPA Avoids Noisy Features: The Implicit Bias of Deep Linear Self Distillation Networks

Jul 03, 2024
Via arXiv

How Far Can Transformers Reason? The Locality Barrier and Inductive Scratchpad

Jun 10, 2024
Via arXiv

LiDAR: Sensing Linear Probing Performance in Joint Embedding SSL Architectures

Dec 07, 2023
Via arXiv

Vanishing Gradients in Reinforcement Finetuning of Language Models

Oct 31, 2023
Via arXiv

What Algorithms can Transformers Learn? A Study in Length Generalization

Oct 24, 2023
Via arXiv

When can transformers reason with abstract symbols?

Oct 15, 2023
Via arXiv

Adaptivity and Modularity for Efficient Generalization Over Task Complexity

Oct 13, 2023
Via arXiv

The Slingshot Mechanism: An Empirical Study of Adaptive Optimizers and the Grokking Phenomenon

Jun 13, 2022
Via arXiv

Implicit Greedy Rank Learning in Autoencoders via Overparameterized Linear Networks

Jul 02, 2021
Via arXiv