Vimal Thilak

Path-Constrained Mixture-of-Experts
Mar 18, 2026

Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models
Jan 21, 2025

Enhancing JEPAs with Spatial Conditioning: Robust and Efficient Representation Learning
Oct 14, 2024

Towards Automatic Assessment of Self-Supervised Speech Models using Rank
Sep 16, 2024

How JEPA Avoids Noisy Features: The Implicit Bias of Deep Linear Self Distillation Networks
Jul 03, 2024

LiDAR: Sensing Linear Probing Performance in Joint Embedding SSL Architectures
Dec 07, 2023

Vanishing Gradients in Reinforcement Finetuning of Language Models
Oct 31, 2023

Adaptivity and Modularity for Efficient Generalization Over Task Complexity
Oct 13, 2023

The Slingshot Mechanism: An Empirical Study of Adaptive Optimizers and the Grokking Phenomenon
Jun 13, 2022

Implicit Greedy Rank Learning in Autoencoders via Overparameterized Linear Networks
Jul 02, 2021