Suvrit Sra

MIT

First-Order Methods for Linearly Constrained Bilevel Optimization

Jun 18, 2024

Riemannian Bilevel Optimization

May 22, 2024

Efficient Sampling on Riemannian Manifolds via Langevin MCMC

Feb 15, 2024

Transformers Implement Functional Gradient Descent to Learn Non-Linear Functions In Context

Dec 26, 2023

Linear attention is (maybe) all you need (to understand transformer optimization)

Oct 02, 2023

Invex Programs: First Order Algorithms and Their Convergence

Jul 10, 2023

Transformers learn to implement preconditioned gradient descent for in-context learning

Jun 01, 2023

How to escape sharp minima

May 25, 2023

The Crucial Role of Normalization in Sharpness-Aware Minimization

May 24, 2023

On the Training Instability of Shuffling SGD with Batch Normalization

Feb 24, 2023