Suvrit Sra

Efficient Sampling on Riemannian Manifolds via Langevin MCMC

Feb 15, 2024
Xiang Cheng, Jingzhao Zhang, Suvrit Sra


Transformers Implement Functional Gradient Descent to Learn Non-Linear Functions In Context

Dec 26, 2023
Xiang Cheng, Yuxin Chen, Suvrit Sra


Linear attention is (maybe) all you need (to understand transformer optimization)

Oct 02, 2023
Kwangjun Ahn, Xiang Cheng, Minhak Song, Chulhee Yun, Ali Jadbabaie, Suvrit Sra


Invex Programs: First Order Algorithms and Their Convergence

Jul 10, 2023
Adarsh Barik, Suvrit Sra, Jean Honorio


Transformers learn to implement preconditioned gradient descent for in-context learning

Jun 01, 2023
Kwangjun Ahn, Xiang Cheng, Hadi Daneshmand, Suvrit Sra


How to escape sharp minima

May 25, 2023
Kwangjun Ahn, Ali Jadbabaie, Suvrit Sra


The Crucial Role of Normalization in Sharpness-Aware Minimization

May 24, 2023
Yan Dai, Kwangjun Ahn, Suvrit Sra


On the Training Instability of Shuffling SGD with Batch Normalization

Feb 24, 2023
David X. Wu, Chulhee Yun, Suvrit Sra


Can Direct Latent Model Learning Solve Linear Quadratic Gaussian Control?

Dec 30, 2022
Yi Tian, Kaiqing Zhang, Russ Tedrake, Suvrit Sra
