
Atish Agarwala

A Clipped Trip: the Dynamics of SGD with Gradient Clipping in High-Dimensions

Jun 17, 2024

High dimensional analysis reveals conservative sharpening and a stochastic edge of stability

Apr 30, 2024

Gradient descent induces alignment between weights and the empirical NTK for deep non-linear networks

Feb 07, 2024

Neglected Hessian component explains mysteries in Sharpness regularization

Jan 24, 2024

On the Interplay Between Stepsize Tuning and Progressive Sharpening

Dec 07, 2023

SAM operates far from home: eigenvalue regularization as a dynamical phenomenon

Feb 17, 2023

Second-order regression models exhibit progressive sharpening to the edge of stability

Oct 10, 2022

Deep equilibrium networks are sensitive to initialization statistics

Jul 19, 2022

One Network Fits All? Modular versus Monolithic Task Formulations in Neural Networks

Mar 29, 2021

Temperature check: theory and practice for training models with softmax-cross-entropy losses

Oct 14, 2020