Atish Agarwala

What do near-optimal learning rate schedules look like?
Mar 11, 2026

Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks
Jul 02, 2025

How far away are truly hyperparameter-free learning algorithms?
May 29, 2025

Avoiding spurious sharpness minimization broadens applicability of SAM
Feb 04, 2025

Exact Risk Curves of signSGD in High-Dimensions: Quantifying Preconditioning and Noise-Compression Effects
Nov 19, 2024

Stepping on the Edge: Curvature Aware Learning Rate Tuners
Jul 08, 2024

A Clipped Trip: the Dynamics of SGD with Gradient Clipping in High-Dimensions
Jun 17, 2024

High dimensional analysis reveals conservative sharpening and a stochastic edge of stability
Apr 30, 2024

Gradient descent induces alignment between weights and the empirical NTK for deep non-linear networks
Feb 07, 2024

Neglected Hessian component explains mysteries in Sharpness regularization
Jan 24, 2024