Maissam Barkeshli

Learning Pseudorandom Numbers with Transformers: Permuted Congruential Generators, Curricula, and Interpretability

Oct 30, 2025

When Can You Get Away with Low Memory Adam?

Mar 03, 2025

Why Warmup the Learning Rate? Underlying Mechanisms and Improvements

Jun 13, 2024

Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos

Nov 03, 2023

Phase diagram of training dynamics in deep neural networks: effect of learning rate, depth, and width

Feb 23, 2023