Understanding Quantization of Optimizer States in LLM Pre-training: Dynamics of State Staleness and Effectiveness of State Resets

Mar 17, 2026
