Title: CAME: Confidence-guided Adaptive Memory Efficient Optimization

Jul 05, 2023

Yang Luo, Xiaozhe Ren, Zangwei Zheng, Zhuo Jiang, Xin Jiang, Yang You

Abstract: Adaptive gradient methods, such as Adam and LAMB, have demonstrated excellent performance in the training of large language models. Nevertheless, adaptivity requires maintaining second-moment estimates of the per-parameter gradients, which incurs substantial extra memory overhead. To address this problem, several memory-efficient optimizers (e.g., Adafactor) have been proposed that drastically reduce auxiliary memory usage, but at the cost of a performance penalty. In this paper, we first study a confidence-guided strategy to reduce the instability of existing memory-efficient optimizers. Based on this strategy, we propose CAME to simultaneously achieve two goals: fast convergence as in traditional adaptive methods, and low memory usage as in memory-efficient methods. Extensive experiments demonstrate the training stability and superior performance of CAME across various NLP tasks such as BERT and GPT-2 training. Notably, for BERT pre-training with a large batch size of 32,768, our proposed optimizer attains faster convergence and higher accuracy than the Adam optimizer. The implementation of CAME is publicly available.
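To make the memory trade-off mentioned in the abstract concrete, the following is a minimal sketch of the kind of factored second-moment estimate used by Adafactor-style optimizers: instead of one value per parameter, only row and column sums of the squared gradients are kept, and a rank-1 approximation is rebuilt from them. It does not implement CAME's confidence-guided update, which the abstract does not specify. The function names `factored_second_moment` and `second_moment_floats` are hypothetical and chosen for illustration only.

```python
def factored_second_moment(grad_sq):
    """Rank-1 approximation of a squared-gradient matrix.

    Only the row sums and column sums need to be stored; each entry is
    approximated as row_sum[i] * col_sum[j] / total (Adafactor-style).
    """
    row_sums = [sum(row) for row in grad_sq]
    col_sums = [sum(col) for col in zip(*grad_sq)]
    total = sum(row_sums)
    return [[r * c / total for c in col_sums] for r in row_sums]


def second_moment_floats(n_rows, n_cols, factored):
    """Number of floats stored for the second moment of one weight matrix."""
    if factored:
        return n_rows + n_cols  # one row vector plus one column vector
    return n_rows * n_cols      # one value per parameter, as in Adam


# Illustrative shape (an assumption, not taken from the paper).
full = second_moment_floats(1024, 4096, factored=False)
small = second_moment_floats(1024, 4096, factored=True)
print(full, small)
```

For a matrix that is exactly rank 1, such as `[[1, 2], [2, 4]]`, the approximation reproduces the original values; for general gradients it is lossy, which is one commonly cited source of the performance penalty the abstract refers to.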

* Accepted by ACL 2023
