Title: APMSqueeze: A Communication Efficient Adam-Preconditioned Momentum SGD Algorithm

Aug 28, 2020

Hanlin Tang, Shaoduo Gan, Samyam Rajbhandari, Xiangru Lian, Ji Liu, Yuxiong He, Ce Zhang

Abstract: Adam is an important optimization algorithm for training many important tasks, such as BERT and ImageNet, efficiently and accurately. However, Adam is generally not compatible with information (gradient) compression techniques, so communication usually becomes the bottleneck when parallelizing Adam. In this paper, we propose a communication-efficient Adam-preconditioned Momentum SGD algorithm, named APMSqueeze, which compresses gradients through an error-compensated method. The proposed algorithm achieves convergence efficiency similar to Adam in terms of epochs, but significantly reduces the running time per epoch. In terms of end-to-end performance (including the full-precision pre-conditioning step), APMSqueeze sometimes provides a speed-up of up to $2$-$10\times$, depending on network bandwidth. We also conduct a theoretical analysis of its convergence and efficiency.
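The abstract does not spell out the compression scheme, so the following is only a minimal sketch of the general idea of error-compensated compression, not the authors' implementation. The compressor `sign_compress` (signs plus one shared scale) and the helper `compensated_compress` are hypothetical names chosen for illustration; the paper may use a different compressor and applies it within a distributed momentum update that is not shown here.

```python
import numpy as np

def sign_compress(x):
    # Hypothetical lossy compressor (an assumption, not taken from the paper):
    # keep one sign per entry and a single shared magnitude.
    return np.mean(np.abs(x)) * np.sign(x)

def compensated_compress(value, error):
    # Add back what the compressor dropped in the previous round,
    # compress the corrected value, and record the new residual.
    corrected = value + error
    sent = sign_compress(corrected)
    return sent, corrected - sent

rng = np.random.default_rng(0)
error = np.zeros(4)
total_true = np.zeros(4)   # sum of the uncompressed vectors
total_sent = np.zeros(4)   # sum of what was actually communicated

for _ in range(100):
    g = rng.normal(size=4)              # stand-in for a gradient or momentum vector
    sent, error = compensated_compress(g, error)
    total_true += g
    total_sent += sent

# The residuals telescope: the gap between the two running sums
# is exactly the residual still held locally after the last round.
gap = total_true - total_sent
```

In this sketch the difference between what was meant to be sent and what was sent never accumulates beyond the most recent residual, which is the property that lets an aggressive compressor be used without the error compounding over rounds.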
