Training Faster with Compressed Gradient

Aug 13, 2020
An Xu, Zhouyuan Huo, Heng Huang

Although distributed machine learning methods can speed up the training of large deep neural networks, communication cost remains a notorious bottleneck that limits their performance. To address this challenge, communication-efficient distributed learning methods based on gradient compression were designed to reduce the communication cost, and more recently local error feedback was incorporated to compensate for the resulting performance loss. In this paper, however, we show that local error feedback suffers from a "gradient mismatch" problem in centralized distributed training, and that this issue can lead to degraded performance compared with full-precision training. To solve this critical problem, we propose two novel techniques: 1) step ahead; 2) error averaging. Both our theoretical and empirical results show that the new methods alleviate the "gradient mismatch" problem. Experiments show that, measured in training epochs, we can even train faster with compressed gradients than with full-precision training.
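For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of gradient compression with local error feedback in a centralized (worker/server) setting. It is not the paper's step-ahead or error-averaging method, which the abstract does not specify. The top-k compressor, the class `ErrorFeedbackWorker`, and the functions `top_k_compress` and `server_average` are illustrative assumptions introduced here, not names from the paper.

```python
def top_k_compress(vec, k):
    """Keep the k largest-magnitude entries of vec and zero the rest.

    Top-k is one common compressor; it is an assumption here, since the
    abstract does not say which compressor the paper uses.
    """
    order = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vec)]


class ErrorFeedbackWorker:
    """Hypothetical worker that compresses gradients with local error feedback."""

    def __init__(self, dim, k):
        self.k = k
        # Residual dropped by earlier compressions, kept locally on the worker.
        self.error = [0.0] * dim

    def compress(self, grad):
        # Add back what was dropped in previous steps before compressing.
        corrected = [g + e for g, e in zip(grad, self.error)]
        sent = top_k_compress(corrected, self.k)
        # Remember what this compression dropped for the next step.
        self.error = [c - s for c, s in zip(corrected, sent)]
        return sent


def server_average(messages):
    """Average the compressed messages received from all workers."""
    n = len(messages)
    return [sum(column) / n for column in zip(*messages)]
```

In this sketch each worker would call `compress` on its local gradient every step, and the server would apply `server_average` to the received messages before updating the model. Because each worker's residual stays local and is never seen by the server, the update the server applies can differ from the one the workers' corrected gradients imply.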
