Making Asynchronous Stochastic Gradient Descent Work for Transformers

Jun 08, 2019
Alham Fikri Aji, Kenneth Heafield

