Why Deep Transformers are Difficult to Converge? From Computation Order to Lipschitz Restricted Parameter Initialization

Nov 08, 2019
Figures 1-4 for Why Deep Transformers are Difficult to Converge? From Computation Order to Lipschitz Restricted Parameter Initialization
