
Why Deep Transformers are Difficult to Converge? From Computation Order to Lipschitz Restricted Parameter Initialization

Nov 08, 2019
Hongfei Xu, Qiuhui Liu, Josef van Genabith, Jingyi Zhang

Figures 1-4 for Why Deep Transformers are Difficult to Converge? From Computation Order to Lipschitz Restricted Parameter Initialization


View paper on arXiv
