DeepNet: Scaling Transformers to 1,000 Layers

Mar 01, 2022
Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, Furu Wei

View paper on arXiv