Stabilizing Transformer Training by Preventing Attention Entropy Collapse

Mar 11, 2023
Shuangfei Zhai, Tatiana Likhomanenko, Etai Littwin, Dan Busbridge, Jason Ramapuram, Yizhe Zhang, Jiatao Gu, Josh Susskind
