Alert button

Small-scale proxies for large-scale Transformer training instabilities

Add code
Bookmark button
Alert button
Sep 25, 2023
Mitchell Wortsman, Peter J. Liu, Lechao Xiao, Katie Everett, Alex Alemi, Ben Adlam, John D. Co-Reyes, Izzeddin Gur, Abhishek Kumar, Roman Novak, Jeffrey Pennington, Jascha Sohl-dickstein, Kelvin Xu, Jaehoon Lee, Justin Gilmer, Simon Kornblith

Figure 1 for Small-scale proxies for large-scale Transformer training instabilities
Figure 2 for Small-scale proxies for large-scale Transformer training instabilities
Figure 3 for Small-scale proxies for large-scale Transformer training instabilities
Figure 4 for Small-scale proxies for large-scale Transformer training instabilities

Share this with someone who'll enjoy it:

View paper onarxiv icon

Share this with someone who'll enjoy it: