No More Adam: Learning Rate Scaling at Initialization is All You Need

Dec 17, 2024


