In-Context Learning of a Linear Transformer Block: Benefits of the MLP Component and One-Step GD Initialization

Feb 22, 2024
[Figure 1]

View paper on arXiv