JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention

Oct 03, 2023
