Training Dynamics of Multi-Head Softmax Attention for In-Context Learning: Emergence, Convergence, and Optimality

Feb 29, 2024
Siyu Chen, Heejune Sheen, Tianhao Wang, Zhuoran Yang
