Yuwen Hao

Learning Less Is More: Premature Upper-Layer Attention Specialization Hurts Language Model Pretraining

May 11, 2026
Via arXiv