Jinchang Zhu

Where Does Long-Context Supervision Actually Go? Effective-Context Exposure Balancing

May 11, 2026
Via arXiv

Learning Less Is More: Premature Upper-Layer Attention Specialization Hurts Language Model Pretraining

May 11, 2026
Via arXiv