Mou Sun

Practical FP4 Training for Large-Scale MoE Models on Hopper GPUs

Mar 03, 2026
Via arXiv

Accelerating LLM Pre-Training through Flat-Direction Dynamics Enhancement

Feb 26, 2026
Via arXiv

Synergistic Intra- and Cross-Layer Regularization Losses for MoE Expert Specialization

Feb 15, 2026
Via arXiv