Picture for Yiyuan Ma

Yiyuan Ma

MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production

Add code
May 19, 2025
Viaarxiv icon

Model Merging in Pre-training of Large Language Models

Add code
May 17, 2025
Viaarxiv icon

FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference

Add code
Feb 28, 2025
Viaarxiv icon