Jialiang Cheng

BEAM: Binary Expert Activation Masking for Dynamic Routing in MoE

May 14, 2026
Via arXiv

SERE: Similarity-based Expert Re-routing for Efficient Batch Decoding in MoE Models

Feb 07, 2026
Via arXiv

EDiT: A Local-SGD-Based Efficient Distributed Training Method for Large Language Models

Dec 10, 2024
Via arXiv