Jialiang Cheng

SERE: Similarity-based Expert Re-routing for Efficient Batch Decoding in MoE Models

Feb 07, 2026
Via arXiv

EDiT: A Local-SGD-Based Efficient Distributed Training Method for Large Language Models

Dec 10, 2024
Via arXiv