
Haibin Lin

MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production

May 19, 2025

Seed1.5-VL Technical Report

May 11, 2025

Understanding Stragglers in Large Model Training Using What-if Analysis

May 09, 2025

OVERLORD: Ultimate Scaling of DataLoader for Multi-Source Large Foundation Model Training

Apr 14, 2025

VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks

Apr 08, 2025

MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism

Apr 03, 2025

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Mar 18, 2025

ByteScale: Efficient Scaling of LLM Training with a 2048K Context Length on More Than 12,000 GPUs

Feb 28, 2025

Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts

Feb 27, 2025

Minder: Faulty Machine Detection for Large-scale Distributed Model Training

Nov 04, 2024