
Shaohuai Shi

Efficient MoE Inference with Fine-Grained Scheduling of Disaggregated Expert Parallelism

Dec 25, 2025

PipeDiT: Accelerating Diffusion Transformers in Video Generation with Task Pipelining and Model Decoupling

Nov 15, 2025

FSMoE: A Flexible and Scalable Training System for Sparse Mixture-of-Experts Models

Jan 18, 2025

ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference

Oct 23, 2024

FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression

Oct 16, 2024

Bandwidth-Aware and Overlap-Weighted Compression for Communication-Efficient Federated Learning

Aug 27, 2024

Parm: Efficient Training of Large Sparsely-Activated Models with Dedicated Schedules

Jun 30, 2024

FedImpro: Measuring and Improving Client Update in Federated Learning

Feb 10, 2024

Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models

Nov 07, 2023

FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs

Sep 03, 2023