Dennis Liu

MoE Parallel Folding: Heterogeneous Parallelism Mappings for Efficient Large-Scale MoE Model Training with Megatron Core

Apr 21, 2025

Llama 3 Meets MoE: Efficient Upcycling

Dec 13, 2024