ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing

Dec 19, 2024
Figure 1 for ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing
Figure 2 for ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing
Figure 3 for ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing
Figure 4 for ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing


View paper on arXiv
