Wenfeng Wang

MoE-SpeQ: Speculative Quantized Decoding with Proactive Expert Prefetching and Offloading for Mixture-of-Experts

Nov 18, 2025
Via arXiv

A Survey on Inference Optimization Techniques for Mixture of Experts Models

Dec 18, 2024
Via arXiv