TriMoE: Augmenting GPU with AMX-Enabled CPU and DIMM-NDP for High-Throughput MoE Inference via Offloading

Add code
Mar 01, 2026

Share this with someone who'll enjoy it:

View paper onarxiv icon

Share this with someone who'll enjoy it: