Huanwei Di

PR2: Predictive Routing Replay for MoE-Based LLM Reinforcement Learning

May 29, 2026
Via arXiv