Ruibin Zheng

Group Expectation Policy Optimization for Stable Heterogeneous Reinforcement Learning in LLMs

Aug 25, 2025
Via arXiv