Haitao Hong

Cooper: Co-Optimizing Policy and Reward Models in Reinforcement Learning for Large Language Models

Aug 07, 2025
Via arXiv