Ruiyi Ding

PRPO: Aligning Process Reward with Outcome Reward in Policy Optimization

Jan 13, 2026
Via arXiv