Wu Fei

Self-Guided Process Reward Optimization with Redefined Step-wise Advantage for Process Reinforcement Learning

Jul 03, 2025
Via arXiv