Bimei Wang

On the Implicit Reward Overfitting and the Low-rank Dynamics in RLVR

May 07, 2026
Via arXiv