Jingchu Wang

R$^2$PO: Decoupling Training Trajectories from Inference Responses for LLM Reasoning

Jan 17, 2026
Via arXiv