Liu Kang

Outcome-Grounded Advantage Reshaping for Fine-Grained Credit Assignment in Mathematical Reasoning

Jan 12, 2026
Via arXiv

IRPO: Scaling the Bradley-Terry Model via Reinforcement Learning

Jan 02, 2026
Via arXiv