Haonan Song

Precision over Diversity: High-Precision Reward Generalizes to Robust Instruction Following

Jan 08, 2026
Via arXiv

TreeAdv: Tree-Structured Advantage Redistribution for Group-Based RL

Jan 07, 2026
Via arXiv

IRPO: Scaling the Bradley-Terry Model via Reinforcement Learning

Jan 02, 2026
Via arXiv