Huan Zhu

IRPO: Scaling the Bradley-Terry Model via Reinforcement Learning

Jan 02, 2026
Via arXiv