Linbo Xi

GFT: From Imitation to Reward Fine-Tuning with Unbiased Group Advantages and Dynamic Coefficient Rectification

Apr 15, 2026
Via arXiv