Qinzhen Guo

Preference-Based Self-Distillation: Beyond KL Matching via Reward Regularization

May 06, 2026
Via arXiv