Xingwei Gan

Complementing reinforcement learning with SFT through logit averaging in the post training of LLMs

May 19, 2026
Via arXiv