Dingqian Hong

GVPO: Group Variance Policy Optimization for Large Language Model Post-Training

Apr 28, 2025
Via arXiv