Yuzhong Hong

GVPO: Group Variance Policy Optimization for Large Language Model Post-Training

Apr 28, 2025
Via arXiv

Energy-Based Preference Model Offers Better Offline Alignment than the Bradley-Terry Preference Model

Dec 18, 2024
Via arXiv

Preference-Oriented Supervised Fine-Tuning: Favoring Target Model Over Aligned Large Language Models

Dec 17, 2024
Via arXiv