Yuzhong Hong

GVPO: Group Variance Policy Optimization for Large Language Model Post-Training

Apr 28, 2025
Via arXiv

Energy-Based Preference Model Offers Better Offline Alignment than the Bradley-Terry Preference Model

Dec 18, 2024
Via arXiv

Preference-Oriented Supervised Fine-Tuning: Favoring Target Model Over Aligned Large Language Models

Dec 17, 2024
Via arXiv