Shiping Gao

Discriminative Policy Optimization for Token-Level Reward Models

May 29, 2025
Via arXiv

Advantage-Guided Distillation for Preference Alignment in Small Language Models

Feb 25, 2025
Via arXiv

Self-Evolution Fine-Tuning for Policy Optimization

Jun 16, 2024
Via arXiv