Ruopei Sun

Multi-Level Aware Preference Learning: Enhancing RLHF for Complex Multi-Instruction Tasks

May 19, 2025
Via arXiv

Bias Fitting to Mitigate Length Bias of Reward Model in RLHF

May 19, 2025
Via arXiv

Disentangling Length Bias In Preference Learning Via Response-Conditioned Modeling

Feb 02, 2025
Via arXiv