RS-DPO: A Hybrid Rejection Sampling and Direct Preference Optimization Method for Alignment of Large Language Models

Feb 15, 2024
Saeed Khaki, JinJin Li, Lan Ma, Liu Yang, Prathap Ramachandra

Paper available on arXiv.