RLHF


RM-R1: Reward Modeling as Reasoning

May 05, 2025
Via arXiv

Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models

May 05, 2025
Via arXiv

Contextual Online Uncertainty-Aware Preference Learning for Human Feedback

Apr 29, 2025
Via arXiv

Adaptive Helpfulness-Harmlessness Alignment with Preference Vectors

Apr 27, 2025
Via arXiv

(Im)possibility of Automated Hallucination Detection in Large Language Models

Apr 23, 2025
Via arXiv

Learning Explainable Dense Reward Shapes via Bayesian Optimization

Apr 22, 2025
Via arXiv

Pre-DPO: Improving Data Utilization in Direct Preference Optimization Using a Guiding Reference Model

Apr 22, 2025
Via arXiv

DRAGON: Distributional Rewards Optimize Diffusion Generative Models

Apr 21, 2025
Via arXiv

Establishing Reliability Metrics for Reward Models in Large Language Models

Apr 21, 2025
Via arXiv

LoRe: Personalizing LLMs via Low-Rank Reward Modeling

Apr 20, 2025
Via arXiv