Stephane Hatgis-Kessell

Influencing Humans to Conform to Preference Models for RLHF

Jan 11, 2025
via arXiv

Learning Optimal Advantage from Preferences and Mistaking it for Reward

Oct 03, 2023
via arXiv

Models of human preference for learning reward functions

Jun 05, 2022
via arXiv