W. Bradley Knox

Towards Improving Reward Design in RL: A Reward Alignment Metric for RL Practitioners

Mar 08, 2025

Influencing Humans to Conform to Preference Models for RLHF

Jan 11, 2025

MobileSafetyBench: Evaluating Safety of Autonomous Agents in Mobile Device Control

Oct 23, 2024

Modeling Future Conversation Turns to Teach LLMs to Ask Clarifying Questions

Oct 17, 2024

Contrastive Preference Learning: Learning from Human Feedback without RL

Oct 24, 2023

Learning Optimal Advantage from Preferences and Mistaking it for Reward

Oct 03, 2023

Models of human preference for learning reward functions

Jun 05, 2022

Reward (Mis)design for Autonomous Driving

Apr 28, 2021

The EMPATHIC Framework for Task Learning from Implicit Human Feedback

Sep 28, 2020