Christoph Dann

Mitigating Preference Hacking in Policy Optimization with Pessimism

Mar 10, 2025 (via arXiv)

Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective

Feb 26, 2025 (via arXiv)

Design Considerations in Offline Preference-based RL

Feb 08, 2025 (via arXiv)

Preserving Expert-Level Privacy in Offline Reinforcement Learning

Nov 18, 2024 (via arXiv)

Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning

Jul 22, 2024 (via arXiv)

Rate-Preserving Reductions for Blackwell Approachability

Jun 10, 2024 (via arXiv)

A Minimaximalist Approach to Reinforcement Learning from Human Feedback

Jan 08, 2024 (via arXiv)

Data-Driven Regret Balancing for Online Model Selection in Bandits

Jun 05, 2023 (via arXiv)

A Blackbox Approach to Best of Both Worlds in Bandits and Beyond

Feb 20, 2023 (via arXiv)

Best of Both Worlds Policy Optimization

Feb 18, 2023 (via arXiv)