
Christoph Dann

Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning

Jul 22, 2024

Rate-Preserving Reductions for Blackwell Approachability

Jun 10, 2024

A Minimaximalist Approach to Reinforcement Learning from Human Feedback

Jan 08, 2024

Data-Driven Regret Balancing for Online Model Selection in Bandits

Jun 05, 2023

A Blackbox Approach to Best of Both Worlds in Bandits and Beyond

Feb 20, 2023

Best of Both Worlds Policy Optimization

Feb 18, 2023

Learning in POMDPs is Sample-Efficient with Hindsight Observability

Feb 03, 2023

Pseudonorm Approachability and Applications to Regret Minimization

Feb 03, 2023

A Unified Algorithm for Stochastic Path Problems

Oct 17, 2022

A Provably Efficient Model-Free Posterior Sampling Method for Episodic Reinforcement Learning

Aug 23, 2022