Kenshi Abe

Filtered Direct Preference Optimization

Apr 23, 2024

Regularized Best-of-N Sampling to Mitigate Reward Hacking for Language Model Alignment

Apr 05, 2024

Scalable and Provably Fair Exposure Control for Large-Scale Recommender Systems

Feb 22, 2024

Return-Aligned Decision Transformer

Feb 06, 2024

Learning Fair Division from Bandit Feedback

Nov 15, 2023

Model-Based Minimum Bayes Risk Decoding

Nov 09, 2023

Why Guided Dialog Policy Learning performs well? Understanding the role of adversarial learning and its alternative

Jul 13, 2023

A Slingshot Approach to Learning in Monotone Games

May 26, 2023

Exploration of Unranked Items in Safe Online Learning to Re-Rank

May 02, 2023

Fair Matrix Factorisation for Large-Scale Recommender Systems

Sep 09, 2022