Title: Optimal Mixture Weights for Off-Policy Evaluation with Multiple Behavior Policies

Nov 29, 2020

Jinlin Lai, Lixin Zou, Jiaxing Song

Abstract: Off-policy evaluation is a key component of reinforcement learning that evaluates a target policy using offline data collected from behavior policies. It is a crucial step towards safe reinforcement learning and has been used in advertising, recommender systems, and many other applications. In these applications, the offline data is sometimes collected from multiple behavior policies. Previous works treat data from different behavior policies equally. However, some behavior policies yield better estimators than others. This paper starts by discussing how to correctly mix estimators produced by different behavior policies. We propose three ways to reduce the variance of the mixture estimator when all sub-estimators are unbiased or asymptotically unbiased. Furthermore, experiments on simulated recommender systems show that our methods are effective in reducing the mean squared error of estimation.

* Offline Reinforcement Learning Workshop at Neural Information Processing Systems, 2020
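
To make the idea of mixing sub-estimators concrete, the sketch below shows classical inverse-variance weighting, a textbook way to combine independent unbiased estimators so that the mixture has minimal variance. It is not taken from the paper and should not be read as any of its three proposed methods. The function names, the per-policy estimates, and the variances are all hypothetical, and independence between sub-estimators is assumed.

```python
# Illustrative sketch only: classical inverse-variance weighting,
# NOT the paper's method. All names and numbers are hypothetical.

def inverse_variance_weights(variances):
    """Weights proportional to 1/variance, normalized to sum to 1.

    For independent unbiased estimators, these weights minimize the
    variance of the weighted average among all weights summing to 1.
    """
    inverse = [1.0 / v for v in variances]
    total = sum(inverse)
    return [w / total for w in inverse]


def mix_estimates(estimates, weights):
    """Weighted average of the per-policy estimates."""
    return sum(w * e for w, e in zip(weights, estimates))


def mixture_variance(variances, weights):
    """Variance of the weighted average, assuming independent estimators."""
    return sum(w * w * v for w, v in zip(weights, variances))


# Hypothetical value estimates of one target policy, each computed from
# data logged by a different behavior policy, with assumed variances.
estimates = [1.0, 1.2, 0.8]
variances = [0.04, 0.25, 1.0]

uniform_weights = [1.0 / len(estimates)] * len(estimates)
optimal_weights = inverse_variance_weights(variances)

uniform_variance = mixture_variance(variances, uniform_weights)
optimal_variance = mixture_variance(variances, optimal_weights)
mixed_value = mix_estimates(estimates, optimal_weights)
```

Under these assumptions, the low-variance sub-estimator receives most of the weight, and the resulting mixture variance is lower than with uniform weights. In practice the variances are unknown and must themselves be estimated, which is one reason more careful weighting schemes are needed.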
