Toward Simulating Environments in Reinforcement Learning Based Recommendations

Jun 27, 2019
Xiangyu Zhao, Long Xia, Zhuoye Ding, Dawei Yin, Jiliang Tang

With recent advances in Reinforcement Learning (RL), there has been tremendous interest in employing RL for recommender systems. RL-based recommender systems have two key advantages: (i) they can continuously update their recommendation strategies according to users' real-time feedback, and (ii) the optimal strategy maximizes the long-term reward from users, such as the total revenue of a recommendation session. However, directly training and evaluating a new RL-based recommendation algorithm requires collecting users' real-time feedback in the real system, which is time-consuming and labor-intensive and could negatively impact users' experience. This calls for a user simulator that mimics real users' behaviors, on which new recommendation algorithms can be pre-trained and evaluated. Simulating users' behaviors in a dynamic system faces immense challenges: (i) the underlying item distribution is complex, and (ii) historical logs for each user are limited. In this paper, we develop a user simulator based on a Generative Adversarial Network (GAN). Specifically, we design the generator to capture the underlying distribution of users' historical logs and to generate realistic logs that can be considered augmentations of real logs, while the discriminator is designed not only to distinguish real from fake logs but also to predict users' behaviors. Experimental results on real-world e-commerce data demonstrate the effectiveness of the proposed simulator. Further experiments were conducted to understand the importance of each component of the simulator.
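The abstract gives the discriminator two jobs: separating real logs from generated ones, and predicting how a user will respond. The sketch below illustrates only that dual objective, and it rests on loud assumptions. It uses a linear model rather than the paper's neural networks, a separate sigmoid head for real vs. fake plus a softmax head over three hypothetical behaviors (`skip`, `click`, `purchase`) with made-up rewards, and the generator is omitted, replaced by a fixed "fake" feature vector. `TwoHeadDiscriminator`, `simulated_reward`, and the toy features are hypothetical names and data, not the authors' code.

```python
import math
import random

# Hypothetical feedback types and rewards; illustrative only, not taken from the paper.
BEHAVIORS = ("skip", "click", "purchase")
REWARDS = (0.0, 1.0, 5.0)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    m = max(logits)  # shift by the max logit for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


class TwoHeadDiscriminator:
    """Hypothetical linear discriminator: one real/fake head plus one behavior head."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.w_adv = [rng.gauss(0.0, 0.01) for _ in range(dim)]
        self.w_beh = [[rng.gauss(0.0, 0.01) for _ in range(dim)] for _ in BEHAVIORS]

    def forward(self, x):
        """Return (probability that x is a real log, distribution over BEHAVIORS)."""
        p_real = sigmoid(dot(self.w_adv, x))
        probs = softmax([dot(w, x) for w in self.w_beh])
        return p_real, probs

    def sgd_step(self, x, is_real, behavior=None, lr=0.1):
        """One SGD step on BCE(real vs. fake) + CE(behavior); returns the pre-update loss.

        Assumption: the behavior loss is applied to real logs only.
        """
        p_real, probs = self.forward(x)
        eps = 1e-12
        y = 1.0 if is_real else 0.0
        loss = -(y * math.log(p_real + eps) + (1.0 - y) * math.log(1.0 - p_real + eps))
        g = p_real - y  # gradient of BCE w.r.t. the real/fake logit
        self.w_adv = [w - lr * g * xi for w, xi in zip(self.w_adv, x)]
        if is_real and behavior is not None:
            loss -= math.log(probs[behavior] + eps)
            for k, w_k in enumerate(self.w_beh):
                g_k = probs[k] - (1.0 if k == behavior else 0.0)  # CE gradient w.r.t. logit k
                self.w_beh[k] = [w - lr * g_k * xi for w, xi in zip(w_k, x)]
        return loss


def simulated_reward(disc, x):
    """Expected reward of a recommendation under the behavior head (hypothetical REWARDS)."""
    _, probs = disc.forward(x)
    return sum(p * r for p, r in zip(probs, REWARDS))


# Toy features [bias, history/item match, generator artifact]; all values are made up.
real_purchase = [1.0, 1.0, 0.0]
real_skip = [1.0, -1.0, 0.0]
fake_log = [1.0, 0.0, 1.0]  # stands in for a generator sample

disc = TwoHeadDiscriminator(dim=3)
for _ in range(500):
    disc.sgd_step(real_purchase, is_real=True, behavior=2)
    disc.sgd_step(real_skip, is_real=True, behavior=0)
    disc.sgd_step(fake_log, is_real=False)

print(disc.forward(fake_log)[0], simulated_reward(disc, real_purchase))
```

Once trained, the behavior head can act as the environment for an RL recommender: given a feature vector for a (state, recommended item) pair, it predicts the user's feedback, and `simulated_reward` turns that prediction into a reward. How the paper actually encodes states, items, and rewards may differ from this toy setup.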
