Recommender systems play an important role in everyday life. Recommendations of news by Google, videos by Netflix, goods by e-commerce providers, and so on have substantially changed everyone's lifestyle. The health domain contains similar decision-making problems, such as what to eat, how to exercise, and which medicine is appropriate for a patient. Recently, studies applying recommender systems to health problems have attracted attention. In this paper, we review aspects of health recommender systems, including interests, methods, evaluation, future challenges, and trends. We find that 1) health recommender systems face health-specific limitations that lead them to focus on less risky recommendations, such as diet recommendation; 2) traditional recommender methods such as content-based and collaborative filtering can hardly handle health constraints, whereas knowledge-based methods are playing a larger role than ever; 3) evaluating a health recommendation is more complicated than evaluating a commercial one because multiple dimensions beyond accuracy must be considered. Recommender systems can function well in the health domain once several key problems are solved. Our work is a systematic review of health recommender system studies; we present current conditions and future directions. We believe that this review will help domain researchers and advance health recommender systems to the next step.
We focus on the study of conversational recommendation in the context of multi-type dialogs, where the bots can proactively and naturally lead a conversation from a non-recommendation dialog (e.g., QA) to a recommendation dialog, taking into account the user's interests and feedback. To facilitate the study of this task, we create a human-to-human Chinese dialog dataset, DuRecDial (about 10k dialogs, 156k utterances), which contains multiple sequential dialogs for every pair of a recommendation seeker (user) and a recommender (bot). In each dialog, the recommender proactively leads a multi-type dialog toward the recommendation targets and then makes multiple recommendations with rich interaction behavior. This dataset allows us to systematically investigate different parts of the overall problem, e.g., how to naturally lead a dialog and how to interact with users for recommendation. Finally, we establish baseline results on DuRecDial for future studies. The dataset and code are publicly available at https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/Research/ACL2020-DuRecDial.
We consider the problem of slate recommendation, where the recommender system presents a user with a collection, or slate, of K recommended items at once. If the user finds the recommended items appealing, the user may click, and the recommender system receives some feedback. Two pieces of information are available to the recommender system: whether the slate was clicked (the reward) and, if it was clicked, which item was clicked (the rank). In this paper, we formulate several Bayesian models that incorporate the reward signal (Reward model), the rank signal (Rank model), or both (Full model) for non-personalized slate recommendation. In our experiments, we analyze the performance gains of the Full model and show that it achieves significantly lower error as the number of products in the catalog grows or as the slate size increases.
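The sketch below makes the difference between the two feedback signals concrete. It is not the paper's model. It assumes a multinomial-logit (MNL) choice model in which each non-personalized item i has a hypothetical positive attractiveness weight w_i and the no-click option has weight 1. Under that assumption, a single observation yields three quantities: a reward log-likelihood (was the slate clicked?), a rank log-likelihood (which item was clicked, given a click), and a full log-likelihood. A hypothetical LogNormal(0, 1) prior turns any of them into an unnormalized log posterior.

```python
import math


def log_likelihoods(weights, slate, clicked_item):
    """Per-observation log-likelihoods under an ASSUMED MNL choice model.

    weights: dict item -> positive attractiveness w_i (hypothetical parameters)
    slate: list of item ids shown together
    clicked_item: the clicked item id, or None if the slate was not clicked
    The no-click option has weight 1 (an assumption of this sketch).
    Returns (reward_ll, rank_ll, full_ll).
    """
    total = sum(weights[i] for i in slate)
    p_click = total / (1.0 + total)  # probability that some item in the slate is clicked
    if clicked_item is None:
        reward_ll = math.log(1.0 - p_click)
        rank_ll = 0.0  # no rank signal without a click
        full_ll = math.log(1.0 / (1.0 + total))
    else:
        reward_ll = math.log(p_click)
        rank_ll = math.log(weights[clicked_item] / total)  # which item, given a click
        full_ll = math.log(weights[clicked_item] / (1.0 + total))
    return reward_ll, rank_ll, full_ll


def log_posterior(weights, observations, model="full"):
    """Unnormalized log posterior with an independent LogNormal(0, 1) prior per weight.

    observations: list of (slate, clicked_item) pairs
    model: "reward", "rank", or "full" selects which likelihood term is used
    """
    index = {"reward": 0, "rank": 1, "full": 2}[model]
    # LogNormal(0, 1) log density up to a constant: -log(w) - log(w)^2 / 2
    lp = sum(-math.log(w) - 0.5 * math.log(w) ** 2 for w in weights.values())
    for slate, clicked in observations:
        lp += log_likelihoods(weights, slate, clicked)[index]
    return lp
```

Under this assumed MNL model, the full log-likelihood is exactly the sum of the reward and rank terms. The rank term is also unchanged when all weights are scaled by a constant. So the rank signal alone identifies only relative attractiveness, while the reward signal pins down the overall scale. This is one way to see why combining both signals could help, although the paper's own Full model may be specified differently.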
The most common way to listen to recorded music nowadays is via streaming platforms, which provide access to tens of millions of tracks. To help users browse these large catalogs effectively, the integration of Music Recommender Systems (MRSs) has become essential. Current real-world MRSs are often quite complex and optimized for recommendation accuracy. They combine several building blocks based on collaborative filtering and content-based recommendation. This complexity can hinder the ability to explain recommendations to end users, which is particularly important for recommendations perceived as unexpected or inappropriate. While pure recommendation performance often correlates with user satisfaction, explainability has a positive impact on other factors such as trust and forgiveness, which are ultimately essential for maintaining user loyalty. In this article, we discuss how explainability can be addressed in the context of MRSs. We provide perspectives on how explainability could improve music recommendation algorithms and enhance user experience. First, we review common dimensions and goals of recommender explainability and, more generally, of eXplainable Artificial Intelligence (XAI), and elaborate on the extent to which these apply, or need to be adapted, to the specific characteristics of music consumption and recommendation. Then, we show how explainability components can be integrated within an MRS and in what form explanations can be provided. Since the evaluation of explanation quality is decoupled from pure accuracy-based evaluation criteria, we also discuss requirements and strategies for evaluating explanations of music recommendations. Finally, we describe the current challenges of introducing explainability within a large-scale industrial music recommender system and provide research perspectives.
An excessive number of customers often leads to degraded service quality. However, recommender systems typically ignore the capacity constraints of services, which may lead to unsatisfactory recommendations. This problem can be solved by limiting the number of users who receive a recommendation for a service, but this may be viewed as unfair. In this paper, we propose a novel metric, Top-N Fairness, to measure the individual fairness of multi-round recommendations of services with capacity constraints. Because users are often affected only by the top-ranked items in a recommendation, Top-N Fairness considers only the sub-list consisting of the top N services. Based on this metric, we design FAST, a Fairness Assured service recommendation STrategy. FAST adjusts the original recommendation list to provide users with recommendation results that guarantee the long-term fairness of multi-round recommendations. We theoretically prove the convergence of the variance of FAST's Top-N Fairness. FAST is tested on the Yelp dataset and on synthetic datasets. The experimental results show that FAST achieves better recommendation fairness while maintaining high recommendation quality.
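The abstract does not define Top-N Fairness or FAST's adjustment rule, so the code below is not FAST. It is a minimal sketch of the underlying setting, assuming hypothetical relevance scores and per-round capacities. In each round, a service may appear in at most a fixed number of users' top-N lists. A simple greedy re-ranker lets the least-served users choose first, so the shortfall rotates across rounds. The per-user ratio of received benefit to capacity-free ideal benefit is our own illustrative stand-in for a fairness score, and its variance across users is a rough analogue of the quantity the abstract discusses.

```python
import statistics


def capacity_aware_topn(relevance, capacity, n, rounds):
    """Greedy multi-round top-N allocation under per-service capacity limits.

    relevance: dict user -> dict service -> relevance score (hypothetical input)
    capacity: dict service -> max number of users per round whose top-N may include it
    Returns (ratios, history): ratios[u] is the fraction of the user's
    capacity-free ideal benefit actually received; history holds each round's lists.
    """
    users = sorted(relevance)
    # Ideal per-round benefit: the sum of a user's N best scores, ignoring capacity.
    ideal = {u: sum(sorted(relevance[u].values(), reverse=True)[:n]) for u in users}
    gained = {u: 0.0 for u in users}
    history = []
    for _ in range(rounds):
        remaining = dict(capacity)
        allocation = {}
        # Users served worst so far choose first this round (ties broken by user id).
        order = sorted(users, key=lambda u: (gained[u] / ideal[u] if ideal[u] else 0.0, u))
        for u in order:
            ranked = sorted(relevance[u], key=relevance[u].get, reverse=True)
            # Keep the user's best services that still have capacity this round.
            picks = [s for s in ranked if remaining[s] > 0][:n]
            for s in picks:
                remaining[s] -= 1
            allocation[u] = picks
            gained[u] += sum(relevance[u][s] for s in picks)
        history.append(allocation)
    ratios = {u: gained[u] / (ideal[u] * rounds) if ideal[u] else 1.0 for u in users}
    return ratios, history


rel = {u: {"A": 1.0, "B": 0.6, "C": 0.5} for u in ["u1", "u2", "u3"]}
cap = {"A": 1, "B": 3, "C": 3}
ratios, history = capacity_aware_topn(rel, cap, n=2, rounds=3)
spread = statistics.pvariance(list(ratios.values()))
```

In this example, three users all prefer a service "A" that can appear in only one top-2 list per round. The greedy rule gives "A" to a different user in each of the three rounds, so every user ends with the same ratio and the variance across users is zero. A fixed user order would instead give the scarce service to the same user every round.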
Hashtag recommendation is a crucial task, especially given the growing interest in using social media platforms such as Twitter over the last decade. Hashtag recommendation systems automatically suggest hashtags to users while they write a tweet. Most research on hashtag recommendation has used classical metrics such as hit rate, precision, recall, and F1-score to measure the accuracy of hashtag recommendation systems. These metrics are based on the exact match of the recommended hashtags with their corresponding ground truth. However, it is not clear how adequate these metrics are for evaluating hashtag recommendation. The research question we seek to answer is: are these metrics adequate for evaluating hashtag recommendation systems when the number of ground-truth hashtags per tweet is highly variable? In this paper, we propose a new metric for hashtag recommendation, which we call hit ratio. Extensive evaluation through hypothetical examples and real-world application across a range of hashtag recommendation models indicates that hit ratio is a useful metric. A comparison of hit ratio with the classical evaluation metrics reveals the limitations of the classical metrics.
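The limitation raised in the research question can be illustrated with the classical exact-match metrics themselves. The sketch below computes hit, precision, recall, and F1 at a cut-off k for a single tweet, using hypothetical tweets and hashtags. It does not implement the proposed hit ratio, whose definition is not given here.

```python
def exact_match_metrics(recommended, ground_truth, k):
    """Classical exact-match metrics at cut-off k for a single tweet.

    recommended: ranked list of suggested hashtags
    ground_truth: non-empty list of the tweet's actual hashtags
    """
    top_k = recommended[:k]
    hits = len(set(top_k) & set(ground_truth))  # exact string matches only
    precision = hits / k
    recall = hits / len(ground_truth)
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return {"hit": int(hits > 0), "precision": precision, "recall": recall, "f1": f1}


k = 5
# Two hypothetical tweets, each served by a recommender that finds every true hashtag.
one_tag = exact_match_metrics(["#nlp", "#ai", "#ml", "#data", "#python"], ["#nlp"], k)
five_tags = exact_match_metrics(["#a", "#b", "#c", "#d", "#e"], ["#a", "#b", "#c", "#d", "#e"], k)
```

With k = 5, a recommender that ranks a tweet's only ground-truth hashtag first still scores a precision of 1/5 = 0.2 and an F1 of about 0.33. The same flawless behavior on a tweet with five ground-truth hashtags scores 1.0 on both. When the number of ground-truth hashtags varies from tweet to tweet, averaged precision and F1 therefore mix recommender quality with ground-truth size.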
Current music recommender systems typically act greedily by recommending the songs with the highest user ratings. Greedy recommendation, however, is suboptimal over the long term: it does not actively gather information on user preferences and fails to recommend novel songs that are potentially interesting. A successful recommender system must balance the need to explore user preferences against the need to exploit this information for recommendation. This paper presents a new approach to music recommendation that formulates this exploration-exploitation trade-off as a reinforcement learning task, the multi-armed bandit. To learn user preferences, it uses a Bayesian model that accounts for both audio content and the novelty of recommendations. A piecewise-linear approximation to the model and a variational inference algorithm are employed to speed up Bayesian inference. An additional benefit of our approach is a single unified model for both music recommendation and playlist generation. Both simulation results and a user study indicate strong potential for the new approach.
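The exploration-exploitation loop described above can be sketched as follows. This is not the paper's method, and it rests on three assumptions:

- It swaps in exact conjugate Bayesian linear regression with Thompson sampling in place of the paper's piecewise-linear approximation and variational inference.
- It uses a hypothetical novelty factor 1 - exp(-t/s), where t is the number of rounds since a song was last played and s is a hypothetical recovery-time parameter.
- It simulates ratings from a hypothetical true preference vector over hypothetical audio feature vectors.

```python
import numpy as np


def thompson_music_bandit(features, true_theta, rounds, s=5.0, noise=0.1, tau=1.0, seed=0):
    """Thompson sampling with Bayesian linear regression over novelty-scaled features.

    features: (n_songs, d) array of hypothetical audio features
    true_theta: (d,) hypothetical preference vector used only to simulate ratings
    s: hypothetical novelty recovery time; noise: rating noise std; tau: prior std
    Returns (posterior_mean, plays).
    """
    rng = np.random.default_rng(seed)
    n_songs, d = features.shape
    precision = np.eye(d) / tau**2  # posterior precision, starting from the N(0, tau^2 I) prior
    b = np.zeros(d)  # accumulated z * rating / noise^2
    last_played = np.full(n_songs, -np.inf)  # -inf means "never played"
    plays = []
    for t in range(rounds):
        elapsed = t - last_played  # inf for songs never played
        novelty = 1.0 - np.exp(-elapsed / s)  # hypothetical novelty recovery curve
        z = features * novelty[:, None]  # novelty-scaled content features
        cov = np.linalg.inv(precision)
        cov = (cov + cov.T) / 2.0  # symmetrize against round-off
        mean = cov @ b
        theta_sample = rng.multivariate_normal(mean, cov)  # posterior sample drives exploration
        song = int(np.argmax(z @ theta_sample))
        rating = z[song] @ true_theta + rng.normal(0.0, noise)  # simulated user feedback
        # Conjugate Gaussian update with the observed (feature, rating) pair.
        precision += np.outer(z[song], z[song]) / noise**2
        b += z[song] * rating / noise**2
        last_played[song] = t
        plays.append(song)
    return np.linalg.solve(precision, b), plays
```

Sampling a preference vector from the posterior, rather than using the posterior mean, is what drives exploration here: songs whose value is still uncertain are occasionally ranked first. Meanwhile, the novelty factor temporarily suppresses songs that were just played.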
As recommendation is essentially a comparative (or ranking) process, a good explanation should illustrate to users why one item is believed to be better than another, i.e., give comparative explanations of the recommended items. Ideally, after reading the explanations, a user should arrive at the same ranking of items as the system. Unfortunately, little research attention has yet been paid to such comparative explanations. In this work, we develop an extract-and-refine architecture to explain the relative comparisons among a set of ranked items from a recommender system. For each recommended item, we first extract one sentence from its associated reviews that best suits the desired comparison against a set of reference items. This extracted sentence is then further articulated with respect to the target user through a generative model to better explain why the item is recommended. We design a new explanation quality metric based on BLEU to guide the end-to-end training of the extraction and refinement components, which avoids generating generic content. Extensive offline evaluations on two large recommendation benchmark datasets and rigorous user studies against an array of state-of-the-art explainable recommendation algorithms demonstrate the necessity of comparative explanations and the effectiveness of our solution.
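The sketch below illustrates BLEU-based scoring for the extraction step. It computes a smoothed sentence-level BLEU from scratch and applies a hypothetical extraction rule. Several pieces are assumptions for illustration: the rule itself, the `penalty` weight, and the idea of comparing candidates against a target text (for example, the target user's own review). The paper's extractor and its BLEU-based quality metric are trained end to end and may be defined differently.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU with add-one smoothing for n-gram orders n >= 2."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngram_counts(cand, n), ngram_counts(ref, n)
        # Clipped n-gram matches: each reference n-gram counts at most as often as it occurs.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if n == 1:
            if overlap == 0:
                return 0.0
            p = overlap / total
        else:
            p = (overlap + 1) / (total + 1)
        log_precision += math.log(p) / max_n
    # Brevity penalty discourages candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_precision)


def extract_sentence(candidates, target_text, reference_sentences, penalty=1.0):
    """HYPOTHETICAL extraction rule: reward overlap with the target text and
    penalize overlap with sentences describing the reference items."""
    def score(sentence):
        similarity = sentence_bleu(sentence, target_text)
        redundancy = max((sentence_bleu(sentence, r) for r in reference_sentences), default=0.0)
        return similarity - penalty * redundancy
    return max(candidates, key=score)
```

In a small example with a hypothetical battery-related target text, this rule prefers a sentence about battery life over one that largely repeats what a reference item's review already says about its screen.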