Recommendation algorithms are susceptible to popularity bias: a tendency to recommend popular items even when they fail to meet user needs. A related issue is that the recommendation quality can vary by demographic groups. Marginalized groups or groups that are under-represented in the training data may receive less relevant recommendations from these algorithms compared to others. In a recent study, Ekstrand et al. investigate how recommender performance varies according to popularity and demographics, and find statistically significant differences in recommendation utility between binary genders on two datasets, and significant effects based on age on one dataset. Here we reproduce those results and extend them with additional analyses. We find statistically significant differences in recommender performance by both age and gender. We observe that recommendation utility steadily degrades for older users, and is lower for women than men. We also find that the utility is higher for users from countries with more representation in the dataset. In addition, we find that total usage and the popularity of consumed content are strong predictors of recommender performance and also vary significantly across demographic groups.
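One way to make the per-group comparison above concrete is to compute a utility score per user and then average within each demographic group. The sketch below assumes binary relevance and uses nDCG@k as the utility metric; the metric choice, the function names, and the input layout are illustrative assumptions, not details taken from the study.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set


def ndcg_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """nDCG@k with binary relevance (1 if the item is relevant, else 0)."""
    # Position 0 gets discount 1/log2(2), position 1 gets 1/log2(3), and so on.
    dcg = sum(
        1.0 / math.log2(position + 2)
        for position, item in enumerate(ranked[:k])
        if item in relevant
    )
    # The ideal list places every relevant item at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(position + 2) for position in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def mean_utility_by_group(
    per_user_utility: Dict[str, float], user_group: Dict[str, str]
) -> Dict[str, float]:
    """Average a per-user utility score within each (hypothetical) group label."""
    buckets: Dict[str, List[float]] = defaultdict(list)
    for user, utility in per_user_utility.items():
        buckets[user_group[user]].append(utility)
    return {group: sum(values) / len(values) for group, values in buckets.items()}
```

Group means alone do not establish a significant difference; a statistical test over the per-user scores would still be needed.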
Recent years have witnessed progress in precisely predicting the user's next behavior in session-based recommendation scenarios. However, recommending long-tail items, which have been shown to be important to recommender systems, is rarely investigated in existing work. To address this problem, we incorporate calibration into long-tail session-based recommendation, aiming to align the proportion of tail items in the recommendation list with their proportion in the session. To do this, we design a calibration framework that makes the model aware of the popularity distribution of its recommendation list and calibrates the result according to the ongoing session. Meanwhile, a separate training and prediction strategy is applied to deal with the imbalance problem caused by popularity bias. Experiments on benchmark datasets show that our model achieves competitive recommendation accuracy while recommending many more tail items.
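The calibration goal described above, matching the share of tail items in the recommendation list to their share in the session, can be illustrated with a simple post-hoc reranker. This is a minimal sketch under stated assumptions: it reranks scored candidates after the fact instead of training a popularity-aware model as the abstract describes, and the function names, the `tail_items` set, and the score dictionary are all hypothetical.

```python
from typing import Dict, List, Set


def tail_proportion(items: List[str], tail_items: Set[str]) -> float:
    """Fraction of `items` that belong to the long tail."""
    if not items:
        return 0.0
    return sum(1 for item in items if item in tail_items) / len(items)


def calibrated_top_k(
    scores: Dict[str, float], session: List[str], tail_items: Set[str], k: int
) -> List[str]:
    """Pick k items whose tail share approximates the session's tail share."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Number of tail slots implied by the ongoing session.
    target_tail = round(tail_proportion(session, tail_items) * k)
    tail_ranked = [item for item in ranked if item in tail_items]
    head_ranked = [item for item in ranked if item not in tail_items]
    n_tail = min(target_tail, len(tail_ranked))
    n_head = min(k - n_tail, len(head_ranked))
    chosen = set(tail_ranked[:n_tail] + head_ranked[:n_head])
    # If one group ran short, backfill from the overall ranking.
    for item in ranked:
        if len(chosen) >= k:
            break
        chosen.add(item)
    # Return the chosen items in descending score order.
    return [item for item in ranked if item in chosen]
```

With a session that is half tail items and k = 4, the sketch reserves two of the four slots for the best-scoring tail candidates.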
Recommender systems take inputs from user history, use an internal ranking algorithm to generate results, and possibly optimize this ranking based on feedback. However, the recommender system is often unaware of the actual intent of the user and simply provides recommendations dynamically without properly understanding the user's thought process. An intelligent recommender system is useful not only for the user but also for businesses that want to learn the tendencies of their users. Finding out the tendencies or intents of a user is a difficult problem. Keeping this in mind, we set out to create an intelligent system that keeps track of the user's activity on a web application and determines the intent of the user in each session. We devised a way to encode the user's activity across sessions. We then represent the information seen by the user in a high-dimensional format, which is reduced to lower dimensions using tensor factorization techniques. Intent awareness (or scoring) is handled at this stage. Finally, combining the user activity data with the contextual information gives the recommendation score. The final recommendations are then ranked using filtering and collaborative recommendation techniques to show the top-k recommendations to the user. The current system also envisions a provision for feedback, which informs the model to update the various weights in the recommender system. Our overall model aims to combine frequency-based and context-based recommendation systems and to quantify the intent of a user in order to provide better recommendations. We ran experiments on real-world timestamped user activity data, in the setting of recommending reports to the users of a business analytics tool, and the results are better than the baselines. We also tuned certain aspects of our model to arrive at optimized results.
Citation recommendation describes the task of recommending citations for a given text. Due to the overload of published scientific works in recent years on the one hand, and the need to cite the most appropriate publications when writing scientific texts on the other hand, citation recommendation has emerged as an important research topic. In recent years, several approaches and evaluation data sets have been presented. However, to the best of our knowledge, no literature survey has been conducted explicitly on citation recommendation. In this article, we give a thorough introduction to automatic citation recommendation research. We then present an overview of the approaches and data sets for citation recommendation and identify differences and commonalities along various dimensions. Last but not least, we shed light on the evaluation methods and outline general challenges in the evaluation and how to meet them. We restrict ourselves to citation recommendation for scientific publications, as this document type has been studied the most in this area. However, many of the observations and discussions included in this survey are also applicable to other types of text, such as news articles and encyclopedic articles.
Deep learning-based recommendation models are used pervasively and broadly, for example, to recommend movies, products, or other information most relevant to users, in order to enhance the user experience. Compared with other application domains that have received significant research attention from industry and academia, such as image classification, object detection, and language and speech translation, the performance of deep learning-based recommendation models is less well explored, even though recommendation tasks unarguably represent significant AI inference cycles at large-scale datacenter fleets. To advance the state of understanding and enable machine learning system development and optimization for the commerce domain, we aim to define an industry-relevant recommendation benchmark for the MLPerf Training and Inference Suites. The paper synthesizes the desirable modeling strategies for personalized recommendation systems. We lay out desirable characteristics of recommendation model architectures and data sets. We then summarize the discussions and advice from the MLPerf Recommendation Advisory Board.
In academic research, recommender systems are often evaluated on benchmark datasets without much consideration of the global timeline. Hence, we are unable to answer questions like: Do loyal users enjoy better recommendations than non-loyal users? Loyalty can be defined by the time period a user has been active in a recommender system, or by the number of historical interactions a user has. In this paper, we offer a comprehensive analysis of recommendation results along the global timeline. We conduct experiments with five widely used models, i.e., BPR, NeuMF, LightGCN, SASRec and TiSASRec, on four benchmark datasets, i.e., MovieLens-25M, Yelp, Amazon-music, and Amazon-electronic. Our experimental results answer "No" to the above question. Users with many historical interactions suffer from relatively poorer recommendations. Users who stay with the system for a short time period enjoy better recommendations. Both findings are counter-intuitive. Interestingly, users who have recently interacted with the system, with respect to the time point of the test instance, enjoy better recommendations. The finding on recency applies to all users, regardless of their loyalty. Our study offers a different perspective for understanding recommender performance, and our findings could trigger a revisit of recommender model design.
Recommender systems provide essential web services by learning users' personal preferences from collected data. However, in many cases, systems also need to forget some training data. From the perspective of privacy, several privacy regulations have recently been proposed, requiring systems to eliminate any impact of data whose owner requests that it be forgotten. From the perspective of utility, if a system's utility is damaged by some bad data, the system needs to forget these data to regain utility. From the perspective of usability, users can delete noise and incorrect entries so that a system can provide more useful recommendations. While unlearning is very important, it has not been well considered in existing recommender systems. Although some studies have addressed machine unlearning in the domains of image and text data, existing methods cannot be directly applied to recommendation because they do not account for collaborative information. In this paper, we propose RecEraser, a general and efficient machine unlearning framework tailored to recommendation tasks. The main idea of RecEraser is to partition the training set into multiple shards and train a constituent model for each shard. Specifically, to keep the collaborative information of the data, we first design three novel data partition algorithms to divide training data into balanced groups based on their similarity. Then, considering that different shard models do not contribute uniformly to the final prediction, we further propose an adaptive aggregation method to improve the global model utility. Experimental results on three public benchmarks show that RecEraser can not only achieve efficient unlearning, but also outperform the state-of-the-art unlearning methods in terms of model utility. The source code can be found at https://github.com/chenchongthu/Recommendation-Unlearning
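The shard-and-retrain idea described above can be sketched in a few lines. This is not the RecEraser implementation: as loudly labeled assumptions, the sketch swaps in a deterministic byte-sum rule for the similarity-based balanced partitioning, an item-count table for each constituent model, and uniform averaging for the adaptive aggregation. All class and function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Interaction = Tuple[str, str]  # (user_id, item_id)


def assign_shard(user_id: str, num_shards: int) -> int:
    # Stand-in partition rule (assumption): deterministic byte sum of the user id.
    return sum(user_id.encode("utf-8")) % num_shards


class ShardedRecommender:
    def __init__(self, num_shards: int) -> None:
        self.num_shards = num_shards
        self.shards: List[List[Interaction]] = [[] for _ in range(num_shards)]
        self.models: List[Counter] = [Counter() for _ in range(num_shards)]

    def fit(self, interactions: List[Interaction]) -> None:
        """Distribute interactions across shards and train one model per shard."""
        for user, item in interactions:
            self.shards[assign_shard(user, self.num_shards)].append((user, item))
        for shard_id in range(self.num_shards):
            self._train_shard(shard_id)

    def _train_shard(self, shard_id: int) -> None:
        # Toy constituent model (assumption): item counts within the shard.
        self.models[shard_id] = Counter(item for _, item in self.shards[shard_id])

    def unlearn(self, user: str, item: str) -> int:
        """Delete one interaction and retrain only the shard that held it."""
        shard_id = assign_shard(user, self.num_shards)
        self.shards[shard_id].remove((user, item))  # ValueError if absent
        self._train_shard(shard_id)
        return shard_id

    def scores(self) -> Dict[str, float]:
        """Uniform average of shard models (the paper uses adaptive weights)."""
        total: Counter = Counter()
        for model in self.models:
            total.update(model)
        return {item: count / self.num_shards for item, count in total.items()}
```

The efficiency gain comes from `unlearn` touching a single shard: the other constituent models are left as trained.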
Explaining to users why some items are recommended is critical, as it helps users make better decisions, increases their satisfaction, and builds their trust in recommender systems (RS). However, existing explainable RS usually treat explanations as side outputs of the recommendation model, which has two problems: (1) it is difficult to evaluate the produced explanations because they are usually model-dependent, and (2) as a result, the possible impacts of those explanations are less investigated. To address the evaluation problem, we propose learning to explain for explainable recommendation. The basic idea is to train a model that selects explanations from a collection as a ranking-oriented task. A great challenge, however, is that the sparsity issue in the user-item-explanation data is more severe than that in traditional user-item relation data, since not every user-item pair can be associated with multiple explanations. To mitigate this issue, we propose to perform two sets of matrix factorization by treating the ternary relationship as two groups of binary relationships. To further investigate the impacts of explanations, we extend the traditional item ranking of recommendation to an item-explanation joint-ranking formalization. We study whether purposely selecting explanations can achieve certain learning goals, e.g., in this paper, improving the recommendation performance. Experiments on three large datasets verify our solution's effectiveness on both item recommendation and explanation ranking. In addition, our user-item-explanation datasets open up new ways of modeling and evaluating recommendation explanations. To facilitate the development of explainable RS, we will make our datasets and code publicly available.
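One plausible reading of treating the ternary user-item-explanation relationship as two groups of binary relationships is to score an explanation as the sum of a user-explanation term and an item-explanation term, each a latent-factor dot product. The sketch below follows that reading only; the additive form, the separate explanation embeddings per side, and all names are assumptions, since the abstract does not state the scoring function.

```python
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def explanation_score(
    user_vec: Vector, item_vec: Vector, expl_user_side: Vector, expl_item_side: Vector
) -> float:
    """Sum of the user-explanation and item-explanation affinities (assumed form)."""
    return dot(user_vec, expl_user_side) + dot(item_vec, expl_item_side)


def rank_explanations(
    user_vec: Vector,
    item_vec: Vector,
    expl_user_side: Dict[str, Vector],
    expl_item_side: Dict[str, Vector],
) -> List[str]:
    """Order candidate explanation ids from highest to lowest score."""
    scores = {
        expl_id: explanation_score(
            user_vec, item_vec, expl_user_side[expl_id], expl_item_side[expl_id]
        )
        for expl_id in expl_user_side
    }
    return sorted(scores, key=scores.get, reverse=True)
```

Splitting the score this way lets each explanation embedding learn from every user or item it co-occurs with, which is denser than the set of full user-item-explanation triples.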