The ACM Recommender Systems Challenge 2018 focused on the task of automatic music playlist continuation, which is a form of the more general task of sequential recommendation. Given a playlist of arbitrary length with some additional meta-data, the task was to recommend up to 500 tracks that fit the target characteristics of the original playlist. For the RecSys Challenge, Spotify released a dataset of one million user-generated playlists. Participants could compete in two tracks, i.e., main and creative tracks. Participants in the main track were only allowed to use the provided training set, however, in the creative track, the use of external public sources was permitted. In total, 113 teams submitted 1,228 runs to the main track; 33 teams submitted 239 runs to the creative track. The highest performing team in the main track achieved an R-precision of 0.2241, an NDCG of 0.3946, and an average number of recommended songs clicks of 1.784. In the creative track, an R-precision of 0.2233, an NDCG of 0.3939, and a click rate of 1.785 was obtained by the best team. This article provides an overview of the challenge, including motivation, task definition, dataset description, and evaluation. We further report and analyze the results obtained by the top performing teams in each track and explore the approaches taken by the winners. We finally summarize our key findings and list the open avenues and possible future directions in the area of automatic playlist continuation.
Collaborative filtering, a widely-used recommendation technique, predicts a user's preference by aggregating the ratings from similar users. As a result, these measures cannot fully utilize the rating information and are not suitable for real world sparse data. To solve these issues, we propose a novel user distance measure named Preference Mover's Distance (PMD) which makes full use of all ratings made by each user. Our proposed PMD can properly measure the distance between a pair of users even if they have no co-rated items. We show that this measure can be cast as an instance of the Earth Mover's Distance, a well-studied transportation problem for which several highly efficient solvers have been developed. Experimental results show that PMD can help achieve superior recommendation accuracy than state-of-the-art methods, especially when training data is very sparse.
Product discovery is a crucial component for online shopping. However, item-to-item recommendations today do not allow users to explore changes along selected dimensions: given a query item, can a model suggest something similar but in a different color? We consider item recommendations of the comparative nature (e.g. "something darker") and show how CLIP-based models can support this use case in a zero-shot manner. Leveraging a large model built for fashion, we introduce GradREC and its industry potential, and offer a first rounded assessment of its strength and weaknesses.
Prior research on exposure fairness in the context of recommender systems has focused mostly on disparities in the exposure of individual or groups of items to individual users of the system. The problem of how individual or groups of items may be systemically under or over exposed to groups of users, or even all users, has received relatively less attention. However, such systemic disparities in information exposure can result in observable social harms, such as withholding economic opportunities from historically marginalized groups (allocative harm) or amplifying gendered and racialized stereotypes (representational harm). Previously, Diaz et al. developed the expected exposure metric -- that incorporates existing user browsing models that have previously been developed for information retrieval -- to study fairness of content exposure to individual users. We extend their proposed framework to formalize a family of exposure fairness metrics that model the problem jointly from the perspective of both the consumers and producers. Specifically, we consider group attributes for both types of stakeholders to identify and mitigate fairness concerns that go beyond individual users and items towards more systemic biases in recommendation. Furthermore, we study and discuss the relationships between the different exposure fairness dimensions proposed in this paper, as well as demonstrate how stochastic ranking policies can be optimized towards said fairness goals.
Candidate generation is the first stage in recommendation systems, where a light-weight system is used to retrieve potentially relevant items for an input user. These candidate items are then ranked and pruned in later stages of recommender systems using a more complex ranking model. Since candidate generation is the top of the recommendation funnel, it is important to retrieve a high-recall candidate set to feed into downstream ranking models. A common approach for candidate generation is to leverage approximate nearest neighbor (ANN) search from a single dense query embedding; however, this approach this can yield a low-diversity result set with many near duplicates. As users often have multiple interests, candidate retrieval should ideally return a diverse set of candidates reflective of the user's multiple interests. To this end, we introduce kNN-Embed, a general approach to improving diversity in dense ANN-based retrieval. kNN-Embed represents each user as a smoothed mixture over learned item clusters that represent distinct `interests' of the user. By querying each of a user's mixture component in proportion to their mixture weights, we retrieve a high-diversity set of candidates reflecting elements from each of a user's interests. We experimentally compare kNN-Embed to standard ANN candidate retrieval, and show significant improvements in overall recall and improved diversity across three datasets. Accompanying this work, we open source a large Twitter follow-graph dataset, to spur further research in graph-mining and representation learning for recommender systems.
Graph Convolution Network (GCN) has attracted significant attention and become the most popular method for learning graph representations. In recent years, many efforts have been focused on integrating GCN into the recommender tasks and have made remarkable progress. At its core is to explicitly capture high-order connectivities between the nodes in user-item bipartite graph. However, we theoretically and empirically find an inherent drawback existed in these GCN-based recommendation methods, where GCN is directly applied to aggregate neighboring nodes will introduce noise and information redundancy. Consequently, the these models' capability of capturing high-order connectivities among different nodes is limited, leading to suboptimal performance of the recommender tasks. The main reason is that the the nonlinear network layer inside GCN structure is not suitable for extracting non-sematic features(such as one-hot ID feature) in the collaborative filtering scenarios. In this work, we develop a new GCN-based Collaborative Filtering model, named Refined Graph convolution Collaborative Filtering(RGCF), where the construction of the embeddings of users (items) are delicately redesigned from several aspects during the aggregation on the graph. Compared to the state-of-the-art GCN-based recommendation, RGCF is more capable for capturing the implicit high-order connectivities inside the graph and the resultant vector representations are more expressive. We conduct extensive experiments on three public million-size datasets, demonstrating that our RGCF significantly outperforms state-of-the-art models. We release our code at https://github.com/hfutmars/RGCF.
Automatically recommending relevant law articles to a given legal case has attracted much attention as it can greatly release human labor from searching over the large database of laws. However, current researches only support coarse-grained recommendation where all relevant articles are predicted as a whole without explaining which specific fact each article is relevant with. Since one case can be formed of many supporting facts, traversing over them to verify the correctness of recommendation results can be time-consuming. We believe that learning fine-grained correspondence between each single fact and law articles is crucial for an accurate and trustworthy AI system. With this motivation, we perform a pioneering study and create a corpus with manually annotated fact-article correspondences. We treat the learning as a text matching task and propose a multi-level matching network to address it. To help the model better digest the content of law articles, we parse articles in form of premise-conclusion pairs with random forest. Experiments show that the parsed form yielded better performance and the resulting model surpassed other popular text matching baselines. Furthermore, we compare with previous researches and find that establishing the fine-grained fact-article correspondences can improve the recommendation accuracy by a large margin. Our best system reaches an F1 score of 96.3%, making it of great potential for practical use. It can also significantly boost the downstream task of legal decision prediction, increasing the F1 score by up to 12.7%.
Wikidata is an open knowledge graph built by a global community of volunteers. As it advances in scale, it faces substantial challenges around editor engagement. These challenges are in terms of both attracting new editors to keep up with the sheer amount of work and retaining existing editors. Experience from other online communities and peer-production systems, including Wikipedia, suggests that personalised recommendations could help, especially newcomers, who are sometimes unsure about how to contribute best to an ongoing effort. For this reason, we propose a recommender system WikidataRec for Wikidata items. The system uses a hybrid of content-based and collaborative filtering techniques to rank items for editors relying on both item features and item-editor previous interaction. A neural network, named a neural mixture of representations, is designed to learn fine weights for the combination of item-based representations and optimize them with editor-based representation by item-editor interaction. To facilitate further research in this space, we also create two benchmark datasets, a general-purpose one with 220,000 editors responsible for 14 million interactions with 4 million items and a second one focusing on the contributions of more than 8,000 more active editors. We perform an offline evaluation of the system on both datasets with promising results. Our code and datasets are available at https://github.com/WikidataRec-developer/Wikidata_Recommender.