Pedestrian trajectory prediction is an active research area with recent works undertaken to embed accurate models of pedestrians social interactions and their contextual compliance into dynamic spatial graphs. However, existing works rely on spatial assumptions about the scene and dynamics, which entails a significant challenge to adapt the graph structure in unknown environments for an online system. In addition, there is a lack of assessment approach for the relational modeling impact on prediction performance. To fill this gap, we propose Social Trajectory Recommender-Gated Graph Recurrent Neighborhood Network, (STR-GGRNN), which uses data-driven adaptive online neighborhood recommendation based on the contextual scene features and pedestrian visual cues. The neighborhood recommendation is achieved by online Nonnegative Matrix Factorization (NMF) to construct the graph adjacency matrices for predicting the pedestrians' trajectories. Experiments based on widely-used datasets show that our method outperforms the state-of-the-art. Our best performing model achieves 12 cm ADE and $\sim$15 cm FDE on ETH-UCY dataset. The proposed method takes only 0.49 seconds when sampling a total of 20K future trajectories per frame.
Video games have become an integral part of most people's lives in recent times. This led to an abundance of data related to video games being shared online. However, this comes with issues such as incorrect ratings, reviews or anything that is being shared. Recommendation systems are powerful tools that help users by providing them with meaningful recommendations. A straightforward approach would be to predict the scores of video games based on other information related to the game. It could be used as a means to validate user-submitted ratings as well as provide recommendations. This work provides a method to predict the G-Score, that defines how good a video game is, from its trailer (video) and summary (text). We first propose models to predict the G-Score based on the trailer alone (unimodal). Later on, we show that considering information from multiple modalities helps the models perform better compared to using information from videos alone. Since we couldn't find any suitable multimodal video game dataset, we created our own dataset named VGD (Video Game Dataset) and provide it along with this work. The approach mentioned here can be generalized to other multimodal datasets such as movie trailers and summaries etc. Towards the end, we talk about the shortcomings of the work and some methods to overcome them.
Neural Machine Translation (NMT) is a new approach to machine translation that has made great progress in recent years. However, recent studies show that NMT generally produces fluent but inadequate translations (Tu et al. 2016b; Tu et al. 2016a; He et al. 2016; Tu et al. 2017). This is in contrast to conventional Statistical Machine Translation (SMT), which usually yields adequate but non-fluent translations. It is natural, therefore, to leverage the advantages of both models for better translations, and in this work we propose to incorporate SMT model into NMT framework. More specifically, at each decoding step, SMT offers additional recommendations of generated words based on the decoding information from NMT (e.g., the generated partial translation and attention history). Then we employ an auxiliary classifier to score the SMT recommendations and a gating function to combine the SMT recommendations with NMT generations, both of which are jointly trained within the NMT architecture in an end-to-end manner. Experimental results on Chinese-English translation show that the proposed approach achieves significant and consistent improvements over state-of-the-art NMT and SMT systems on multiple NIST test sets.
In feeds recommendation, the first step is candidate generation. Most of the candidate generation models are based on CTR estimation, which do not consider user's satisfaction with the clicked item. Items with low quality but attractive title (i.e., click baits) may be recommended to the user, which worsens the user experience. One solution to this problem is to model the click and the reading duration simultaneously under the multi-task learning (MTL) framework. There are two challenges in the modeling. The first one is how to deal with the zero duration of the negative samples, which does not necessarily indicate dislikes. The second one is how to perform multi-task learning in the candidate generation model with double tower structure that can only model one single task. In this paper, we propose an distillation based multi-task learning (DMTL) approach to tackle these two challenges. We model duration by considering its dependency of click in the MTL, and then transfer the knowledge learned from the MTL teacher model to the student candidate generation model by distillation. Experiments conducted on dataset gathered from traffic logs of Tencent Kandian's recommender system show that the proposed approach outperforms the competitors significantly in modeling duration, which demonstrates the effectiveness of the proposed candidate generation model.
Recommendation systems get expanding significance because of their applications in both the scholarly community and industry. With the development of additional data sources and methods of extracting new information other than the rating history of clients on items, hybrid recommendation algorithms, in which some methods have usually been combined to improve performance, have become pervasive. In this work, we first introduce a novel method to extract the implicit relationship between content features using a sort of well-known methods from the natural language processing domain, namely Word2Vec. In contrast to the typical use of Word2Vec, we utilize some features of items as words of sentences to produce neural feature embeddings, through which we can calculate the similarity between features. Next, we propose a novel content-based recommendation system that employs the relationship to determine vector representations for items by which the similarity between items can be computed (RELFsim). Our evaluation results demonstrate that it can predict the preference a user would have for a set of items as good as pure collaborative filtering. This content-based algorithm is also embedded in a pure item-based collaborative filtering algorithm to deal with the cold-start problem and enhance its accuracy. Our experiments on a benchmark movie dataset corroborate that the proposed approach improves the accuracy of the system.
Privacy and ethics of citizens are at the core of the concerns raised by our increasingly digital society. Profiling users is standard practice for software applications triggering the need for users, also enforced by laws, to properly manage privacy settings. Users need to manage software privacy settings properly to protect personally identifiable information and express personal ethical preferences. AI technologies that empower users to interact with the digital world by reflecting their personal ethical preferences can be key enablers of a trustworthy digital society. We focus on the privacy dimension and contribute a step in the above direction through an empirical study on an existing dataset collected from the fitness domain. We find out which set of questions is appropriate to differentiate users according to their preferences. The results reveal that a compact set of semantic-driven questions (about domain-independent privacy preferences) helps distinguish users better than a complex domain-dependent one. This confirms the study's hypothesis that moral attitudes are the relevant piece of information to collect. Based on the outcome, we implement a recommender system to provide users with suitable recommendations related to privacy choices. We then show that the proposed recommender system provides relevant settings to users, obtaining high accuracy.
Region-based image retrieval (RBIR) technique is revisited. In early attempts at RBIR in the late 90s, researchers found many ways to specify region-based queries and spatial relationships; however, the way to characterize the regions, such as by using color histograms, were very poor at that time. Here, we revisit RBIR by incorporating semantic specification of objects and intuitive specification of spatial relationships. Our contributions are the following. First, to support multiple aspects of semantic object specification (category, instance, and attribute), we propose a multitask CNN feature that allows us to use deep learning technique and to jointly handle multi-aspect object specification. Second, to help users specify spatial relationships among objects in an intuitive way, we propose recommendation techniques of spatial relationships. In particular, by mining the search results, a system can recommend feasible spatial relationships among the objects. The system also can recommend likely spatial relationships by assigned object category names based on language prior. Moreover, object-level inverted indexing supports very fast shortlist generation, and re-ranking based on spatial constraints provides users with instant RBIR experiences.
The ubiquity of online fashion shopping demands effective recommendation services for customers. In this paper, we study two types of fashion recommendation: (i) suggesting an item that matches existing components in a set to form a stylish outfit (a collection of fashion items), and (ii) generating an outfit with multimodal (images/text) specifications from a user. To this end, we propose to jointly learn a visual-semantic embedding and the compatibility relationships among fashion items in an end-to-end fashion. More specifically, we consider a fashion outfit to be a sequence (usually from top to bottom and then accessories) and each item in the outfit as a time step. Given the fashion items in an outfit, we train a bidirectional LSTM (Bi-LSTM) model to sequentially predict the next item conditioned on previous ones to learn their compatibility relationships. Further, we learn a visual-semantic space by regressing image features to their semantic representations aiming to inject attribute and category information as a regularization for training the LSTM. The trained network can not only perform the aforementioned recommendations effectively but also predict the compatibility of a given outfit. We conduct extensive experiments on our newly collected Polyvore dataset, and the results provide strong qualitative and quantitative evidence that our framework outperforms alternative methods.
As the core of recommender system, collaborative filtering (CF) models the affinity between a user and an item from historical user-item interactions, such as clicks, purchases, and so on. Benefited from the strong representation power, neural networks have recently revolutionized the recommendation research, setting up a new standard for CF. However, existing neural recommender models do not explicitly consider the correlations among embedding dimensions, making them less effective in modeling the interaction function between users and items. In this work, we emphasize on modeling the correlations among embedding dimensions in neural networks to pursue higher effectiveness for CF. We propose a novel and general neural collaborative filtering framework, namely ConvNCF, which is featured with two designs: 1) applying outer product on user embedding and item embedding to explicitly model the pairwise correlations between embedding dimensions, and 2) employing convolutional neural network above the outer product to learn the high-order correlations among embedding dimensions. To justify our proposal, we present three instantiations of ConvNCF by using different inputs to represent a user and conduct experiments on two real-world datasets. Extensive results verify the utility of modeling embedding dimension correlations with ConvNCF, which outperforms several competitive CF methods.
Collaborative filtering (CF) is the key technique for recommender systems (RSs). CF exploits user-item behavior interactions (e.g., clicks) only and hence suffers from the data sparsity issue. One research thread is to integrate auxiliary information such as product reviews and news titles, leading to hybrid filtering methods. Another thread is to transfer knowledge from other source domains such as improving the movie recommendation with the knowledge from the book domain, leading to transfer learning methods. In real-world life, no single service can satisfy a user's all information needs. Thus it motivates us to exploit both auxiliary and source information for RSs in this paper. We propose a novel neural model to smoothly enable Transfer Meeting Hybrid (TMH) methods for cross-domain recommendation with unstructured text in an end-to-end manner. TMH attentively extracts useful content from unstructured text via a memory module and selectively transfers knowledge from a source domain via a transfer network. On two real-world datasets, TMH shows better performance in terms of three ranking metrics by comparing with various baselines. We conduct thorough analyses to understand how the text content and transferred knowledge help the proposed model.