Different from the traditional recommender system, the session-based recommender system introduces the concept of the session, i.e., a sequence of interactions between a user and multiple items within a period, to preserve the user's recent interest. The existing work on the session-based recommender system mainly relies on mining sequential patterns within individual sessions, which are not expressive enough to capture more complicated dependency relationships among items. In addition, it does not consider the cross-session information due to the anonymity of the session data, where the linkage between different sessions is prevented. In this paper, we solve these problems with the graph neural networks technique. First, each session is represented as a graph rather than a linear sequence structure, based on which a novel Full Graph Neural Network (FGNN) is proposed to learn complicated item dependency. To exploit and incorporate cross-session information in the individual session's representation learning, we further construct a Broadly Connected Session (BCS) graph to link different sessions and a novel Mask-Readout function to improve session embedding based on the BCS graph. Extensive experiments have been conducted on two e-commerce benchmark datasets, i.e., Yoochoose and Diginetica, and the experimental results demonstrate the superiority of our proposal through comparisons with state-of-the-art session-based recommender models.
For present e-commerce platforms, session-based recommender systems are developed to predict users' preference for next-item recommendation. Although a session can usually reflect a user's current preference, a local shift of the user's intention within the session may still exist. Specifically, the interactions that take place in the early positions within a session generally indicate the user's initial intention, while later interactions are more likely to represent the latest intention. Such positional information has been rarely considered in existing methods, which restricts their ability to capture the significance of interactions at different positions. To thoroughly exploit the positional information within a session, a theoretical framework is developed in this paper to provide an in-depth analysis of the positional information. We formally define the properties of forward-awareness and backward-awareness to evaluate the ability of positional encoding schemes in capturing the initial and the latest intention. According to our analysis, existing positional encoding schemes are generally forward-aware only, which can hardly represent the dynamics of the intention in a session. To enhance the positional encoding scheme for the session-based recommendation, a dual positional encoding (DPE) is proposed to account for both forward-awareness and backward-awareness. Based on DPE, we propose a novel Positional Recommender (PosRec) model with a well-designed Position-aware Gated Graph Neural Network module to fully exploit the positional information for session-based recommendation tasks. Extensive experiments are conducted on two e-commerce benchmark datasets, Yoochoose and Diginetica and the experimental results show the superiority of the PosRec by comparing it with the state-of-the-art session-based recommender models.
Generalized Zero-Shot Learning (GZSL) is the task of leveraging semantic information (e.g., attributes) to recognize the seen and unseen samples, where unseen classes are not observable during training. It is natural to derive generative models and hallucinate training samples for unseen classes based on the knowledge learned from the seen samples. However, most of these models suffer from the `generation shifts', where the synthesized samples may drift from the real distribution of unseen data. In this paper, we conduct an in-depth analysis on this issue and propose a novel Generation Shifts Mitigating Flow (GSMFlow) framework, which is comprised of multiple conditional affine coupling layers for learning unseen data synthesis efficiently and effectively. In particular, we identify three potential problems that trigger the generation shifts, i.e., semantic inconsistency, variance decay, and structural permutation and address them respectively. First, to reinforce the correlations between the generated samples and the respective attributes, we explicitly embed the semantic information into the transformations in each of the coupling layers. Second, to recover the intrinsic variance of the synthesized unseen features, we introduce a visual perturbation strategy to diversify the intra-class variance of generated data and hereby help adjust the decision boundary of the classifier. Third, to avoid structural permutation in the semantic space, we propose a relative positioning strategy to manipulate the attribute embeddings, guiding which to fully preserve the inter-class geometric structure. Experimental results demonstrate that GSMFlow achieves state-of-the-art recognition performance in both conventional and generalized zero-shot settings. Our code is available at: https://github.com/uqzhichen/GSMFlow
Modelling mix-and-match relationships among fashion items has become increasingly demanding yet challenging for modern E-commerce recommender systems. When performing clothes matching, most existing approaches leverage the latent visual features extracted from fashion item images for compatibility modelling, which lacks explainability of generated matching results and can hardly convince users of the recommendations. Though recent methods start to incorporate pre-defined attribute information (e.g., colour, style, length, etc.) for learning item representations and improving the model interpretability, their utilisation of attribute information is still mainly reserved for enhancing the learned item representations and generating explanations via post-processing. As a result, this creates a severe bottleneck when we are trying to advance the recommendation accuracy and generating fine-grained explanations since the explicit attributes have only loose connections to the actual recommendation process. This work aims to tackle the explainability challenge in fashion recommendation tasks by proposing a novel Attribute-aware Fashion Recommender (AFRec). Specifically, AFRec recommender assesses the outfit compatibility by explicitly leveraging the extracted attribute-level representations from each item's visual feature. The attributes serve as the bridge between two fashion items, where we quantify the affinity of a pair of items through the learned compatibility between their attributes. Extensive experiments have demonstrated that, by making full use of the explicit attributes in the recommendation process, AFRec is able to achieve state-of-the-art recommendation accuracy and generate intuitive explanations at the same time.
Being an indispensable component in location-based social networks, next point-of-interest (POI) recommendation recommends users unexplored POIs based on their recent visiting histories. However, existing work mainly models check-in data as isolated POI sequences, neglecting the crucial collaborative signals from cross-sequence check-in information. Furthermore, the sparse POI-POI transitions restrict the ability of a model to learn effective sequential patterns for recommendation. In this paper, we propose Sequence-to-Graph (Seq2Graph) augmentation for each POI sequence, allowing collaborative signals to be propagated from correlated POIs belonging to other sequences. We then devise a novel Sequence-to-Graph POI Recommender (SGRec), which jointly learns POI embeddings and infers a user's temporal preferences from the graph-augmented POI sequence. To overcome the sparsity of POI-level interactions, we further infuse category-awareness into SGRec with a multi-task learning scheme that captures the denser category-wise transitions. As such, SGRec makes full use of the collaborative signals for learning expressive POI representations, and also comprehensively uncovers multi-level sequential patterns for user preference modelling. Extensive experiments on two real-world datasets demonstrate the superiority of SGRec against state-of-the-art methods in next POI recommendation.
In today's context, deploying data-driven services like recommendation on edge devices instead of cloud servers becomes increasingly attractive due to privacy and network latency concerns. A common practice in building compact on-device recommender systems is to compress their embeddings which are normally the cause of excessive parameterization. However, despite the vast variety of devices and their associated memory constraints, existing memory-efficient recommender systems are only specialized for a fixed memory budget in every design and training life cycle, where a new model has to be retrained to obtain the optimal performance while adapting to a smaller/larger memory budget. In this paper, we present a novel lightweight recommendation paradigm that allows a well-trained recommender to be customized for arbitrary device-specific memory constraints without retraining. The core idea is to compose elastic embeddings for each item, where an elastic embedding is the concatenation of a set of embedding blocks that are carefully chosen by an automated search function. Correspondingly, we propose an innovative approach, namely recommendation with universally learned elastic embeddings (RULE). To ensure the expressiveness of all candidate embedding blocks, RULE enforces a diversity-driven regularization when learning different embedding blocks. Then, a performance estimator-based evolutionary search function is designed, allowing for efficient specialization of elastic embeddings under any memory constraint for on-device recommendation. Extensive experiments on real-world datasets reveal the superior performance of RULE under tight memory budgets.
Conversational recommender systems (CRSs) have revolutionized the conventional recommendation paradigm by embracing dialogue agents to dynamically capture the fine-grained user preference. In a typical conversational recommendation scenario, a CRS firstly generates questions to let the user clarify her/his demands and then makes suitable recommendations. Hence, the ability to generate suitable clarifying questions is the key to timely tracing users' dynamic preferences and achieving successful recommendations. However, existing CRSs fall short in asking high-quality questions because: (1) system-generated responses heavily depends on the performance of the dialogue policy agent, which has to be trained with huge conversation corpus to cover all circumstances; and (2) current CRSs cannot fully utilize the learned latent user profiles for generating appropriate and personalized responses. To mitigate these issues, we propose the Knowledge-Based Question Generation System (KBQG), a novel framework for conversational recommendation. Distinct from previous conversational recommender systems, KBQG models a user's preference in a finer granularity by identifying the most relevant relations from a structured knowledge graph (KG). Conditioned on the varied importance of different relations, the generated clarifying questions could perform better in impelling users to provide more details on their preferences. Finially, accurate recommendations can be generated in fewer conversational turns. Furthermore, the proposed KBQG outperforms all baselines in our experiments on two real-world datasets.
With the ubiquitous graph-structured data in various applications, models that can learn compact but expressive vector representations of nodes have become highly desirable. Recently, bearing the message passing paradigm, graph neural networks (GNNs) have greatly advanced the performance of node representation learning on graphs. However, a majority class of GNNs are only designed for homogeneous graphs, leading to inferior adaptivity to the more informative heterogeneous graphs with various types of nodes and edges. Also, despite the necessity of inductively producing representations for completely new nodes (e.g., in streaming scenarios), few heterogeneous GNNs can bypass the transductive learning scheme where all nodes must be known during training. Furthermore, the training efficiency of most heterogeneous GNNs has been hindered by their sophisticated designs for extracting the semantics associated with each meta path or relation. In this paper, we propose WIde and DEep message passing Network (WIDEN) to cope with the aforementioned problems about heterogeneity, inductiveness, and efficiency that are rarely investigated together in graph representation learning. In WIDEN, we propose a novel inductive, meta path-free message passing scheme that packs up heterogeneous node features with their associated edges from both low- and high-order neighbor nodes. To further improve the training efficiency, we innovatively present an active downsampling strategy that drops unimportant neighbor nodes to facilitate faster information propagation. Experiments on three real-world heterogeneous graphs have further validated the efficacy of WIDEN on both transductive and inductive node representation learning, as well as the superior training efficiency against state-of-the-art baselines.
As a well-established approach, factorization machine (FM) is capable of automatically learning high-order interactions among features to make predictions without the need for manual feature engineering. With the prominent development of deep neural networks (DNNs), there is a recent and ongoing trend of enhancing the expressiveness of FM-based models with DNNs. However, though better results are obtained with DNN-based FM variants, such performance gain is paid off by an enormous amount (usually millions) of excessive model parameters on top of the plain FM. Consequently, the heavy parameterization impedes the real-life practicality of those deep models, especially efficient deployment on resource-constrained IoT and edge devices. In this paper, we move beyond the traditional real space where most deep FM-based models are defined, and seek solutions from quaternion representations within the hypercomplex space. Specifically, we propose the quaternion factorization machine (QFM) and quaternion neural factorization machine (QNFM), which are two novel lightweight and memory-efficient quaternion-valued models for sparse predictive analytics. By introducing a brand new take on FM-based models with the notion of quaternion algebra, our models not only enable expressive inter-component feature interactions, but also significantly reduce the parameter size due to lower degrees of freedom in the hypercomplex Hamilton product compared with real-valued matrix multiplication. Extensive experimental results on three large-scale datasets demonstrate that QFM achieves 4.36% performance improvement over the plain FM without introducing any extra parameters, while QNFM outperforms all baselines with up to two magnitudes' parameter size reduction in comparison to state-of-the-art peer methods.