Guibing Guo

ID Embedding as Subtle Features of Content and Structure for Multimodal Recommendation

Nov 10, 2023
Yuting Liu, Enneng Yang, Yizhou Dang, Guibing Guo, Qiang Liu, Yuliang Liang, Linying Jiang, Xingwei Wang

Multimodal recommendation aims to model user and item representations comprehensively with the involvement of multimedia content for effective recommendations. Existing research has shown that combining (user- and item-) ID embeddings with multimodal salient features is beneficial for recommendation performance, indicating the value of IDs. However, the literature lacks a thorough analysis of ID embeddings in terms of feature semantics. In this paper, we revisit the value of ID embeddings for multimodal recommendation and conduct a thorough study of their semantics, which we recognize as subtle features of content and structure. We then propose a novel recommendation model that incorporates ID embeddings to enhance the semantic features of both content and structure. Specifically, we put forward a hierarchical attention mechanism to incorporate ID embeddings in modality fusion, coupled with contrastive learning, to enhance content representations. Meanwhile, we propose a lightweight graph convolutional network for each modality to amalgamate neighborhood and ID embeddings, improving structural representations. Finally, the content and structure representations are combined to form the ultimate item embedding for recommendation. Extensive experiments on three real-world datasets (Baby, Sports, and Clothing) demonstrate the superiority of our method over state-of-the-art multimodal recommendation methods and the effectiveness of fine-grained ID embeddings.
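A minimal sketch of what the ID-guided modality fusion described above could look like, assuming the item ID, visual, and textual embeddings share one dimension; the module and variable names are illustrative, not the authors' released code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IDGuidedFusion(nn.Module):
    """Hypothetical attention that lets the ID embedding re-weight modality
    features before fusing them (a sketch, not the paper's exact design)."""
    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Linear(dim, dim)  # project the ID embedding to a query
        self.key = nn.Linear(dim, dim)    # project each modality feature to a key

    def forward(self, id_emb, visual, textual):
        modalities = torch.stack([visual, textual], dim=1)      # (batch, 2, dim)
        q = self.query(id_emb).unsqueeze(1)                     # (batch, 1, dim)
        k = self.key(modalities)                                # (batch, 2, dim)
        attn = F.softmax((q * k).sum(-1) / k.size(-1) ** 0.5, dim=-1)
        content = (attn.unsqueeze(-1) * modalities).sum(dim=1)  # attended content
        return content + id_emb                                 # keep the ID signal itself

fusion = IDGuidedFusion(dim=64)
item = fusion(torch.randn(8, 64), torch.randn(8, 64), torch.randn(8, 64))
print(item.shape)  # torch.Size([8, 64])
```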

AdaMerging: Adaptive Model Merging for Multi-Task Learning

Oct 04, 2023
Enneng Yang, Zhenyi Wang, Li Shen, Shiwei Liu, Guibing Guo, Xingwei Wang, Dacheng Tao

Multi-task learning (MTL) aims to empower a model to tackle multiple tasks simultaneously. A recent development known as task arithmetic has revealed that several models, each fine-tuned for distinct tasks, can be directly merged into a single model to execute MTL without necessitating a retraining process using the initial training data. Nevertheless, this direct addition of models often leads to a significant deterioration in the overall performance of the merged model. This decline occurs due to potential conflicts and intricate correlations among the multiple tasks. Consequently, the challenge emerges of how to merge pre-trained models more effectively without using their original training data. This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging). This approach aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data. Specifically, our AdaMerging method operates as an automatic, unsupervised task arithmetic scheme. It leverages entropy minimization on unlabeled test samples from the multi-task setup as a surrogate objective function to iteratively refine the merging coefficients of the multiple models. Our experimental findings across eight tasks demonstrate the efficacy of the AdaMerging scheme we put forth. Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance. Notably, AdaMerging also exhibits superior generalization capabilities when applied to unseen downstream tasks. Furthermore, it displays a significantly enhanced robustness to data distribution shifts that may occur during the testing phase.
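To make the mechanism concrete, here is a rough sketch on a toy model: task vectors in the task-arithmetic sense (theta_k - theta_0) are merged with task-wise coefficients, and only those coefficients are trained by minimizing prediction entropy on an unlabeled test batch. The toy setup and names are assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.func import functional_call

# Toy stand-ins: one shared architecture, a "pretrained" checkpoint, and two fine-tuned ones.
model = nn.Linear(16, 4)
theta0 = {k: v.detach().clone() for k, v in model.state_dict().items()}
finetuned = [{k: v + 0.1 * torch.randn_like(v) for k, v in theta0.items()} for _ in range(2)]
task_vectors = [{k: ft[k] - theta0[k] for k in theta0} for ft in finetuned]

# The merging coefficients are the only trainable parameters; all weights stay frozen.
lambdas = torch.full((len(task_vectors),), 0.3, requires_grad=True)
opt = torch.optim.Adam([lambdas], lr=1e-2)

def merged_params():
    # theta = theta_0 + sum_k lambda_k * (theta_k - theta_0)
    return {k: theta0[k] + sum(l * tv[k] for l, tv in zip(lambdas, task_vectors))
            for k in theta0}

x_unlabeled = torch.randn(32, 16)  # unlabeled test-time batch
for _ in range(100):
    logits = functional_call(model, merged_params(), (x_unlabeled,))
    p = F.softmax(logits, dim=-1)
    loss = -(p * F.log_softmax(logits, dim=-1)).sum(-1).mean()  # prediction entropy
    opt.zero_grad(); loss.backward(); opt.step()
print(lambdas.detach())
```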

Continual Learning From a Stream of APIs

Aug 31, 2023
Enneng Yang, Zhenyi Wang, Li Shen, Nan Yin, Tongliang Liu, Guibing Guo, Xingwei Wang, Dacheng Tao

Continual learning (CL) aims to learn new tasks without forgetting previous tasks. However, existing CL methods require a large amount of raw data, which is often unavailable due to copyright considerations and privacy risks. Instead, stakeholders usually release pre-trained machine learning models as a service (MLaaS), which users can access via APIs. This paper considers two practical-yet-novel CL settings: data-efficient CL (DECL-APIs) and data-free CL (DFCL-APIs), which achieve CL from a stream of APIs with partial or no raw data. Performing CL under these two new settings faces several challenges: unavailable full raw data, unknown model parameters, heterogeneous models of arbitrary architecture and scale, and catastrophic forgetting of previous APIs. To overcome these issues, we propose a novel data-free cooperative continual distillation learning framework that distills knowledge from a stream of APIs into a CL model by generating pseudo data, just by querying APIs. Specifically, our framework includes two cooperative generators and one CL model, forming their training as an adversarial game. We first use the CL model and the current API as fixed discriminators to train the generators via a derivative-free method. The generators adversarially generate hard and diverse synthetic data to maximize the response gap between the CL model and the API. Next, we train the CL model by minimizing the gap between the responses of the CL model and the black-box API on the synthetic data, to transfer the API's knowledge to the CL model. Furthermore, we propose a new regularization term based on network similarity to prevent catastrophic forgetting of previous APIs. Our method performs comparably to classic CL with full raw data on MNIST and SVHN in the DFCL-APIs setting. In the DECL-APIs setting, our method achieves 0.97x, 0.75x, and 0.69x the performance of classic CL on CIFAR10, CIFAR100, and MiniImageNet, respectively.
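As a heavily simplified illustration of the student-side step, the sketch below distills a black-box API (simulated by a frozen network) into the CL model on generator-produced pseudo data; the derivative-free adversarial training of the generators and the forgetting-prevention regularizer are omitted, and all names are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlackBoxAPI:
    """Stand-in for one API in the stream: it can only be queried, never inspected."""
    def __init__(self):
        self.net = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10)).eval()
    @torch.no_grad()
    def __call__(self, x):
        return self.net(x)

api = BlackBoxAPI()
generator = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 32))
cl_model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.Adam(cl_model.parameters(), lr=1e-3)

for _ in range(200):
    z = torch.randn(64, 8)
    x_syn = generator(z).detach()  # pseudo data (generator updates omitted in this sketch)
    # Pull the CL model's responses toward the API's responses on synthetic data.
    loss = F.kl_div(F.log_softmax(cl_model(x_syn), dim=-1),
                    F.softmax(api(x_syn), dim=-1), reduction="batchmean")
    opt.zero_grad(); loss.backward(); opt.step()
```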

Video and Audio are Images: A Cross-Modal Mixer for Original Data on Video-Audio Retrieval

Aug 26, 2023
Zichen Yuan, Qi Shen, Bingyi Zheng, Yuting Liu, Linying Jiang, Guibing Guo

Cross-modal retrieval has become popular in recent years, particularly with the rise of multimedia. Generally, each modality carries distinct representations and semantic information, so features encoded by a dual-tower architecture tend to lie in separate latent spaces, which makes it difficult to establish semantic relationships between modalities and results in poor retrieval performance. To address this issue, we propose a novel framework for cross-modal retrieval which consists of a cross-modal mixer, a masked autoencoder for pre-training, and a cross-modal retriever for downstream tasks. Specifically, we first adopt the cross-modal mixer and mask modeling to fuse the original modalities and eliminate redundancy. Then, an encoder-decoder architecture is applied to achieve a fuse-then-separate task in the pre-training phase: we feed masked fused representations into the encoder and reconstruct them with the decoder, ultimately separating the original data of the two modalities. In downstream tasks, we use the pre-trained encoder to build the cross-modal retrieval method. Extensive experiments on two real-world datasets show that our approach outperforms previous state-of-the-art methods in video-audio matching tasks, improving retrieval accuracy by up to 2 times. Furthermore, we demonstrate our model's generality by transferring it to other downstream tasks as a universal model.
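A minimal sketch of the fuse-then-separate pre-training objective, assuming both modalities have already been tokenized into same-dimensional sequences; the dimensions and module names are assumptions rather than the paper's architecture:

```python
import torch
import torch.nn as nn

class FuseThenSeparateMAE(nn.Module):
    """Sketch: mix the two modality tokens, mask part of the fused sequence,
    encode, then decode back into both original modalities."""
    def __init__(self, dim=64, mask_ratio=0.5):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.mask_token = nn.Parameter(torch.zeros(dim))
        self.mixer = nn.Linear(2 * dim, dim)                # cross-modal mixer
        layer = nn.TransformerEncoderLayer(dim, 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, 2)
        self.decoder = nn.Linear(dim, 2 * dim)              # separate into two modalities

    def forward(self, video_tokens, audio_tokens):
        fused = self.mixer(torch.cat([video_tokens, audio_tokens], dim=-1))
        mask = torch.rand(fused.shape[:2], device=fused.device) < self.mask_ratio
        fused = torch.where(mask.unsqueeze(-1), self.mask_token, fused)
        recon = self.decoder(self.encoder(fused))
        video_rec, audio_rec = recon.chunk(2, dim=-1)
        return ((video_rec - video_tokens) ** 2 + (audio_rec - audio_tokens) ** 2).mean()

model = FuseThenSeparateMAE()
print(model(torch.randn(4, 16, 64), torch.randn(4, 16, 64)))  # reconstruction loss
```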

Uniform Sequence Better: Time Interval Aware Data Augmentation for Sequential Recommendation

Dec 16, 2022
Yizhou Dang, Enneng Yang, Guibing Guo, Linying Jiang, Xingwei Wang, Xiaoxiao Xu, Qinghui Sun, Hong Liu

Sequential recommendation is an important task that predicts the next item to access based on a sequence of interacted items. Most existing works learn user preference as the transition pattern from the previous item to the next one, ignoring the time interval between these two items. However, we observe that the time intervals in a sequence may vary significantly, which renders user modeling ineffective due to the issue of preference drift. In fact, we conducted an empirical study to validate this observation, and found that a sequence with uniformly distributed time intervals (denoted as a uniform sequence) is more beneficial for performance improvement than one with greatly varying time intervals. Therefore, we propose to augment sequence data from the perspective of time interval, which has not been studied in the literature. Specifically, we design five operators (Ti-Crop, Ti-Reorder, Ti-Mask, Ti-Substitute, Ti-Insert) to transform the original non-uniform sequence into a uniform sequence while accounting for the variance of time intervals. Then, we devise a control strategy to execute data augmentation on item sequences of different lengths. Finally, we implement these improvements on the state-of-the-art model CoSeRec and validate our approach on four real datasets. The experimental results show that our approach achieves significantly better performance than the other 11 competing methods. Our implementation is available at https://github.com/KingGugu/TiCoSeRec.
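As an illustration of the time-interval-aware augmentation, a Ti-Crop-style operator might keep the contiguous sub-sequence whose time intervals have the smallest variance; this is a hypothetical rendering of the idea, not the released TiCoSeRec code:

```python
import numpy as np

def ti_crop(items, timestamps, crop_len):
    """Keep the length-`crop_len` contiguous sub-sequence with the most uniform
    (lowest-variance) time intervals."""
    intervals = np.diff(timestamps)
    best_start, best_var = 0, np.inf
    for start in range(len(items) - crop_len + 1):
        var = np.var(intervals[start:start + crop_len - 1])
        if var < best_var:
            best_start, best_var = start, var
    return items[best_start:best_start + crop_len]

items = ["i1", "i2", "i3", "i4", "i5", "i6"]
timestamps = np.array([0, 10, 21, 33, 100, 400])
print(ti_crop(items, timestamps, crop_len=4))  # ['i1', 'i2', 'i3', 'i4']
```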

* 9 pages, 4 figures, AAAI-2023 

AdaTask: A Task-aware Adaptive Learning Rate Approach to Multi-task Learning

Nov 28, 2022
Enneng Yang, Junwei Pan, Ximei Wang, Haibin Yu, Li Shen, Xihua Chen, Lei Xiao, Jie Jiang, Guibing Guo

Multi-task learning (MTL) models have demonstrated impressive results in computer vision, natural language processing, and recommender systems. Even though many approaches have been proposed, how well these approaches balance different tasks on each parameter still remains unclear. In this paper, we propose to measure the task dominance degree of a parameter by the total updates of each task on this parameter. Specifically, we compute the total updates by the exponentially decaying Average of the squared Updates (AU) on a parameter from the corresponding task. Based on this novel metric, we observe that many parameters in existing MTL methods, especially those in the higher shared layers, are still dominated by one or several tasks. The dominance of AU is mainly due to the dominance of accumulative gradients from one or several tasks. Motivated by this, we propose a Task-wise Adaptive learning rate approach, AdaTask in short, to separate the accumulative gradients, and hence the learning rate, of each task for each parameter in adaptive learning rate approaches (e.g., AdaGrad, RMSProp, and Adam). Comprehensive experiments on computer vision and recommender system MTL datasets demonstrate that AdaTask significantly improves the performance of dominated tasks, resulting in state-of-the-art average task-wise performance. Analysis on both synthetic and real-world datasets shows that AdaTask balances parameters well in every shared layer.
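A toy sketch of the task-wise separation on top of AdaGrad: each task keeps its own accumulator of squared gradients for a shared parameter, so its update is normalized by its own history rather than by a sum dominated by other tasks (the variable names and the plain-AdaGrad variant are assumptions):

```python
import torch

def adatask_adagrad_step(param, task_grads, accumulators, lr=0.01, eps=1e-10):
    """One step of task-wise AdaGrad on a single shared parameter (a sketch)."""
    for k, g in enumerate(task_grads):              # one gradient per task
        accumulators[k] += g ** 2                   # task-wise accumulation
        param -= lr * g / (accumulators[k].sqrt() + eps)
    return param

param = torch.zeros(3)
accumulators = [torch.zeros(3) for _ in range(2)]   # one accumulator per task
task_grads = [torch.tensor([1.0, 0.0, 0.5]), torch.tensor([0.0, 2.0, 0.5])]
for _ in range(5):
    param = adatask_adagrad_step(param, task_grads, accumulators)
print(param)
```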

* AAAI 2023 

Emotion-aware Chat Machine: Automatic Emotional Response Generation for Human-like Emotional Interaction

Jun 06, 2021
Wei Wei, Jiayi Liu, Xianling Mao, Guibing Guo, Feida Zhu, Pan Zhou, Yuchong Hu

The consistency of a response with a given post at both the semantic and emotional levels is essential for a dialogue system to deliver human-like interactions. However, this challenge is not well addressed in the literature, since most approaches neglect the emotional information conveyed by a post while generating responses. This article addresses the problem by proposing a unified end-to-end neural architecture, which is capable of simultaneously encoding the semantics and the emotions in a post to generate more intelligent responses with appropriately expressed emotions. Extensive experiments on real-world data demonstrate that the proposed method outperforms state-of-the-art methods in terms of both content coherence and emotion appropriateness.
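As a generic illustration of conditioning generation on both the post semantics and an emotion signal, here is a toy sequence-to-sequence pattern; the sizes and structure are assumptions and not the architecture proposed in the paper:

```python
import torch
import torch.nn as nn

class EmotionAwareSeq2Seq(nn.Module):
    """Toy pattern: encode the post, embed the target emotion, and let the decoder
    condition on both (not the paper's architecture)."""
    def __init__(self, vocab=1000, n_emotions=6, dim=64):
        super().__init__()
        self.tok = nn.Embedding(vocab, dim)
        self.emo = nn.Embedding(n_emotions, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def forward(self, post_ids, emotion_id, response_ids):
        _, h = self.encoder(self.tok(post_ids))      # semantic summary of the post
        h = h + self.emo(emotion_id).unsqueeze(0)    # inject the emotion signal
        dec_out, _ = self.decoder(self.tok(response_ids), h)
        return self.out(dec_out)                     # next-token logits

model = EmotionAwareSeq2Seq()
logits = model(torch.randint(0, 1000, (2, 12)), torch.tensor([3, 1]),
               torch.randint(0, 1000, (2, 8)))
print(logits.shape)  # torch.Size([2, 8, 1000])
```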

* Accepted at CIKM 2019. arXiv admin note: substantial text overlap with arXiv:2011.07432 

NEUer at SemEval-2021 Task 4: Complete Summary Representation by Filling Answers into Question for Matching Reading Comprehension

May 25, 2021
Zhixiang Chen, Yikun Lei, Pai Liu, Guibing Guo

SemEval-2021 Task 4 aims to select the proper option from multiple candidates to resolve a machine reading comprehension (MRC) task. Most existing approaches concatenate the question and option to form a context-aware model. However, we argue that straightforward concatenation provides only a coarse-grained context for the MRC task, ignoring the specific position of the option relative to the question. In this paper, we propose a novel MRC model that fills options into the question to produce a fine-grained context (defined as a summary), which can better reveal the relationship between the option and the question. We conduct a series of experiments on the given dataset, and the results show that our approach outperforms its counterparts by a large margin.
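A small sketch of the fill-in construction; the `@placeholder` token is an assumption about the dataset format, and the scoring model is elided:

```python
def fill_option(question: str, option: str, placeholder: str = "@placeholder") -> str:
    """Replace the blank in the question with a candidate option to obtain a
    fine-grained "summary" that an encoder can score against the passage."""
    return question.replace(placeholder, option)

question = "The report describes the @placeholder of the new policy."
options = ["impact", "color", "temperature"]
summaries = [fill_option(question, o) for o in options]
# Each summary is then paired with the passage and scored (e.g., by a cross-encoder);
# the highest-scoring option is returned as the answer.
print(summaries[0])
```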

* Accepted by SemEval-2021 

Generalized Embedding Machines for Recommender Systems

Feb 16, 2020
Enneng Yang, Xin Xin, Li Shen, Guibing Guo

Factorization machine (FM) is an effective model for feature-based recommendation that utilizes the inner product to capture second-order feature interactions. However, one of the major drawbacks of FM is that it cannot capture complex high-order interaction signals. A common solution is to change the interaction function, such as stacking deep neural networks on top of FM. In this work, we propose an alternative approach to model high-order interaction signals at the embedding level, namely the Generalized Embedding Machine (GEM). The embedding used in GEM encodes not only the information from the feature itself but also the information from other correlated features. In this way, the embedding becomes high-order. We can then incorporate GEM into FM and even its advanced variants to perform feature interactions. More specifically, in this paper we utilize graph convolutional networks (GCNs) to generate high-order embeddings. We integrate GEM with several FM-based models and conduct extensive experiments on two real-world datasets. The results demonstrate significant improvements of GEM over the corresponding baselines.
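A compact sketch combining one graph-convolution step over a (toy) feature-field graph with FM's second-order interaction term; the dimensions, graph construction, and names are assumptions rather than the paper's exact design:

```python
import torch
import torch.nn as nn

class GEMWithFM(nn.Module):
    """Sketch: refine field embeddings with one GCN propagation step, then apply
    the standard FM second-order interaction on the refined embeddings."""
    def __init__(self, n_features, dim):
        super().__init__()
        self.emb = nn.Embedding(n_features, dim)
        self.linear = nn.Embedding(n_features, 1)
        self.w = nn.Linear(dim, dim, bias=False)      # GCN weight

    def forward(self, feat_ids, adj):
        e = self.emb(feat_ids)                        # (batch, n_fields, dim)
        e = torch.relu(adj @ self.w(e))               # aggregate correlated fields
        # FM second-order term: 0.5 * ((sum e)^2 - sum e^2)
        square_of_sum = e.sum(dim=1) ** 2
        sum_of_square = (e ** 2).sum(dim=1)
        second_order = 0.5 * (square_of_sum - sum_of_square).sum(dim=1, keepdim=True)
        return self.linear(feat_ids).sum(dim=1) + second_order

model = GEMWithFM(n_features=100, dim=8)
feat_ids = torch.randint(0, 100, (4, 5))              # 4 samples, 5 feature fields
adj = torch.softmax(torch.randn(4, 5, 5), dim=-1)     # toy normalized field graph
print(model(feat_ids, adj).shape)                      # torch.Size([4, 1])
```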

* 8 pages 

Deep Learning-based Sequential Recommender Systems: Concepts, Algorithms, and Evaluations

Apr 30, 2019
Hui Fang, Danning Zhang, Yiheng Shu, Guibing Guo

In the field of sequential recommendation, deep learning (DL) methods have received a great deal of attention in the past few years and have surpassed traditional models such as Markov chain-based and factorization-based ones. However, DL-based methods also have critical drawbacks, such as insufficient modeling of user representations and failing to distinguish among the different types of interactions (i.e., user behaviors) between users and items. In light of this, this survey focuses on DL-based sequential recommender systems by taking the aforementioned issues into consideration. Specifically, we illustrate the concept of sequential recommendation, propose a categorization of existing algorithms in terms of three types of behavioral sequences, summarize the key factors affecting the performance of DL-based models, and conduct corresponding evaluations to demonstrate the effects of these factors. We conclude this survey by systematically outlining future directions and challenges in this field.

* 20 pages, 17 figures, 5 tables, 97 references 