Zhiyong Cheng
Prior-Free Continual Learning with Unlabeled Data in the Wild

Oct 16, 2023
Tao Zhuo, Zhiyong Cheng, Hehe Fan, Mohan Kankanhalli

Continual Learning (CL) aims to incrementally update a trained model on new tasks without forgetting the acquired knowledge of old ones. Existing CL methods usually reduce forgetting with task priors, i.e., using task identity or a subset of previously seen samples for model training. However, these methods are infeasible when such priors are unknown in real-world applications. To address this fundamental but seldom-studied problem, we propose a Prior-Free Continual Learning (PFCL) method, which learns new tasks without knowing the task identity or any previous data. First, based on a fixed single-head architecture, we eliminate the need for task identity to select a task-specific output head. Second, we employ a regularization-based strategy to keep predictions consistent between the new and old models, avoiding revisiting previous samples. However, this strategy alone often performs poorly in class-incremental scenarios, particularly for long task sequences. By analyzing the effectiveness and limitations of conventional regularization-based methods, we propose to additionally enhance model consistency with an auxiliary unlabeled dataset. Moreover, since some auxiliary data may degrade performance, we further develop a reliable sample selection strategy to obtain consistent performance improvements. Extensive experiments on multiple image classification benchmark datasets show that our PFCL method significantly mitigates forgetting in all three learning scenarios. Furthermore, compared to the most recent rehearsal-based methods that replay a limited number of previous samples, PFCL achieves competitive accuracy. Our code is available at: https://github.com/visiontao/pfcl
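
A minimal PyTorch sketch of the consistency-regularization idea described in the abstract: the new model is trained on the current task while a distillation term keeps its predictions on auxiliary unlabeled data consistent with the frozen old model. The confidence-based filter is only an illustrative guess at the paper's reliable sample selection rule; see the official code for the actual method.

```python
import torch
import torch.nn.functional as F

def pfcl_consistency_loss(new_model, old_model, labeled_x, labeled_y,
                          unlabeled_x, tau=2.0, conf_thresh=0.8, alpha=1.0):
    """Cross-entropy on the new task plus a distillation term that keeps
    the new model's predictions consistent with the frozen old model."""
    ce = F.cross_entropy(new_model(labeled_x), labeled_y)

    with torch.no_grad():
        old_logits = old_model(unlabeled_x)
        conf, _ = old_logits.softmax(dim=1).max(dim=1)
        keep = conf >= conf_thresh  # reliable-sample selection (assumed rule)

    if keep.any():
        new_logits = new_model(unlabeled_x[keep])
        kd = F.kl_div(F.log_softmax(new_logits / tau, dim=1),
                      F.softmax(old_logits[keep] / tau, dim=1),
                      reduction="batchmean") * tau ** 2
    else:
        kd = labeled_x.new_zeros(())
    return ce + alpha * kd
```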


Semantic-Guided Feature Distillation for Multimodal Recommendation

Aug 06, 2023
Fan Liu, Huilin Chen, Zhiyong Cheng, Liqiang Nie, Mohan Kankanhalli

Multimodal recommendation exploits the rich multimodal information associated with users or items to enhance representation learning for better performance. In these methods, end-to-end feature extractors (e.g., shallow/deep neural networks) are often adopted to tailor the generic multimodal features, extracted from raw data by pre-trained models, for recommendation. However, compact extractors, such as shallow neural networks, may find it challenging to extract effective information from complex and high-dimensional generic modality features. Conversely, DNN-based extractors may encounter the data sparsity problem in recommendation. To address this problem, we propose a novel model-agnostic approach called Semantic-guided Feature Distillation (SGFD), which employs a teacher-student framework to extract features for multimodal recommendation. The teacher model first extracts rich modality features from the generic modality feature by considering both the semantic information of items and the complementary information of multiple modalities. SGFD then utilizes response-based and feature-based distillation losses to effectively transfer the knowledge encoded in the teacher model to the student model. To evaluate the effectiveness of our SGFD, we integrate SGFD into three backbone multimodal recommendation models. Extensive experiments on three public real-world datasets demonstrate that SGFD-enhanced models can achieve substantial improvement over their counterparts.
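
An illustrative sketch of combining response-based and feature-based distillation as SGFD does; the module names, the linear projection for aligning feature sizes, and the loss weighting are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DistillLoss(nn.Module):
    def __init__(self, student_dim, teacher_dim, tau=4.0, beta=0.5):
        super().__init__()
        self.proj = nn.Linear(student_dim, teacher_dim)  # align feature sizes
        self.tau, self.beta = tau, beta

    def forward(self, s_feat, t_feat, s_logits, t_logits):
        # Response-based: match the softened output distributions.
        resp = F.kl_div(F.log_softmax(s_logits / self.tau, dim=1),
                        F.softmax(t_logits.detach() / self.tau, dim=1),
                        reduction="batchmean") * self.tau ** 2
        # Feature-based: match intermediate representations.
        feat = F.mse_loss(self.proj(s_feat), t_feat.detach())
        return resp + self.beta * feat
```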

* In Proceedings of the 31st ACM International Conference on Multimedia (MM '23), 2023 

Sample Less, Learn More: Efficient Action Recognition via Frame Feature Restoration

Jul 27, 2023
Harry Cheng, Yangyang Guo, Liqiang Nie, Zhiyong Cheng, Mohan Kankanhalli

Training an effective video action recognition model poses significant computational challenges, particularly under limited resource budgets. Current methods primarily aim to either reduce model size or utilize pre-trained models, limiting their adaptability to various backbone architectures. This paper investigates the issue of over-sampled frames, a problem prevalent in many approaches that has nevertheless received relatively little attention. Although sampling fewer frames is a potential solution, doing so often results in a substantial decline in performance. To address this issue, we propose a novel method to restore the intermediate features for two sparsely sampled and adjacent video frames. This feature restoration technique brings a negligible increase in computational requirements compared to resource-intensive image encoders such as ViT. To evaluate the effectiveness of our method, we conduct extensive experiments on four public datasets, including Kinetics-400, ActivityNet, UCF-101, and HMDB-51. With the integration of our method, the efficiency of three commonly used baselines has been improved by over 50%, with a mere 0.5% reduction in recognition accuracy. In addition, our method also surprisingly helps improve the generalization ability of the models under zero-shot settings.
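
A toy sketch of the core idea: encode only sparsely sampled frames with the expensive backbone, then cheaply restore the feature of a skipped frame from its two sampled neighbors. The restoration network here is a guessed lightweight MLP, not the paper's exact module; see the linked repository for the actual design.

```python
import torch
import torch.nn as nn

class FeatureRestorer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * dim, dim), nn.GELU(),
                                 nn.Linear(dim, dim))

    def forward(self, feat_prev, feat_next):
        """Restore the feature of an unsampled middle frame from its
        two adjacent sampled frames."""
        return self.net(torch.cat([feat_prev, feat_next], dim=-1))

# Usage: given (B, T, D) features from sparsely sampled frames, interleave
# restored features to approximate a densely sampled sequence without
# running the heavy image encoder on every frame.
```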

* 13 pages. Code and pretrained weights will be released at https://github.com/xaCheng1996/SLLM 

Information Retrieval Meets Large Language Models: A Strategic Report from Chinese IR Community

Jul 27, 2023
Qingyao Ai, Ting Bai, Zhao Cao, Yi Chang, Jiawei Chen, Zhumin Chen, Zhiyong Cheng, Shoubin Dong, Zhicheng Dou, Fuli Feng, Shen Gao, Jiafeng Guo, Xiangnan He, Yanyan Lan, Chenliang Li, Yiqun Liu, Ziyu Lyu, Weizhi Ma, Jun Ma, Zhaochun Ren, Pengjie Ren, Zhiqiang Wang, Mingwen Wang, Ji-Rong Wen, Le Wu, Xin Xin, Jun Xu, Dawei Yin, Peng Zhang, Fan Zhang, Weinan Zhang, Min Zhang, Xiaofei Zhu

The research field of Information Retrieval (IR) has evolved significantly, expanding beyond traditional search to meet diverse user information needs. Recently, Large Language Models (LLMs) have demonstrated exceptional capabilities in text understanding, generation, and knowledge inference, opening up exciting avenues for IR research. LLMs not only facilitate generative retrieval but also offer improved solutions for user understanding, model evaluation, and user-system interactions. More importantly, the synergistic relationship among IR models, LLMs, and humans forms a new technical paradigm that is more powerful for information seeking. IR models provide real-time and relevant information, LLMs contribute internal knowledge, and humans play a central role as demanders and evaluators of the reliability of information services. Nevertheless, significant challenges exist, including computational costs, credibility concerns, domain-specific limitations, and ethical considerations. To thoroughly discuss the transformative impact of LLMs on IR research, the Chinese IR community conducted a strategic workshop in April 2023, yielding valuable insights. This paper provides a summary of the workshop's outcomes, including the rethinking of IR's core values, the mutual enhancement of LLMs and IR, the proposal of a novel IR technical paradigm, and open challenges.

* 17 pages 

MB-HGCN: A Hierarchical Graph Convolutional Network for Multi-behavior Recommendation

Jun 19, 2023
Mingshi Yan, Zhiyong Cheng, Jing Sun, Fuming Sun, Yuxin Peng

Collaborative filtering-based recommender systems that rely on a single type of behavior often encounter serious sparsity issues in real-world applications, leading to unsatisfactory performance. Multi-behavior Recommendation (MBR) is a method that seeks to learn user preferences, represented as vector embeddings, from auxiliary information. By leveraging these preferences for target behavior recommendations, MBR addresses the sparsity problem and improves the accuracy of recommendations. In this paper, we propose MB-HGCN, a novel multi-behavior recommendation model that uses a hierarchical graph convolutional network to learn user and item embeddings from the coarse-grained global level to the fine-grained behavior-specific level. Our model learns global embeddings from a unified homogeneous graph constructed from the interactions of all behaviors, which are then used as initial embeddings for behavior-specific embedding learning in each behavior graph. We also emphasize the distinctness of the user and item behavior-specific embeddings and design two simple-yet-effective strategies to aggregate the behavior-specific embeddings for users and items, respectively. Finally, we adopt multi-task learning for optimization. Extensive experimental results on three real-world datasets demonstrate that our model significantly outperforms the baselines, achieving a relative improvement of 73.93% and 74.21% for HR@10 and NDCG@10, respectively, on the Tmall dataset.
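
A schematic sketch of the hierarchical idea: embeddings learned on the unified interaction graph initialize per-behavior propagation. The parameter-free LightGCN-style propagation is an assumption about the GCN variant, and the aggregation strategies from the abstract are omitted.

```python
import torch
import torch.nn as nn

def propagate(adj, emb, n_layers=2):
    """Parameter-free graph convolution; average the layer outputs."""
    outs = [emb]
    for _ in range(n_layers):
        emb = torch.sparse.mm(adj, emb)
        outs.append(emb)
    return torch.stack(outs).mean(dim=0)

class MBHGCNSketch(nn.Module):
    def __init__(self, n_nodes, dim, behavior_adjs, unified_adj):
        super().__init__()
        self.emb = nn.Embedding(n_nodes, dim)
        self.behavior_adjs = behavior_adjs  # one sparse graph per behavior
        self.unified_adj = unified_adj      # graph over all behaviors

    def forward(self):
        # Coarse-grained: global embeddings from the unified graph.
        global_emb = propagate(self.unified_adj, self.emb.weight)
        # Fine-grained: each behavior refines the shared global embeddings.
        return [propagate(adj, global_emb) for adj in self.behavior_adjs]
```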


Continual Learning with Strong Experience Replay

May 23, 2023
Tao Zhuo, Zhiyong Cheng, Zan Gao, Mohan Kankanhalli

Continual Learning (CL) aims at incrementally learning new tasks without forgetting the knowledge acquired from old ones. Experience Replay (ER) is a simple and effective rehearsal-based strategy, which optimizes the model with the current training data and a subset of old samples stored in a memory buffer. To further reduce forgetting, recent approaches extend ER with various techniques, such as model regularization and memory sampling. However, the prediction consistency between the new model and the old one on the current training data has seldom been explored, resulting in less knowledge being preserved when few previous samples are available. To address this issue, we propose a CL method with Strong Experience Replay (SER), which, besides distilling past experience from the memory buffer, additionally utilizes future experiences mimicked on the current training data. In our method, the updated model produces outputs that approximate those of the original model, which effectively preserves the acquired knowledge. Experimental results on multiple image classification datasets show that our SER method surpasses the state-of-the-art methods by a noticeable margin.
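
A hedged sketch of the SER objective: standard experience replay plus a consistency ("future experience") term on the current batch. The loss weights and the exact form of the distillation term are assumptions.

```python
import torch
import torch.nn.functional as F

def ser_loss(model, old_model, x_cur, y_cur, x_buf, y_buf,
             alpha=0.5, beta=0.5):
    logits_cur = model(x_cur)
    # Plain cross-entropy on the current task data.
    loss = F.cross_entropy(logits_cur, y_cur)
    # Past experience: replay labeled samples stored in the memory buffer.
    loss = loss + alpha * F.cross_entropy(model(x_buf), y_buf)
    # Future experience: keep predictions on the *current* data consistent
    # with the frozen old model, preserving old knowledge without replay.
    with torch.no_grad():
        old_logits = old_model(x_cur)
    loss = loss + beta * F.mse_loss(logits_cur, old_logits)
    return loss
```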


Multi-Behavior Recommendation with Cascading Graph Convolution Networks

Mar 28, 2023
Zhiyong Cheng, Sai Han, Fan Liu, Lei Zhu, Zan Gao, Yuxin Peng

Multi-behavior recommendation, which exploits auxiliary behaviors (e.g., click and cart) to help predict users' potential interactions on the target behavior (e.g., buy), is regarded as an effective way to alleviate the data sparsity and cold-start issues in recommendation. In real-world applications, multiple behaviors often occur in a certain order (e.g., click > cart > buy). In a behavior chain, a later behavior usually exhibits a stronger signal of user preference than an earlier one. Most existing multi-behavior models fail to capture such dependencies in a behavior chain for embedding learning. In this work, we propose a novel multi-behavior recommendation model with cascading graph convolution networks (named MB-CGCN). In MB-CGCN, the embeddings learned from one behavior are used, after a feature transformation operation, as the input features for the next behavior's embedding learning. In this way, our model explicitly utilizes the behavior dependencies in embedding learning. Experiments on two benchmark datasets demonstrate the effectiveness of our model in exploiting multi-behavior data. It outperforms the best baseline by 33.7% and 35.9% on average over the two datasets in terms of Recall@10 and NDCG@10, respectively.
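
A sketch of the cascading design: embeddings learned on one behavior graph are transformed and fed as input to the next behavior in the chain. The single-layer propagation and linear transformation are simplifying assumptions.

```python
import torch
import torch.nn as nn

class CascadingGCNSketch(nn.Module):
    def __init__(self, n_nodes, dim, n_behaviors):
        super().__init__()
        self.emb = nn.Embedding(n_nodes, dim)
        # One feature transformation per hand-off between behaviors.
        self.transforms = nn.ModuleList(
            nn.Linear(dim, dim) for _ in range(n_behaviors - 1))

    def forward(self, behavior_adjs):
        """behavior_adjs: sparse graphs ordered as the behavior chain,
        e.g., [click, cart, buy]."""
        emb = self.emb.weight
        all_embs = []
        for i, adj in enumerate(behavior_adjs):
            emb = torch.sparse.mm(adj, emb)    # per-behavior propagation
            all_embs.append(emb)
            if i < len(self.transforms):
                emb = self.transforms[i](emb)  # hand off to next behavior
        return all_embs
```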

* Accepted by WWW 2023 

Privacy-Preserving Synthetic Data Generation for Recommendation Systems

Sep 27, 2022
Fan Liu, Zhiyong Cheng, Huilin Chen, Yinwei Wei, Liqiang Nie, Mohan Kankanhalli

Recommendation systems make predictions chiefly based on users' historical interaction data (e.g., items previously clicked or purchased). There is a risk of privacy leakage when collecting the users' behavior data for building the recommendation model. However, existing privacy-preserving solutions are designed to tackle the privacy issue only during the model training and results collection phases. The problem of privacy leakage still exists when directly sharing the private user interaction data with organizations or releasing it to the public. To address this problem, in this paper, we present a User Privacy Controllable Synthetic Data Generation model (UPC-SDG for short), which generates synthetic interaction data for users based on their privacy preferences. The generation model aims to provide certain privacy guarantees while maximizing the utility of the generated synthetic data at both the data level and the item level. Specifically, at the data level, we design a selection module that selects, from the user's interaction data, those items that contribute less to the user's preferences. At the item level, a synthetic data generation module is proposed to generate a synthetic item corresponding to the selected item based on the user's preferences. Furthermore, we also present a privacy-utility trade-off strategy to balance the privacy and utility of the synthetic data. Extensive experiments and ablation studies have been conducted on three publicly accessible datasets to justify our method, demonstrating its effectiveness in generating synthetic data under users' privacy preferences.
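
An illustrative sketch of the data-level selection step: rank a user's interacted items by how much they contribute to the user's preference and pick the least informative ones for replacement. Using embedding similarity as the contribution score and a ratio-style privacy preference are assumed proxies for the paper's actual design.

```python
import torch

def select_items_to_replace(user_emb, item_embs, privacy_ratio=0.3):
    """user_emb: (D,); item_embs: (N, D) for the user's interacted items.
    Returns the indices of the items contributing least to the preference."""
    contribution = item_embs @ user_emb  # higher = more informative
    k = max(1, int(privacy_ratio * item_embs.size(0)))
    return contribution.topk(k, largest=False).indices
```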

* ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '22) 

Temporal Action Localization with Multi-temporal Scales

Aug 16, 2022
Zan Gao, Xinglei Cui, Tao Zhuo, Zhiyong Cheng, An-An Liu, Meng Wang, Shenyong Chen

Temporal action localization plays an important role in video analysis, aiming to localize and classify actions in untrimmed videos. Previous methods often predict actions on a feature space of a single temporal scale. However, the temporal features of a low-level scale lack enough semantics for action classification, while a high-level scale cannot provide rich details of the action boundaries. To address this issue, we propose to predict actions on a feature space of multiple temporal scales. Specifically, we use refined feature pyramids of different scales to pass semantics from high-level scales to low-level scales. Besides, to establish the long temporal scale of the entire video, we use a spatial-temporal transformer encoder to capture the long-range dependencies of video frames. The refined features with long-range dependencies are then fed into a classifier for coarse action prediction. Finally, to further improve the prediction accuracy, we propose a frame-level self-attention module to refine the classification and boundaries of each action instance. Extensive experiments show that the proposed method outperforms state-of-the-art approaches on the THUMOS14 dataset and achieves comparable performance on the ActivityNet1.3 dataset. Compared with A2Net (TIP20, Avg{0.3:0.7}), Sub-Action (CSVT2022, Avg{0.1:0.5}), and AFSD (CVPR21, Avg{0.3:0.7}) on the THUMOS14 dataset, the proposed method achieves improvements of 12.6%, 17.4%, and 2.2%, respectively.
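
A minimal sketch of the top-down refinement described above: semantics from a coarse (high-level) temporal scale are upsampled and fused into finer scales of the feature pyramid. The fusion layer and upsampling mode are assumptions, not the paper's exact refinement module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalPyramidRefiner(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.fuse = nn.Conv1d(dim, dim, kernel_size=3, padding=1)

    def forward(self, feats):
        """feats: list of (B, D, T_i) tensors, coarsest scale last."""
        refined = [feats[-1]]
        for f in reversed(feats[:-1]):
            # Upsample the coarser refined scale to this scale's length.
            up = F.interpolate(refined[0], size=f.shape[-1], mode="linear",
                               align_corners=False)
            refined.insert(0, self.fuse(f + up))  # pass semantics downward
        return refined
```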
