Han-Jia Ye

PILOT: A Pre-Trained Model-Based Continual Learning Toolbox

Sep 13, 2023
Hai-Long Sun, Da-Wei Zhou, Han-Jia Ye, De-Chuan Zhan

While traditional machine learning can effectively tackle a wide range of problems, it primarily operates within a closed-world setting, which presents limitations when dealing with streaming data. As a solution, incremental learning has emerged to address real-world scenarios involving the continual arrival of new data. Recently, pre-training has made significant advances and garnered the attention of numerous researchers. The strong performance of these pre-trained models (PTMs) presents a promising avenue for developing continual learning algorithms that can effectively adapt to real-world scenarios. Consequently, exploring the use of PTMs in incremental learning has become essential. This paper introduces a pre-trained model-based continual learning toolbox known as PILOT. On the one hand, PILOT implements several state-of-the-art class-incremental learning algorithms based on pre-trained models, such as L2P, DualPrompt, and CODA-Prompt. On the other hand, PILOT also adapts typical class-incremental learning algorithms (e.g., DER, FOSTER, and MEMO) to the context of pre-trained models to evaluate their effectiveness.

* Code is available at https://github.com/sun-hailong/LAMDA-PILOT 
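
To make the setting concrete, here is a minimal, hypothetical sketch of the class-incremental protocol that such a toolbox orchestrates: a model is trained on a stream of tasks and evaluated on all classes seen so far after each step. The helper names (`train_task`, `run_incremental`) are illustrative, not PILOT's actual API; the real toolbox is driven by configuration files (see the repository above).

```python
import torch

# Hypothetical sketch of the class-incremental protocol a toolbox like
# PILOT orchestrates. Names such as `train_task` are illustrative, not
# PILOT's actual API.

def evaluate(model, loaders):
    """Average accuracy over all tasks seen so far."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for loader in loaders:
            for x, y in loader:
                correct += (model(x).argmax(dim=1) == y).sum().item()
                total += y.numel()
    return correct / max(total, 1)

def run_incremental(model, task_loaders, train_task):
    """Train on tasks sequentially; evaluate on every seen task after each."""
    seen = []
    for t, (train_loader, test_loader) in enumerate(task_loaders):
        train_task(model, train_loader)   # method-specific update (L2P, DER, ...)
        seen.append(test_loader)
        print(f"after task {t}: avg accuracy = {evaluate(model, seen):.3f}")
```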

ZhiJian: A Unifying and Rapidly Deployable Toolbox for Pre-trained Model Reuse

Aug 17, 2023
Yi-Kai Zhang, Lu Ren, Chao Yi, Qi-Wei Wang, De-Chuan Zhan, Han-Jia Ye

The rapid expansion of foundation pre-trained models and their fine-tuned counterparts has significantly contributed to the advancement of machine learning. Leveraging pre-trained models to extract knowledge and expedite learning on real-world tasks, known as "Model Reuse", has become crucial in various applications. Previous research has focused on reusing models along a single dimension, such as reusing model weights, structures, or hypothesis spaces. This paper introduces ZhiJian, a comprehensive and user-friendly toolbox for model reuse built on the PyTorch backend. ZhiJian presents a novel paradigm that unifies diverse perspectives on model reuse, encompassing target architecture construction with a PTM, tuning the target model with a PTM, and PTM-based inference. This empowers deep learning practitioners to explore downstream tasks and identify the complementary advantages among different methods. ZhiJian is readily accessible at https://github.com/zhangyikaii/lamda-zhijian, facilitating seamless utilization of pre-trained models and streamlining the model reuse process for researchers and developers.
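
As a concrete instance of the weight-reuse perspective, the sketch below freezes a standard pre-trained backbone and trains only a new linear head (linear probing), using plain torchvision and PyTorch calls. It illustrates one of the reuse modes the toolbox unifies; it is not ZhiJian's own interface.

```python
import torch
from torch import nn
from torchvision import models

# Minimal linear-probing sketch: reuse a frozen pre-trained backbone as a
# feature extractor and train only a new task-specific head. This shows one
# of the reuse modes the toolbox unifies; it is not ZhiJian's own API.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = nn.Identity()                # drop the original classifier
for p in backbone.parameters():
    p.requires_grad = False                # freeze the pre-trained weights

num_classes = 10                           # example downstream task size
head = nn.Linear(512, num_classes)         # ResNet-18 features are 512-d
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)

def train_step(x, y):
    with torch.no_grad():
        feats = backbone(x)                # frozen feature extraction
    loss = nn.functional.cross_entropy(head(feats), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```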

Streaming CTR Prediction: Rethinking Recommendation Task for Real-World Streaming Data

Jul 14, 2023
Qi-Wei Wang, Hongyu Lu, Yu Chen, Da-Wei Zhou, De-Chuan Zhan, Ming Chen, Han-Jia Ye

The Click-Through Rate (CTR) prediction task is critical in industrial recommender systems, where models are usually deployed on dynamic streaming data in practical applications. Such streaming data in real-world recommender systems faces many challenges, such as distribution shift, temporal non-stationarity, and systematic biases, which complicate the training and utilization of recommendation models. However, most existing studies treat CTR prediction as a classification task on static datasets, assuming that the training and test sets are independent and identically distributed (the i.i.d. assumption). To bridge this gap, we formulate the CTR prediction problem in streaming scenarios as a Streaming CTR Prediction task. Accordingly, we propose dedicated benchmark settings and metrics to evaluate and analyze model performance on streaming data. To better understand the differences from traditional CTR prediction tasks, we delve into the factors that may affect model performance, such as parameter scale, normalization, and regularization. The results reveal the existence of a "streaming learning dilemma", whereby the same factor may have different effects on model performance in the static and streaming scenarios. Based on these findings, we propose two simple but effective methods (i.e., tuning key parameters and exemplar replay) that significantly improve the effectiveness of CTR models in the new streaming scenario. We hope our work will inspire further research on streaming CTR prediction and help improve the robustness and adaptability of recommender systems.
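
Of the two remedies, exemplar replay is the easier to picture in code. Below is a minimal, assumption-laden sketch of replay in a streaming training loop: a fixed-size reservoir keeps past (features, click) pairs and mixes them into each incoming batch. Buffer capacity, the mixing ratio, and the model interface are illustrative choices, not the paper's exact configuration.

```python
import random
import torch
import torch.nn.functional as F

# Hypothetical sketch of exemplar replay for streaming CTR training: keep a
# fixed-size reservoir of past (features, click) pairs and mix them into each
# incoming batch. Capacity and replay ratio are illustrative, not the paper's
# exact configuration.
class ReplayBuffer:
    def __init__(self, capacity=10_000):
        self.capacity, self.data, self.seen = capacity, [], 0

    def add(self, example):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(example)
        else:                                    # reservoir sampling
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.data[j] = example

    def sample(self, k):
        return random.sample(self.data, min(k, len(self.data)))

def streaming_step(model, optimizer, batch, buffer, replay_ratio=0.5):
    x, y = batch
    replay = buffer.sample(int(len(y) * replay_ratio))
    if replay:                                   # mix old exemplars into the batch
        rx, ry = zip(*replay)
        x = torch.cat([x, torch.stack(rx)])
        y = torch.cat([y, torch.stack(ry)])
    loss = F.binary_cross_entropy_with_logits(model(x).squeeze(-1), y.float())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    for example in zip(*batch):                  # store only the fresh examples
        buffer.add(example)
    return loss.item()
```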

Model Spider: Learning to Rank Pre-Trained Models Efficiently

Jun 06, 2023
Yi-Kai Zhang, Ting-Ji Huang, Yao-Xiang Ding, De-Chuan Zhan, Han-Jia Ye

Figuring out which Pre-Trained Model (PTM) from a model zoo fits the target task is essential for taking advantage of plentiful model resources. With numerous heterogeneous PTMs available from diverse fields, efficiently selecting the most suitable one is challenging because carrying out forward or backward passes over all PTMs is prohibitively time-consuming. In this paper, we propose Model Spider, which tokenizes both PTMs and tasks by summarizing their characteristics into vectors, enabling efficient PTM selection. By leveraging the approximated performance of PTMs on a separate set of training tasks, Model Spider learns to construct tokens and measure the fitness score between a model-task pair via their tokens. This ability to rank relevant PTMs higher than others generalizes to new tasks. With the top-ranked PTM candidates, we further learn to enrich task tokens with their PTM-specific semantics to re-rank the PTMs for better selection. Model Spider balances efficiency and selection ability, making PTM selection like a spider preying on a web. Model Spider demonstrates promising performance across various configurations of model zoos.
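
To ground the tokenization idea, here is a toy sketch: PTMs and the target task are each summarized as fixed-length vectors, and candidates are ranked by a fitness score between tokens. Cosine similarity stands in for the learned score, so this illustrates only the interface, not the paper's learned ranker.

```python
import torch
import torch.nn.functional as F

# Toy sketch of token-based PTM ranking: every PTM and the target task are
# summarized as fixed-length vectors ("tokens"), and candidates are ranked
# by a fitness score between tokens. Cosine similarity stands in for the
# learned score here.
def rank_ptms(task_token: torch.Tensor, model_tokens: torch.Tensor):
    """task_token: (d,); model_tokens: (num_ptms, d). Returns indices, best first."""
    scores = F.cosine_similarity(model_tokens, task_token.unsqueeze(0), dim=1)
    return torch.argsort(scores, descending=True), scores

# toy usage with random tokens
torch.manual_seed(0)
order, scores = rank_ptms(torch.randn(64), torch.randn(5, 64))
print("ranking:", order.tolist())
```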

Learning without Forgetting for Vision-Language Models

May 30, 2023
Da-Wei Zhou, Yuanhan Zhang, Jingyi Ning, Han-Jia Ye, De-Chuan Zhan, Ziwei Liu

Class-Incremental Learning (CIL), or continual learning, is a desired capability in the real world, requiring a learning system to adapt to new tasks without forgetting former ones. While traditional CIL methods focus on visual information to grasp core features, recent advances in Vision-Language Models (VLMs) have shown promising capabilities in learning generalizable representations with the aid of textual information. However, when continually trained on new classes, VLMs often suffer from catastrophic forgetting of former knowledge. Applying VLMs to CIL poses two major challenges: 1) how to adapt the model without forgetting, and 2) how to make full use of the multi-modal information. To this end, we propose PROjectiOn Fusion (PROOF), which enables VLMs to learn without forgetting. To handle the first challenge, we propose training task-specific projections based on the frozen image/text encoders. When facing new tasks, new projections are appended while former projections are frozen, alleviating the forgetting of old concepts. For the second challenge, we propose a fusion module to better utilize the cross-modal information. By jointly adjusting visual and textual features, the model can capture semantic information with stronger representation ability. Extensive experiments on nine benchmark datasets validate that PROOF achieves state-of-the-art performance.
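
The expansion mechanism is easy to sketch. Below is a minimal, hedged rendition of the task-specific projection idea: a frozen encoder feeds a growing list of linear projections, and each new task freezes the old ones. PROOF's cross-modal fusion module is omitted, and all names are illustrative rather than the paper's actual implementation.

```python
import torch
from torch import nn

# Hedged sketch of the expandable-projection idea: a frozen encoder feeds a
# growing list of linear projections; each new task appends a projection and
# freezes the earlier ones. PROOF's cross-modal fusion module is omitted.
class ExpandableProjections(nn.Module):
    def __init__(self, encoder: nn.Module, dim: int):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad = False            # encoder stays frozen
        self.dim = dim
        self.projections = nn.ModuleList()     # call add_task() before forward()

    def add_task(self):
        for p in self.projections.parameters():
            p.requires_grad = False            # freeze former projections
        self.projections.append(nn.Linear(self.dim, self.dim))

    def forward(self, x):
        with torch.no_grad():
            feats = self.encoder(x)
        # aggregate the outputs of all task-specific projections
        return torch.stack([proj(feats) for proj in self.projections]).sum(dim=0)
```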

Gen-L-Video: Multi-Text to Long Video Generation via Temporal Co-Denoising

May 29, 2023
Fu-Yun Wang, Wenshuo Chen, Guanglu Song, Han-Jia Ye, Yu Liu, Hongsheng Li

Leveraging large-scale image-text datasets and advances in diffusion models, text-driven generative models have made remarkable strides in image generation and editing. This study explores extending that text-driven ability to the generation and editing of multi-text conditioned long videos. Current methodologies for video generation and editing, while innovative, are often confined to extremely short videos (typically fewer than 24 frames) and limited to a single text condition. These constraints significantly limit their applications, given that real-world videos usually consist of multiple segments, each bearing different semantic information. To address this challenge, we introduce a novel paradigm dubbed Gen-L-Video, capable of extending off-the-shelf short video diffusion models to generate and edit videos comprising hundreds of frames with diverse semantic segments, without additional training and while preserving content consistency. We have implemented three mainstream text-driven video generation and editing methodologies and extended them with our paradigm to accommodate longer videos containing a variety of semantic segments. Our experimental results show that our approach significantly broadens the generative and editing capabilities of video diffusion models, offering new possibilities for future research and applications. The code is available at https://github.com/G-U-N/Gen-L-Video.

* The code is available at https://github.com/G-U-N/Gen-L-Video 
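
The aggregation step behind extending a short-video model can be pictured in a few lines: the long latent sequence is split into overlapping short windows, each window is denoised by the off-the-shelf short model, and per-frame predictions are averaged where windows overlap. The sketch below is a simplified, hypothetical rendering of that co-denoising idea; `denoise_window` is a placeholder, and the window and stride values are illustrative.

```python
import torch

# Simplified sketch of temporal co-denoising: split a long latent video into
# overlapping short windows, denoise each with an off-the-shelf short-video
# model, and average per-frame predictions where windows overlap.
# `denoise_window` is a placeholder; the loop assumes (frames - window) is a
# multiple of stride so every frame is covered.
def co_denoise(latents: torch.Tensor, denoise_window, window=16, stride=8):
    """latents: (frames, c, h, w). Returns the aggregated denoised latents."""
    out = torch.zeros_like(latents)
    count = torch.zeros(latents.shape[0], 1, 1, 1)
    for start in range(0, max(latents.shape[0] - window, 0) + 1, stride):
        clip = latents[start:start + window]
        out[start:start + window] += denoise_window(clip)
        count[start:start + window] += 1
    return out / count.clamp(min=1)               # average the overlaps
```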

Preserving Locality in Vision Transformers for Class Incremental Learning

Apr 14, 2023
Bowen Zheng, Da-Wei Zhou, Han-Jia Ye, De-Chuan Zhan

Learning new classes without forgetting is crucial for classification models in real-world applications. Vision Transformers (ViTs) have recently achieved remarkable performance in Class-Incremental Learning (CIL). Previous works mainly focus on block design and model expansion for ViTs. However, in this paper, we find that when a ViT is incrementally trained, its attention layers gradually lose concentration on local features. We call this phenomenon "Locality Degradation" in ViTs for CIL. Since low-level local information is crucial to the transferability of representations, it is beneficial to preserve the locality in attention layers. In this paper, we encourage the model to preserve more local information as training proceeds and devise a Locality-Preserved Attention (LPA) layer to emphasize the importance of local features. Specifically, we incorporate local information directly into the vanilla attention and control the initial gradients of the vanilla attention by weighting it with a small initial value. Extensive experiments show that the representations facilitated by LPA capture more low-level general information that is easier to transfer to follow-up tasks. The improved model achieves consistently better performance on CIFAR100 and ImageNet100.
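
As the abstract describes it, the layer mixes a local branch into vanilla attention and down-weights the vanilla branch at initialization. The sketch below renders that spirit with a depthwise convolution over the token sequence as the local branch and a learnable scalar initialized to a small value; it is an interpretation for illustration, not the paper's exact LPA formulation.

```python
import torch
from torch import nn

# Hedged sketch in the spirit of the LPA layer: mix a local branch (here a
# depthwise convolution over the token sequence) into vanilla self-attention,
# and weight the vanilla branch with a small initial value to control its
# initial gradients. An interpretation, not the paper's exact formulation.
class LocalityPreservedAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8, init_weight: float = 0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.local = nn.Conv1d(dim, dim, kernel_size=3, padding=1, groups=dim)
        self.alpha = nn.Parameter(torch.tensor(init_weight))  # small initial value

    def forward(self, x):                          # x: (batch, tokens, dim)
        global_out, _ = self.attn(x, x, x)
        local_out = self.local(x.transpose(1, 2)).transpose(1, 2)
        return self.alpha * global_out + local_out
```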

The Capacity and Robustness Trade-off: Revisiting the Channel Independent Strategy for Multivariate Time Series Forecasting

Apr 11, 2023
Lu Han, Han-Jia Ye, De-Chuan Zhan

Multivariate time series data comprises multiple channels of variables, and multivariate forecasting models need to capture the relationships between channels to accurately predict future values. Recently, however, methods employing the Channel Independent (CI) strategy have emerged. These methods treat multivariate time series as separate univariate series and disregard the correlation between channels. Surprisingly, our empirical results show that models trained with the CI strategy outperform those trained with the Channel Dependent (CD) strategy, usually by a significant margin, yet the reasons behind this phenomenon have not been thoroughly explored in the literature. This paper provides comprehensive empirical and theoretical analyses of the characteristics of multivariate time series datasets and the CI/CD strategies. Our results show that the CD approach has higher capacity but often lacks the robustness to accurately predict distributionally drifted time series, whereas the CI approach trades capacity for robust prediction. Practical measures inspired by these analyses are proposed to address the capacity-robustness dilemma, including a modified CD method called Predict Residuals with Regularization (PRReg) that can surpass the CI strategy. We hope our findings raise awareness among researchers of the characteristics of multivariate time series and inspire the construction of better forecasting models.

* under review 
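
The distinction between the two strategies reduces to how channels are fed to the model, which the following toy sketch makes explicit: CD maps all channels jointly, while CI folds channels into the batch dimension so one shared univariate model serves every channel. The linear forecasters and dimensions are illustrative assumptions.

```python
import torch
from torch import nn

# Toy sketch contrasting the two strategies with linear forecasters. CD maps
# all C channels jointly; CI folds channels into the batch dimension so one
# shared univariate model serves every channel. Dimensions are illustrative.
B, C, L, H = 32, 7, 96, 24                     # batch, channels, lookback, horizon
x = torch.randn(B, C, L)

cd_model = nn.Linear(C * L, C * H)             # Channel Dependent
ci_model = nn.Linear(L, H)                     # Channel Independent (shared)

cd_pred = cd_model(x.reshape(B, C * L)).reshape(B, C, H)
ci_pred = ci_model(x.reshape(B * C, L)).reshape(B, C, H)
print(cd_pred.shape, ci_pred.shape)            # both: torch.Size([32, 7, 24])
```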

Revisiting Class-Incremental Learning with Pre-Trained Models: Generalizability and Adaptivity are All You Need

Mar 13, 2023
Da-Wei Zhou, Han-Jia Ye, De-Chuan Zhan, Ziwei Liu

Class-incremental learning (CIL) aims to adapt to emerging new classes without forgetting old ones. Traditional CIL models are trained from scratch to continually acquire knowledge as data evolves. Recently, pre-training has achieved substantial progress, making vast pre-trained models (PTMs) accessible for CIL. Contrary to traditional methods, PTMs possess generalizable embeddings that can be easily transferred. In this work, we revisit CIL with PTMs and argue that the core factors in CIL are adaptivity for model updating and generalizability for knowledge transfer. 1) We first reveal that a frozen PTM can already provide generalizable embeddings for CIL. Surprisingly, a simple baseline (SimpleCIL), which continually sets the PTM's classifier weights to prototype features, can beat the state-of-the-art even without training on the downstream task. 2) Due to the distribution gap between pre-training and downstream datasets, the PTM can be further endowed with adaptivity via model adaptation. We propose ADapt And Merge (ADAM), which aggregates the embeddings of the PTM and the adapted model for classifier construction. ADAM is a general framework that can be orthogonally combined with any parameter-efficient tuning method, retaining the PTM's generalizability and the adapted model's adaptivity. 3) Additionally, we find previous benchmarks are unsuitable in the era of PTMs due to data overlap, and we propose four new benchmarks for assessment, namely ImageNet-A, ObjectNet, OmniBenchmark, and VTAB. Extensive experiments validate the effectiveness of ADAM within a unified and concise framework.

* Code is available at: https://github.com/zhoudw-zdw/RevisitingCIL 
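
The SimpleCIL baseline, as the abstract describes it, amounts to a nearest-prototype classifier over frozen PTM features. The sketch below is a hedged rendering of that recipe: class prototypes are feature means, and predictions use cosine similarity to the prototypes. `extract_features` is a placeholder for a frozen backbone; exact details may differ from the paper's code.

```python
import torch
import torch.nn.functional as F

# Hedged sketch of a SimpleCIL-style baseline: class prototypes are the mean
# frozen-PTM features of each class, and test samples are matched to the most
# similar prototype. `extract_features` is a placeholder for a frozen
# pre-trained backbone; details may differ from the paper's implementation.
def build_prototypes(extract_features, loader, num_classes, dim):
    protos = torch.zeros(num_classes, dim)
    counts = torch.zeros(num_classes)
    with torch.no_grad():
        for x, y in loader:
            feats = extract_features(x)                    # (batch, dim)
            protos.index_add_(0, y, feats)
            counts.index_add_(0, y, torch.ones_like(y, dtype=torch.float))
    return protos / counts.clamp(min=1).unsqueeze(1)       # per-class means

def predict(extract_features, x, protos):
    feats = F.normalize(extract_features(x), dim=1)
    sims = feats @ F.normalize(protos, dim=1).T            # cosine similarity
    return sims.argmax(dim=1)
```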

Deep Class-Incremental Learning: A Survey

Feb 07, 2023
Da-Wei Zhou, Qi-Wei Wang, Zhi-Hong Qi, Han-Jia Ye, De-Chuan Zhan, Ziwei Liu

Deep models, e.g., CNNs and Vision Transformers, have achieved impressive results on many vision tasks in the closed world. However, novel classes emerge from time to time in our ever-changing world, requiring a learning system to acquire new knowledge continually. For example, a robot needs to understand new instructions, and an opinion monitoring system should analyze emerging topics every day. Class-Incremental Learning (CIL) enables the learner to incorporate the knowledge of new classes incrementally and build a universal classifier over all seen classes. However, when the model is trained directly on new class instances, a fatal problem occurs: it tends to catastrophically forget the characteristics of former classes, and its performance drastically degrades. There have been numerous efforts to tackle catastrophic forgetting in the machine learning community. In this paper, we comprehensively survey recent advances in deep class-incremental learning and summarize these methods from three aspects, i.e., data-centric, model-centric, and algorithm-centric. We also provide a rigorous and unified evaluation of 16 methods on benchmark image classification tasks to empirically characterize different algorithms. Furthermore, we note that the current comparison protocol ignores the influence of memory budget in model storage, which may lead to unfair comparisons and biased results. Hence, we advocate fair comparison by aligning the memory budget in evaluation, along with several memory-agnostic performance measures. The source code to reproduce these evaluations is available at https://github.com/zhoudw-zdw/CIL_Survey/

* Code is available at https://github.com/zhoudw-zdw/CIL_Survey/ 
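
The memory-alignment argument is simple arithmetic: bytes spent on extra model copies could instead store exemplars, so fair comparison should equalize the total budget. The sketch below illustrates the conversion with assumed sizes (float32 weights, CIFAR-scale uint8 exemplars); the specific numbers are illustrative, not the survey's exact protocol.

```python
# Illustrative arithmetic for memory-budget alignment: bytes spent on extra
# model copies could instead store exemplars, so fair comparison equalizes
# the total budget. Sizes are assumptions (float32 weights, CIFAR-scale
# uint8 exemplars), not the survey's exact protocol.
BYTES_PER_PARAM = 4                        # float32 weight
IMAGE_BYTES = 32 * 32 * 3                  # one CIFAR-sized uint8 exemplar

def exemplar_equivalent(extra_params: int) -> int:
    """Number of exemplars the extra model storage is worth."""
    return (extra_params * BYTES_PER_PARAM) // IMAGE_BYTES

# e.g., one spare CIFAR ResNet-32 copy (~0.46M parameters)
print(exemplar_equivalent(460_000))        # -> 598 exemplars
```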