Continual Vision-Language Representation Learning with Off-Diagonal Information

May 15, 2023
Zixuan Ni, Longhui Wei, Siliang Tang, Yueting Zhuang, Qi Tian



This paper discusses the feasibility of continually training the CLIP model on streaming data. By tracking the directional changes of the representation vectors in the continually updated CLIP model, we explore and summarize these spatial variations as Spatial Disorder (SD), which can be divided into Intra-modal Rotation and Inter-modal Deviation. Moreover, we demonstrate, both empirically and theoretically, how intra-modal rotation and inter-modal deviation lead to a performance decline for CLIP on cross-modal retrieval tasks. To alleviate the spatial disorder, we propose a simple yet effective continual learning framework, Mod-X (Maintain off-diagonal information-matriX). Experiments (in the method and experiments sections and the appendix) on commonly used datasets with different scales and scopes illustrate the effectiveness of our method.
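The quantities named in the abstract can be made concrete with a small sketch. The code below is an illustration under stated assumptions, not the paper's implementation: it measures how far a representation vector has rotated between an old and an updated encoder, builds the image-text cosine-similarity matrix whose off-diagonal entries the method's name refers to, and compares the row-wise softmax distributions of the old and new matrices with a KL divergence. The function names, the temperature value of 0.07, and the choice of KL divergence as the comparison are all assumptions made for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def rotation_angle(old_vec, new_vec):
    """Angle in radians between one sample's old and updated representation."""
    cos = sum(a * b for a, b in zip(normalize(old_vec), normalize(new_vec)))
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp against rounding error


def similarity_matrix(image_embs, text_embs):
    """Cosine similarities: entry [i][j] compares image i with text j.

    Diagonal entries are the matched pairs; off-diagonal entries describe
    how each sample relates to the non-matching samples in the batch.
    """
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(i, t)) for t in txts] for i in imgs]


def softmax(row, temperature=0.07):
    """Temperature-scaled softmax over one row (temperature is an assumption)."""
    scaled = [x / temperature for x in row]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def row_kl(old_sim, new_sim, temperature=0.07):
    """Mean KL(old || new) between row distributions of two similarity matrices."""
    total = 0.0
    for old_row, new_row in zip(old_sim, new_sim):
        p = softmax(old_row, temperature)
        q = softmax(new_row, temperature)
        total += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return total / len(old_sim)
```

In this sketch, a penalty such as `row_kl` would be zero when the updated model reproduces the old model's similarity distribution and would grow as the distributions drift apart, which is one plausible way to read "maintaining" off-diagonal information during continual training.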

* ICML 2023  
