Daoguo Dong

Preserving Cross-Modal Consistency for CLIP-based Class-Incremental Learning

Nov 14, 2025

Haoran Chen, Houze Xu, Micah Goldblum, Daoguo Dong, Zuxuan Wu

Abstract: Class-incremental learning (CIL) enables models to continuously learn new categories from sequential tasks without forgetting previously acquired knowledge. While recent advances in vision-language models such as CLIP have demonstrated strong generalization across domains, extending them to continual settings remains challenging. In particular, learning task-specific soft prompts for newly introduced classes often leads to severe classifier bias, as the text prototypes overfit to recent categories when prior data are unavailable. In this paper, we propose DMC, a simple yet effective two-stage framework for CLIP-based CIL that decouples the adaptation of the vision encoder from the optimization of textual soft prompts. Each stage is trained with the other frozen, allowing one modality to act as a stable semantic anchor for the other and thereby preserving cross-modal alignment. Furthermore, current CLIP-based CIL approaches typically store class-wise Gaussian statistics for generative replay, yet they overlook the distributional drift that arises when the vision encoder is updated over time. To address this issue, we introduce DMC-OT, an enhanced version of DMC that incorporates an optimal-transport guided calibration strategy to align memory statistics across evolving encoders, along with a task-specific prompting design that enhances inter-task separability. Extensive experiments on CIFAR-100, ImageNet-R, CUB-200, and UCF-101 demonstrate that both DMC and DMC-OT achieve state-of-the-art performance, with DMC-OT further improving accuracy by an average of 1.80%.
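The abstract mentions storing class-wise Gaussian statistics for generative replay. As a generic illustration only (not the DMC or DMC-OT code), the sketch below stores a per-class mean and a diagonal variance over feature vectors, then samples synthetic features for previously seen classes from those Gaussians. The diagonal covariance, the function names `class_gaussian_stats` and `replay`, and the toy 2-D features are assumptions made for brevity; real encoder features are much higher-dimensional, and full covariance matrices may be stored instead.

```python
import random
from collections import defaultdict
from statistics import fmean, pvariance


def class_gaussian_stats(features, labels):
    """Store a per-class mean and diagonal variance (a hypothetical simplification)."""
    groups = defaultdict(list)
    for feat, label in zip(features, labels):
        groups[label].append(feat)
    stats = {}
    for label, feats in groups.items():
        dims = list(zip(*feats))  # transpose: one tuple of values per feature dimension
        mean = [fmean(d) for d in dims]
        var = [pvariance(d) for d in dims]
        stats[label] = (mean, var)
    return stats


def replay(stats, n_per_class, rng):
    """Draw synthetic features for each stored class from its diagonal Gaussian."""
    samples, sample_labels = [], []
    for label, (mean, var) in stats.items():
        for _ in range(n_per_class):
            samples.append([rng.gauss(m, v ** 0.5) for m, v in zip(mean, var)])
            sample_labels.append(label)
    return samples, sample_labels


if __name__ == "__main__":
    # Toy 2-D "features" for two previously seen classes
    old_feats = [[1.0, 2.0], [1.2, 1.8], [5.0, 5.0], [5.4, 4.6]]
    old_labels = [0, 0, 1, 1]
    stats = class_gaussian_stats(old_feats, old_labels)
    fake_feats, fake_labels = replay(stats, n_per_class=3, rng=random.Random(0))
    print(len(fake_feats), fake_labels)  # → 6 [0, 0, 0, 1, 1, 1]
```

Because these statistics are computed in one encoder's feature space, they go stale once the vision encoder is updated. That is the drift DMC-OT's optimal-transport calibration is described as addressing; the calibration step itself is not sketched here.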

Moyun: A Diffusion-Based Model for Style-Specific Chinese Calligraphy Generation

Oct 10, 2024

Kaiyuan Liu, Jiahao Mei, Hengyu Zhang, Yihuai Zhang, Xingjiao Wu, Daoguo Dong, Liang He

Abstract: Although existing Chinese calligraphy generation methods can perform style transfer, generating calligraphy by specifying the calligrapher, font, and character style remains challenging. To address this, we propose 'Moyun', a new Chinese calligraphy generation model that replaces the U-Net in the diffusion model with Vision Mamba and introduces the TripleLabel control mechanism to achieve controllable calligraphy generation. The model was tested on 'Mobao', our large-scale dataset of over 1.9 million images, and the results demonstrate that 'Moyun' can effectively control the generation process and produce calligraphy in the specified style. Even for characters a calligrapher never wrote, 'Moyun' can generate calligraphy that matches that calligrapher's style.
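The abstract does not detail how TripleLabel injects its three labels. One common pattern for conditioning a generator on several discrete labels is to look each label up in its own embedding table and sum the results into a single condition vector; the plain-Python sketch below shows only that generic pattern. The three label types follow the abstract's wording (calligrapher, font, character style), but the class name `TripleLabelEmbedding`, the summation, the table sizes, and the small random initialization are all hypothetical and should not be read as Moyun's actual design.

```python
import random


class TripleLabelEmbedding:
    """Hypothetical sketch: fuse calligrapher, font, and style labels into one vector."""

    def __init__(self, n_calligraphers, n_fonts, n_styles, dim, seed=0):
        rng = random.Random(seed)

        def make_table(n_rows):
            # One vector per label value; in a real model these would be learned
            return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(n_rows)]

        self.calligrapher_table = make_table(n_calligraphers)
        self.font_table = make_table(n_fonts)
        self.style_table = make_table(n_styles)

    def __call__(self, calligrapher_id, font_id, style_id):
        # Look up each label's embedding and sum them element-wise
        c = self.calligrapher_table[calligrapher_id]
        f = self.font_table[font_id]
        s = self.style_table[style_id]
        return [a + b + d for a, b, d in zip(c, f, s)]


if __name__ == "__main__":
    embed = TripleLabelEmbedding(n_calligraphers=10, n_fonts=5, n_styles=3, dim=8)
    cond = embed(calligrapher_id=2, font_id=1, style_id=0)
    print(len(cond))  # → 8
```

Summation keeps the condition vector the same size regardless of how many labels are combined, which is convenient when it is added to a timestep embedding, as several diffusion architectures do with a single class label; concatenation would instead keep the three label signals in separate channels at the cost of a larger vector.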

TEAdapter: Supply abundant guidance for controllable text-to-music generation

Aug 09, 2024

Jialing Zou, Jiahao Mei, Xudong Nan, Jinghua Li, Daoguo Dong, Liang He

Abstract: Although current text-guided music generation technology can handle simple creative scenarios, achieving fine-grained control over individual text-modality conditions remains challenging as user demands grow more intricate. Accordingly, we introduce the TEAcher Adapter (TEAdapter), a compact plugin designed to guide the generation process with diverse control information provided by users. In addition, we explore the controllable generation of extended music by leveraging TEAdapter control groups trained on data with distinct structural functions. Overall, we consider controls at the global, elemental, and structural levels. Experimental results demonstrate that the proposed TEAdapter enables multiple precise controls and ensures high-quality music generation. Our module is also lightweight and transferable to any diffusion model architecture. Code and demos will be available soon at https://github.com/Ashley1101/TEAdapter.

* Accepted by ICME'24: 2024 IEEE International Conference on Multimedia and Expo (ICME 2024)
