
Tiancheng Lin


Context-Aware Prompt Tuning for Vision-Language Model with Dual-Alignment

Sep 08, 2023
Hongyu Hu, Tiancheng Lin, Jie Wang, Zhenbang Sun, Yi Xu

Large-scale vision-language models (VLMs), e.g., CLIP, learn broad visual concepts from massive training data and show superb generalization ability. A number of prompt learning methods have been proposed to efficiently adapt VLMs to downstream tasks with only a few training samples. We introduce Dual-Aligned Prompt Tuning (DuAl-PT), a novel method that improves prompt learning for vision-language models by incorporating pre-trained large language models (LLMs). Learnable prompts, as in CoOp, model the context implicitly through end-to-end training and are therefore difficult to control and interpret. Explicit context descriptions generated by LLMs such as GPT-3 can be used directly for zero-shot classification, but such prompts rely heavily on the LLM and remain underexplored in few-shot settings. With DuAl-PT, we propose to learn more context-aware prompts that benefit from both explicit and implicit context modeling. To achieve this, we introduce a pre-trained LLM to generate context descriptions and encourage the learnable prompts to absorb the LLM's knowledge through alignment, alongside an alignment between the prompts and local image features. Empirically, DuAl-PT achieves superior performance on 11 downstream datasets for few-shot recognition and base-to-new generalization. We hope DuAl-PT can serve as a strong baseline. Code will be available.
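
A minimal sketch of the two alignments described in the abstract: one term pulls each class's learnable-prompt feature toward the LLM-generated description of that class, the other pulls pooled local image features toward the prompt of the ground-truth class. All shapes, the mean-pooling of patches, and the temperature are illustrative assumptions, not the released DuAl-PT implementation.

```python
import torch
import torch.nn.functional as F

def dual_alignment_loss(prompt_feat, desc_feat, patch_feat, labels, tau=0.07):
    """Hypothetical shapes:
       prompt_feat: (C, D) class-wise text features from the learnable prompts
       desc_feat:   (C, D) text features of LLM-generated context descriptions
       patch_feat:  (B, N, D) local image patch features for a batch of images
       labels:      (B,)   ground-truth class index of each image
    """
    prompt_feat = F.normalize(prompt_feat, dim=-1)
    desc_feat   = F.normalize(desc_feat, dim=-1)
    local_feat  = F.normalize(patch_feat.mean(dim=1), dim=-1)   # pooled local features

    # 1) Prompt <-> LLM-description alignment: each class's prompt feature should
    #    match the LLM description of the same class (contrastive over classes).
    logits_pd = prompt_feat @ desc_feat.t() / tau                # (C, C)
    loss_llm = F.cross_entropy(logits_pd, torch.arange(prompt_feat.size(0)))

    # 2) Prompt <-> local-image alignment: each image's pooled patch feature should
    #    match the prompt feature of its own class (contrastive over classes).
    logits_ip = local_feat @ prompt_feat.t() / tau               # (B, C)
    loss_local = F.cross_entropy(logits_ip, labels)

    return loss_llm + loss_local

# Toy usage with random tensors (C=5 classes, D=64-dim features, B=8 images, N=49 patches).
loss = dual_alignment_loss(torch.randn(5, 64), torch.randn(5, 64),
                           torch.randn(8, 49, 64), torch.randint(0, 5, (8,)))
```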

Relational Contrastive Learning for Scene Text Recognition

Aug 01, 2023
Jinglei Zhang, Tiancheng Lin, Yi Xu, Kai Chen, Rui Zhang

Context-aware methods have achieved great success in supervised scene text recognition by incorporating semantic priors from words. We argue that such prior contextual information can be interpreted as relations among textual primitives, arising from the heterogeneity of text and background, and that these relations provide effective self-supervised labels for representation learning. However, textual relations are restricted by the finite size of the dataset due to lexical dependencies, which causes over-fitting and compromises representation robustness. To this end, we propose to enrich the textual relations via rearrangement, hierarchy and interaction, and design a unified framework called RCLSTR: Relational Contrastive Learning for Scene Text Recognition. Based on causality, we theoretically explain that the three modules suppress the bias caused by the contextual prior and thus guarantee representation robustness. Experiments on representation quality show that our method outperforms state-of-the-art self-supervised STR methods. Code is available at https://github.com/ThunderVVV/RCLSTR.

* Accepted by ACMMM 2023 
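
The abstract treats relations among textual primitives (e.g., horizontal frames of a word image) as a self-supervised signal. Below is a rough, hypothetical sketch of a frame-level contrastive loss in that spirit; the rearrangement, hierarchy and interaction modules that RCLSTR actually adds are omitted, and all shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def frame_contrastive(frames_a, frames_b, tau=0.1):
    """frames_a, frames_b: (B, T, D) frame features from two augmented views of the
    same batch of text images. Each frame is contrasted against every frame in the
    batch; its positive is the frame at the same (image, position) in the other view."""
    B, T, D = frames_a.shape
    za = F.normalize(frames_a.reshape(B * T, D), dim=-1)
    zb = F.normalize(frames_b.reshape(B * T, D), dim=-1)
    logits = za @ zb.t() / tau                  # (B*T, B*T) frame-to-frame similarities
    targets = torch.arange(B * T)               # matching frame index in the other view
    return F.cross_entropy(logits, targets)

# Toy usage with random frame features (8 images, 16 frames each, 64-dim).
loss = frame_contrastive(torch.randn(8, 16, 64), torch.randn(8, 16, 64))
```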

SLPD: Slide-level Prototypical Distillation for WSIs

Jul 20, 2023
Zhimiao Yu, Tiancheng Lin, Yi Xu

Improving feature representation ability is the foundation of many whole-slide pathological image (WSI) tasks. Recent works have achieved great success in pathology-specific self-supervised learning (SSL). However, most of them focus only on learning patch-level representations, leaving a gap between the pretext task and slide-level downstream tasks, e.g., subtyping, grading and staging. Aiming at slide-level representations, we propose Slide-Level Prototypical Distillation (SLPD) to explore intra- and inter-slide semantic structures for context modeling on WSIs. Specifically, we iteratively perform intra-slide clustering over the regions (4096x4096 patches) within each WSI to yield prototypes and encourage the region representations to move closer to their assigned prototypes. By representing each slide with its prototypes, we further select similar slides by the set distance between prototypes and assign regions to cross-slide prototypes for distillation. SLPD achieves state-of-the-art results on multiple slide-level benchmarks and demonstrates that learning the semantic structure of slides makes a suitable proxy task for WSI analysis. Code will be available at https://github.com/Carboxy/SLPD.

* International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) 
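
A minimal sketch of the intra-slide step described above: cluster one slide's region features into prototypes and pull each region toward its assigned prototype. The clustering routine, shapes and temperature are assumptions for illustration; the cross-slide selection and distillation steps are omitted.

```python
import torch
import torch.nn.functional as F

def slide_prototypes(region_feat, k=8, iters=10):
    """Simple k-means over one slide's region features. region_feat: (N, D)."""
    protos = region_feat[torch.randperm(region_feat.size(0))[:k]].detach().clone()
    for _ in range(iters):
        assign = torch.cdist(region_feat, protos).argmin(dim=1)     # (N,) nearest prototype
        for j in range(k):
            members = region_feat[assign == j]
            if members.numel() > 0:                                  # skip empty clusters
                protos[j] = members.mean(dim=0)
    return protos, assign

def region_to_prototype_loss(region_feat, protos, assign, tau=0.1):
    """Contrastive pull: each region should be closest to its own prototype."""
    z = F.normalize(region_feat, dim=-1)
    p = F.normalize(protos, dim=-1)
    logits = z @ p.t() / tau                                         # (N, k)
    return F.cross_entropy(logits, assign)

# Toy usage: 64 regions with 128-dim features from one slide.
feats = torch.randn(64, 128)
protos, assign = slide_prototypes(feats)
loss = region_to_prototype_loss(feats, protos, assign)
```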

Interventional Bag Multi-Instance Learning On Whole-Slide Pathological Images

Mar 13, 2023
Tiancheng Lin, Zhimiao Yu, Hongyu Hu, Yi Xu, Chang Wen Chen

Multi-instance learning (MIL) is an effective paradigm for classifying whole-slide pathological images (WSIs), handling their gigapixel resolution and slide-level labels. Prevailing MIL methods focus primarily on improving the feature extractor and aggregator. However, one deficiency of these methods is that the bag contextual prior may trick the model into capturing spurious correlations between bags and labels. This prior acts as a confounder that limits the performance of existing MIL methods. In this paper, we propose a novel scheme, Interventional Bag Multi-Instance Learning (IBMIL), to achieve deconfounded bag-level prediction. Unlike traditional likelihood-based strategies, the proposed scheme is based on the backdoor adjustment to achieve interventional training, and is thus capable of suppressing the bias caused by the bag contextual prior. Note that the principle of IBMIL is orthogonal to existing bag MIL methods, so IBMIL brings consistent performance gains to existing schemes and achieves new state-of-the-art performance. Code is available at https://github.com/HHHedo/IBMIL.

* Accepted by CVPR 2023; Code at https://github.com/HHHedo/IBMIL 
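
A schematic sketch of the backdoor-adjustment idea for bag-level prediction: the bag feature attends over a fixed dictionary of confounder prototypes (e.g., clustered bag features from an earlier training stage), approximating P(Y|do(X)) = Σ_c P(Y|X, c)P(c). The module below is illustrative, not the released IBMIL code; the dictionary construction and shapes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BackdoorAdjustedHead(nn.Module):
    def __init__(self, dim, num_classes, confounder_dict):
        super().__init__()
        # confounder_dict: (K, dim) fixed dictionary of confounder prototypes
        self.register_buffer("confounders", confounder_dict)
        self.query = nn.Linear(dim, dim)
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, bag_feat):
        # bag_feat: (B, dim) aggregated bag features from any MIL aggregator
        q = self.query(bag_feat)                                              # (B, dim)
        attn = F.softmax(q @ self.confounders.t() / q.size(-1) ** 0.5, dim=-1)  # (B, K)
        ctx = attn @ self.confounders            # expectation over the confounder dictionary
        return self.classifier(torch.cat([bag_feat, ctx], dim=-1))

# Toy usage: 16 confounder prototypes, 512-dim bag features, binary classification.
head = BackdoorAdjustedHead(512, 2, torch.randn(16, 512))
logits = head(torch.randn(4, 512))
```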

Interventional Multi-Instance Learning with Deconfounded Instance-Level Prediction

Apr 22, 2022
Tiancheng Lin, Hongteng Xu, Canqian Yang, Yi Xu

When applying multi-instance learning (MIL) to make predictions for bags of instances, the prediction accuracy for an instance often depends not only on the instance itself but also on its context in the corresponding bag. From the viewpoint of causal inference, such a bag contextual prior acts as a confounder and may cause model robustness and interpretability issues. Focusing on this problem, we propose a novel interventional multi-instance learning (IMIL) framework to achieve deconfounded instance-level prediction. Unlike traditional likelihood-based strategies, we design an Expectation-Maximization (EM) algorithm based on causal intervention, which provides robust instance selection in the training phase and suppresses the bias caused by the bag contextual prior. Experiments on pathological image analysis demonstrate that our IMIL method substantially reduces false positives and outperforms state-of-the-art MIL methods.

* 7 pages. Accepted by AAAI2022 
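
A schematic sketch of an EM-style instance-selection loop in the spirit described above: the E-step pseudo-labels instances with the current classifier (keeping only the most confident positives inside positive bags), and the M-step refits the classifier. The selection rule, ratio and optimizer are assumptions, and the causal-intervention machinery of IMIL is not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def em_mil(bags, bag_labels, dim=128, epochs=5, top_ratio=0.2, lr=1e-3):
    """bags: list of (N_i, dim) instance-feature tensors; bag_labels: list of 0/1."""
    clf = nn.Linear(dim, 2)
    opt = torch.optim.SGD(clf.parameters(), lr=lr)
    for _ in range(epochs):
        # E-step: pseudo-label instances with the current classifier.
        feats, labels = [], []
        with torch.no_grad():
            for x, y in zip(bags, bag_labels):
                if y == 0:                      # all instances in negative bags are negative
                    feats.append(x); labels.append(torch.zeros(len(x), dtype=torch.long))
                else:                           # keep only the most confident positives
                    scores = F.softmax(clf(x), dim=-1)[:, 1]
                    k = max(1, int(top_ratio * len(x)))
                    idx = scores.topk(k).indices
                    feats.append(x[idx]); labels.append(torch.ones(k, dtype=torch.long))
        feats, labels = torch.cat(feats), torch.cat(labels)
        # M-step: refit the instance classifier on the pseudo-labeled instances.
        opt.zero_grad()
        F.cross_entropy(clf(feats), labels).backward()
        opt.step()
    return clf

# Toy usage: 6 bags of 32 random instances each, alternating bag labels.
clf = em_mil([torch.randn(32, 128) for _ in range(6)], [0, 1] * 3)
```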

Solving The Long-Tailed Problem via Intra- and Inter-Category Balance

Apr 22, 2022
Renhui Zhang, Tiancheng Lin, Rui Zhang, Yi Xu

Benchmark datasets for visual recognition assume that data is uniformly distributed, while real-world datasets obey a long-tailed distribution. Current approaches handle the long-tailed problem by transforming the long-tailed dataset toward a uniform distribution through re-sampling or re-weighting strategies. These approaches emphasize the tail classes but ignore the hard examples in head classes, which results in performance degradation. In this paper, we propose a novel gradient-harmonized mechanism with category-wise adaptive precision to decouple the difficulty imbalance and the sample-size imbalance in the long-tailed problem, which are solved by intra- and inter-category balance strategies, respectively. Specifically, intra-category balance focuses on the hard examples in each category to optimize the decision boundary, while inter-category balance aims to correct the shift of the decision boundary by taking each category as a unit. Extensive experiments demonstrate that the proposed method consistently outperforms other approaches on all the datasets.

* 4 pages. Accepted by ICASSP2022 
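
A hypothetical illustration of decoupling the two imbalances as per-example loss weights: an inverse-frequency weight per category (inter-category balance) and a focal-style hard-example weight within each category (intra-category balance). The paper's gradient-harmonized mechanism with category-wise adaptive precision is more involved; this only shows the decoupling idea.

```python
import torch
import torch.nn.functional as F

def balanced_loss(logits, targets, class_counts, gamma=1.0):
    """logits: (B, C); targets: (B,); class_counts: (C,) training-set size per class."""
    per_example = F.cross_entropy(logits, targets, reduction="none")      # (B,)

    # Inter-category: inverse-frequency weight per class, normalized to mean 1.
    inv_freq = 1.0 / class_counts.float()
    inter_w = inv_freq / inv_freq.mean()
    inter = inter_w[targets]

    # Intra-category: emphasize hard examples via a focal-style factor (1 - p_true)^gamma.
    p_true = F.softmax(logits, dim=-1).gather(1, targets.unsqueeze(1)).squeeze(1)
    intra = (1.0 - p_true) ** gamma

    return (inter * intra * per_example).mean()

# Toy usage: 3 classes with long-tailed counts.
loss = balanced_loss(torch.randn(8, 3), torch.randint(0, 3, (8,)),
                     torch.tensor([1000, 100, 10]))
```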

Enhancing Non-mass Breast Ultrasound Cancer Classification With Knowledge Transfer

Apr 18, 2022
Yangrun Hu, Yuanfan Guo, Fan Zhang, Mingda Wang, Tiancheng Lin, Rong Wu, Yi Xu

Much progress has been made in deep neural network (DNN) based diagnosis of mass lesions in breast ultrasound (BUS) images. However, non-mass lesions are less investigated because of the limited data. Based on the insight that mass data is abundant and shares with non-mass data the same knowledge structure of identifying the malignancy of a lesion from an ultrasound image, we propose a novel transfer learning framework to enhance the generalizability of the DNN model for non-mass BUS with the help of mass BUS. Specifically, we train a shared DNN with combined non-mass and mass data. Given the prior of different marginal distributions in the input and output spaces, we employ two domain alignment strategies in the proposed transfer learning framework, aiming to capture domain-specific distributions and address the issue of domain shift. Moreover, we propose a cross-domain semantic-preserving data generation module called CrossMix to recover the missing distribution between non-mass and mass data that is not present in the training data. Experimental results on an in-house dataset demonstrate that the DNN model trained with combined data by our framework achieves a 10% improvement in AUC on the malignancy prediction task for non-mass BUS compared to training directly on non-mass data.

* 4 pages. Accepted by ISBI2022 
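
A hypothetical sketch of a CrossMix-style generation step: each non-mass image is mixed with a mass image that shares the same malignancy label, producing samples between the two domains while preserving the label semantics. The pairing rule and Beta-sampled interpolation are assumptions, not the paper's exact module.

```python
import torch

def crossmix(non_mass_x, non_mass_y, mass_x, mass_y, alpha=0.4):
    """Pair each non-mass sample with a same-label mass sample and mix the images.
    non_mass_x/mass_x: (B, C, H, W); non_mass_y/mass_y: (B,) binary malignancy labels."""
    lam = torch.distributions.Beta(alpha, alpha).sample()       # mixing coefficient
    mixed, labels = [], []
    for i in range(non_mass_x.size(0)):
        same = (mass_y == non_mass_y[i]).nonzero(as_tuple=True)[0]
        if len(same) == 0:
            continue                                            # no same-label partner
        j = same[torch.randint(len(same), (1,))].item()
        mixed.append(lam * non_mass_x[i] + (1 - lam) * mass_x[j])
        labels.append(non_mass_y[i])                            # label is preserved
    return torch.stack(mixed), torch.stack(labels)

# Toy usage with random single-channel ultrasound-sized crops.
x, y = crossmix(torch.randn(4, 1, 224, 224), torch.tensor([0, 1, 0, 1]),
                torch.randn(8, 1, 224, 224), torch.tensor([0, 1, 0, 1, 0, 1, 0, 1]))
```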

Self Supervised Lesion Recognition For Breast Ultrasound Diagnosis

Apr 18, 2022
Yuanfan Guo, Canqian Yang, Tiancheng Lin, Chunxiao Li, Rui Zhang, Yi Xu

Previous deep learning based Computer Aided Diagnosis (CAD) systems treat multiple views of the same lesion as independent images. Since an ultrasound image only describes a partial 2D projection of a 3D lesion, such a paradigm ignores the semantic relationship between different views of a lesion, which is inconsistent with traditional diagnosis, where sonographers analyze a lesion from at least two views. In this paper, we propose a multi-task framework that complements the Benign/Malignant classification task with lesion recognition (LR), which leverages the relationship among multiple views of a single lesion to learn a complete representation of the lesion. To be specific, the LR task employs contrastive learning to encourage a representation that pulls multiple views of the same lesion together and repels those of different lesions. The task therefore facilitates a representation that is not only invariant to view changes of the lesion, but also captures fine-grained features that distinguish between different lesions. Experiments show that the proposed multi-task framework boosts the performance of Benign/Malignant classification, as the two sub-tasks complement each other and enhance the learned representation of ultrasound images.

* 4 pages. Accepted by ISBI2022 
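
A minimal sketch of the described multi-task objective: a supervised Benign/Malignant loss on each view plus a SimCLR-style lesion-recognition term that pulls embeddings of two views of the same lesion together and pushes apart views of different lesions. Shapes, temperature and loss weight are assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def multitask_loss(logits1, logits2, labels, emb1, emb2, tau=0.1, weight=1.0):
    """logits1/logits2: (B, 2) Benign/Malignant predictions for two views of each lesion;
    labels: (B,) malignancy labels; emb1/emb2: (B, D) projected embeddings of the views."""
    # Supervised classification on both views.
    cls = F.cross_entropy(logits1, labels) + F.cross_entropy(logits2, labels)

    # Lesion recognition: views of the same lesion are positives, others are negatives.
    z1, z2 = F.normalize(emb1, dim=-1), F.normalize(emb2, dim=-1)
    sim = z1 @ z2.t() / tau                       # (B, B) view-1 of lesion i vs view-2 of lesion j
    targets = torch.arange(z1.size(0))
    lr_task = F.cross_entropy(sim, targets) + F.cross_entropy(sim.t(), targets)

    return cls + weight * lr_task

# Toy usage: 8 lesions, 2 views each, 64-dim embeddings.
loss = multitask_loss(torch.randn(8, 2), torch.randn(8, 2), torch.randint(0, 2, (8,)),
                      torch.randn(8, 64), torch.randn(8, 64))
```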