Dequan Wang

Text-guided Foundation Model Adaptation for Pathological Image Classification

Jul 27, 2023
Yunkun Zhang, Jin Gao, Mu Zhou, Xiaosong Wang, Yu Qiao, Shaoting Zhang, Dequan Wang

The recent surge of foundation models in computer vision and natural language processing opens up new opportunities for utilizing multi-modal clinical data to train large models with strong generalizability. Yet pathological image datasets often lack biomedical text annotation and enrichment. Guiding data-efficient image diagnosis with biomedical text knowledge is therefore of substantial interest. In this paper, we propose to Connect Image and Text Embeddings (CITE) to enhance pathological image classification. CITE injects text insights gained from language models pre-trained on a broad range of biomedical texts, adapting foundation models towards pathological image understanding. Through extensive experiments on the PatchGastric stomach tumor pathological image dataset, we demonstrate that CITE achieves leading performance compared with various baselines, especially when training data is scarce. CITE offers insights into leveraging in-domain text knowledge to reinforce data-efficient pathological image classification. Code is available at https://github.com/Yunkun-Zhang/CITE.
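
The abstract leaves the mechanism at a high level, so here is a minimal sketch of the general idea: score a patch's projected visual features against fixed class text embeddings produced by a biomedical language model. The encoder stand-in, embedding dimensions, and temperature below are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn.functional as F

class TextGuidedClassifier(torch.nn.Module):
    """Score an image against fixed class text embeddings via cosine similarity."""
    def __init__(self, image_encoder, text_embeds, visual_dim, text_dim):
        super().__init__()
        self.image_encoder = image_encoder
        for p in self.image_encoder.parameters():  # keep the foundation model frozen
            p.requires_grad = False
        self.proj = torch.nn.Linear(visual_dim, text_dim)  # small trainable adapter
        self.register_buffer("text_embeds", F.normalize(text_embeds, dim=-1))

    def forward(self, images):
        feats = F.normalize(self.proj(self.image_encoder(images)), dim=-1)
        return feats @ self.text_embeds.t() / 0.07  # cosine logits with temperature

# toy stand-ins: a random "vision encoder" and random "biomedical text" embeddings
encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 224 * 224, 768))
text_embeds = torch.randn(2, 512)                   # one embedding per tumor subtype
model = TextGuidedClassifier(encoder, text_embeds, visual_dim=768, text_dim=512)
logits = model(torch.randn(4, 3, 224, 224))         # -> (4, 2) class logits
```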

* Accepted to MICCAI 2023 

MedFMC: A Real-world Dataset and Benchmark For Foundation Model Adaptation in Medical Image Classification

Jun 16, 2023
Dequan Wang, Xiaosong Wang, Lilong Wang, Mengzhang Li, Qian Da, Xiaoqiang Liu, Xiangyu Gao, Jun Shen, Junjun He, Tian Shen, Qi Duan, Jie Zhao, Kang Li, Yu Qiao, Shaoting Zhang

Foundation models, often pre-trained with large-scale data, have achieved remarkable success in jump-starting various vision and language applications. Recent advances further enable adapting foundation models to downstream tasks efficiently using only a few training samples, e.g., in-context learning. Yet the application of such learning paradigms in medical image analysis remains scarce due to the shortage of publicly accessible data and benchmarks. In this paper, we study approaches to adapting foundation models for medical image classification and present a novel dataset and benchmark for evaluation, i.e., examining the overall performance of adapting large-scale foundation models to a set of diverse real-world downstream clinical tasks. We collect five sets of medical imaging data from multiple institutes (22,349 images in total) targeting a variety of real-world clinical tasks: thoracic disease screening in X-rays, pathological lesion tissue screening, lesion detection in endoscopy images, neonatal jaundice evaluation, and diabetic retinopathy grading. Results of multiple baseline methods on the proposed dataset are reported from both accuracy and cost-effectiveness perspectives.
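
As a rough illustration of the few-shot adaptation setting the benchmark targets, the sketch below builds an N-shot support/query split. This is a generic protocol, an assumption for illustration; the actual MedFMC evaluation may differ in sampling and metrics.

```python
import random
from collections import defaultdict

def sample_few_shot(dataset, shots_per_class, seed=0):
    """Build an N-shot support set (for adaptation) and a query set (for evaluation)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, (_, label) in enumerate(dataset):
        by_class[label].append(idx)
    support = []
    for indices in by_class.values():
        support += rng.sample(indices, min(shots_per_class, len(indices)))
    query = [i for i in range(len(dataset)) if i not in set(support)]
    return support, query

toy = [(f"img_{i}.png", i % 5) for i in range(100)]        # (image, label) stand-ins
support, query = sample_few_shot(toy, shots_per_class=5)   # 25 support, 75 query
```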

* Preprint. Under review 

Towards General Purpose Medical AI: Continual Learning Medical Foundation Model

Mar 12, 2023
Huahui Yi, Ziyuan Qin, Qicheng Lao, Wei Xu, Zekun Jiang, Dequan Wang, Shaoting Zhang, Kang Li

Inevitable domain and task discrepancies in real-world scenarios can impair the generalization of pre-trained deep models on medical data. We therefore propose building a general-purpose medical AI system that can be seamlessly adapted to downstream domains and tasks. Since domain/task adaptation usually involves additional labeling work for the target data, a data-efficient adaptation algorithm is desirable to reduce the cost of transferring the learned knowledge. Our recent work found that vision-language models (VLMs) are efficient learners with extraordinary cross-domain ability. In this work, we further explore leveraging pre-trained VLMs as medical foundation models for building general-purpose medical AI: we thoroughly investigate three machine-learning paradigms, i.e., domain/task-specialized learning, joint learning, and continual learning, for training the VLMs, and evaluate their generalization on cross-domain and cross-task test sets. To alleviate catastrophic forgetting during sequential training, we employ rehearsal learning and observe a sharp boost in generalization capability. In a nutshell, our empirical evidence suggests that continual learning may be a practical and efficient paradigm for training a medical foundation model, and we hope researchers can build on this evidence to further explore the path toward such models.
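
A minimal sketch of rehearsal learning as invoked here, assuming a simple reservoir-sampled replay buffer mixed into each new task's training; the paper's actual buffer policy and replay schedule are not specified in this abstract.

```python
import random

class RehearsalBuffer:
    """Reservoir-sampled memory of past-task examples, replayed during new tasks."""
    def __init__(self, capacity=1000, seed=0):
        self.capacity, self.buffer, self.seen = capacity, [], 0
        self.rng = random.Random(seed)

    def add(self, example):
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(example)
        else:                               # reservoir sampling: uniform over all seen
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = example

    def sample(self, batch_size):
        return self.rng.sample(self.buffer, min(batch_size, len(self.buffer)))

buffer = RehearsalBuffer(capacity=100)
for task in range(3):                       # mix replayed examples into each new task
    for step in range(500):
        buffer.add((task, step))            # stand-in for (image, label) pairs
    replay_batch = buffer.sample(32)
```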

MAROAM: Map-based Radar SLAM through Two-step Feature Selection

Oct 25, 2022
Dequan Wang, Yifan Duan, Xiaoran Fan, Chengzhen Meng, Jianmin Ji, Yanyong Zhang

In this letter, we propose MAROAM, a millimeter-wave radar-based SLAM framework that employs a two-step feature selection process to build a globally consistent map. Specifically, we first extract feature points from raw data based on their local geometric properties, filtering out points that violate the principles of millimeter-wave radar imaging. We then apply a second round of probabilistic feature selection by examining how often and how recently each feature point has been detected in the preceding frames. With this two-step feature selection, we establish a globally consistent map for accurate and robust pose estimation as well as other downstream tasks. Finally, we perform loop closure and graph optimization in the back-end, further reducing the accumulated drift error. We evaluate MAROAM on three datasets: the Oxford Radar RobotCar Dataset, the MulRan Dataset, and the Boreas Dataset, covering a variety of scenery, weather, and road conditions. The experimental results show that MAROAM's accuracy is 7.95%, 37.0%, and 8.9% higher than the currently best-performing algorithms on these three datasets, respectively. Ablation results also show that our map-based odometry performs 28.6% better than the commonly used scan-to-frames method. Finally, as devoted contributors to the open-source community, we will open-source the algorithm after the paper is accepted.
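
A toy sketch of the two-step selection idea, not the authors' implementation: a geometric/intensity filter followed by a frequency-and-recency score. The intensity criterion, weights, and decay rate below are invented for illustration.

```python
import numpy as np

def geometric_filter(points, intensities, min_intensity=0.3):
    """Step 1: drop radar returns whose local properties violate imaging principles."""
    return points[intensities > min_intensity]      # illustrative criterion only

def probabilistic_select(hits, last_seen, frame_idx,
                         freq_weight=0.7, recency_decay=0.1, threshold=0.5):
    """Step 2: keep features detected often (hits) and recently (last_seen)."""
    freq = hits / (hits.max() + 1e-9)               # normalized detection frequency
    recency = np.exp(-recency_decay * (frame_idx - last_seen))
    return freq_weight * freq + (1 - freq_weight) * recency > threshold

pts = geometric_filter(np.random.rand(100, 2), np.random.rand(100))
keep = probabilistic_select(np.array([12, 3, 9]), np.array([100, 40, 99]), frame_idx=100)
```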

Decentralized Vehicle Coordination: The Berkeley DeepDrive Drone Dataset

Sep 22, 2022
Fangyu Wu, Dequan Wang, Minjune Hwang, Chenhui Hao, Jiawei Lu, Jiamu Zhang, Christopher Chou, Trevor Darrell, Alexandre Bayen

Decentralized multiagent planning has been an important field of research in robotics. An interesting and impactful application is decentralized vehicle coordination in understructured road environments. In an intersection, for example, it is useful yet difficult to deconflict multiple vehicles with intersecting paths in the absence of a central coordinator. Common sense tells us that, for a vehicle to navigate such understructured environments, the driver must understand and conform to the implicit "social etiquette" observed by nearby drivers. To study this implicit driving protocol, we collect the Berkeley DeepDrive Drone dataset. The dataset contains 1) a set of aerial videos recording understructured driving, 2) a collection of images and annotations for training vehicle detection models, and 3) a kit of development scripts illustrating typical usage. We believe the dataset is of primary interest for studying decentralized multiagent planning employed by human drivers and, of secondary interest, for computer vision in remote-sensing settings.

* 6 pages, 10 figures, 1 table 

Back to the Source: Diffusion-Driven Test-Time Adaptation

Jul 07, 2022
Jin Gao, Jialing Zhang, Xihui Liu, Trevor Darrell, Evan Shelhamer, Dequan Wang

Test-time adaptation harnesses test inputs to improve the accuracy of a model trained on source data when tested on shifted target data. Existing methods update the source model by (re-)training on each target domain. While effective, re-training is sensitive to the amount and order of the data and to the hyperparameters used for optimization. We instead update the target data by projecting all test inputs toward the source domain with a generative diffusion model. Our diffusion-driven adaptation method, DDA, shares its models for classification and generation across all domains. Both models are trained on the source domain and then fixed during testing. We augment diffusion with image guidance and self-ensembling to automatically decide how much to adapt. Input adaptation by DDA is more robust than prior model adaptation approaches across a variety of corruptions, architectures, and data regimes on the ImageNet-C benchmark. With its input-wise updates, DDA succeeds where model adaptation degrades on too little data in small batches, dependent data in non-uniform order, or mixed data with multiple corruptions.
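
A minimal sketch of the input-adaptation loop described above, with stand-in functions for the diffusion forward/reverse processes and the source classifier; the noise level t_star and the simple averaging used for self-ensembling are illustrative assumptions, and the real method adds image guidance during the reverse process.

```python
import torch

@torch.no_grad()
def diffusion_adapt(x, diffuse, denoise, classify, t_star=0.4):
    """Project a shifted test input toward the source domain, then self-ensemble."""
    x_t = diffuse(x, t_star)        # partially noise the corrupted test input
    x_hat = denoise(x_t, t_star)    # reverse process pulls it toward source data
    # self-ensemble: average source-classifier predictions on both versions
    return 0.5 * (classify(x).softmax(-1) + classify(x_hat).softmax(-1))

# toy stand-ins so the sketch runs end to end (not the real DDA models)
diffuse = lambda x, t: (1 - t) ** 0.5 * x + t ** 0.5 * torch.randn_like(x)
denoise = lambda x_t, t: x_t        # identity placeholder for the reverse process
classify = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
probs = diffusion_adapt(torch.randn(2, 3, 32, 32), diffuse, denoise, classify)
```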

GACT: Activation Compressed Training for General Architectures

Jun 28, 2022
Xiaoxuan Liu, Lianmin Zheng, Dequan Wang, Yukuo Cen, Weize Chen, Xu Han, Jianfei Chen, Zhiyuan Liu, Jie Tang, Joey Gonzalez, Michael Mahoney, Alvin Cheung

Training large neural network (NN) models requires extensive memory resources, and Activation Compressed Training (ACT) is a promising approach to reduce the training memory footprint. This paper presents GACT, an ACT framework that supports a broad range of machine learning tasks for generic NN architectures with limited domain knowledge. By analyzing a linearized version of ACT's approximate gradient, we prove the convergence of GACT without prior knowledge of operator type or model architecture. To make training stable, we propose an algorithm that decides the compression ratio for each tensor by estimating its impact on the gradient at run time. We implement GACT as a PyTorch library that readily applies to any NN architecture. GACT reduces the activation memory for convolutional NNs, transformers, and graph NNs by up to 8.1x, enabling training with a 4.2x to 24.7x larger batch size, with negligible accuracy loss.
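
A minimal sketch of the activation-compression idea, assuming a fixed 8-bit quantizer on a single linear op: the forward pass saves a compressed activation, and the backward pass computes the weight gradient from the decompressed approximation. GACT itself chooses per-tensor compression ratios adaptively and covers general architectures.

```python
import torch

class CompressedLinear(torch.autograd.Function):
    """Linear op that saves an 8-bit quantized input for the backward pass."""
    @staticmethod
    def forward(ctx, x, weight):
        scale = x.abs().amax().clamp_min(1e-8) / 127.0
        x_q = torch.clamp((x / scale).round(), -127, 127).to(torch.int8)
        ctx.save_for_backward(x_q, weight)
        ctx.scale = scale
        return x @ weight.t()

    @staticmethod
    def backward(ctx, grad_out):
        x_q, weight = ctx.saved_tensors
        x_approx = x_q.float() * ctx.scale    # decompress the saved activation
        grad_x = grad_out @ weight
        grad_w = grad_out.t() @ x_approx      # weight gradient uses the approximation
        return grad_x, grad_w

x = torch.randn(8, 16, requires_grad=True)
w = torch.randn(4, 16, requires_grad=True)
CompressedLinear.apply(x, w).sum().backward()  # saves ~4x less activation memory
```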

Contrastive Test-Time Adaptation

Apr 21, 2022
Dian Chen, Dequan Wang, Trevor Darrell, Sayna Ebrahimi

Test-time adaptation is a special setting of unsupervised domain adaptation where a model trained on the source domain must adapt to the target domain without access to source data. We propose a novel way to leverage self-supervised contrastive learning to facilitate target feature learning, along with an online pseudo-labeling scheme with refinement that significantly denoises pseudo labels. The contrastive learning task is applied jointly with pseudo labeling, contrasting positive and negative pairs constructed similarly to MoCo but with a source-initialized encoder, and excluding same-class negative pairs indicated by the pseudo labels. Meanwhile, we produce pseudo labels online and refine them via soft voting among their nearest neighbors in the target feature space, enabled by maintaining a memory queue. Our method, AdaContrast, achieves state-of-the-art performance on major benchmarks while having several desirable properties compared to existing works, including memory efficiency, insensitivity to hyper-parameters, and better model calibration. Project page: sites.google.com/view/adacontrast.
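
A minimal sketch of the pseudo-label refinement step described above: soft voting over the k nearest neighbors in a memory queue of past features and predictions. Queue size, feature dimension, and k below are illustrative, not the paper's settings.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def refine_pseudo_labels(feats, queue_feats, queue_probs, k=5):
    """Refine pseudo labels by soft voting among nearest neighbors in a memory queue.

    feats: (B, D) target features; queue_feats: (N, D) features and
    queue_probs: (N, C) softmax predictions stored from past batches.
    """
    sims = F.normalize(feats, dim=-1) @ F.normalize(queue_feats, dim=-1).t()
    _, nn_idx = sims.topk(k, dim=-1)           # k nearest neighbors per sample
    voted = queue_probs[nn_idx].mean(dim=1)    # average the neighbors' predictions
    return voted.argmax(dim=-1)                # refined hard pseudo labels

queue_f = torch.randn(256, 128)
queue_p = torch.softmax(torch.randn(256, 10), dim=-1)
labels = refine_pseudo_labels(torch.randn(32, 128), queue_f, queue_p, k=5)
```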

* CVPR 2022 camera-ready version 

On-target Adaptation

Sep 02, 2021
Dequan Wang, Shaoteng Liu, Sayna Ebrahimi, Evan Shelhamer, Trevor Darrell

Domain adaptation seeks to mitigate the shift between training on the source domain and testing on the target domain. Most adaptation methods rely on the source data through joint optimization over source and target data. Source-free methods replace the source data with a source model and fine-tune it on the target. Either way, the majority of the parameter updates for the model representation and the classifier are derived from the source, not the target. However, target accuracy is the goal, so we argue for optimizing as much as possible on the target data. We show significant improvement with on-target adaptation, which learns the representation purely from target data while taking only the source predictions for supervision. In the long-tailed classification setting, we show further improvement with on-target class distribution learning, which learns the (im)balance of classes from target data.
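
A minimal sketch of the supervision scheme described above, assuming a frozen source model as teacher and a student whose representation is trained only on target data; the architectures, optimizer, and hard pseudo labels here are toy stand-ins, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def on_target_step(student, teacher, optimizer, x_target):
    """One update: supervise a target-only student with frozen source predictions."""
    with torch.no_grad():
        pseudo = teacher(x_target).argmax(dim=-1)  # source model's hard predictions
    loss = F.cross_entropy(student(x_target), pseudo)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

teacher = torch.nn.Linear(64, 10)                  # stand-in frozen source model
student = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU(),
                              torch.nn.Linear(64, 10))  # learned on target only
opt = torch.optim.SGD(student.parameters(), lr=0.1)
on_target_step(student, teacher, opt, torch.randn(16, 64))
```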
