Yongjian Wu

Zero-shot Nuclei Detection via Visual-Language Pre-trained Models

Jun 30, 2023
Yongjian Wu, Yang Zhou, Jiya Saiyin, Bingzheng Wei, Maode Lai, Jianzhong Shou, Yubo Fan, Yan Xu


Large-scale visual-language pre-trained models (VLPMs) have demonstrated excellent performance in downstream object detection for natural scenes. However, zero-shot nuclei detection on H&E images via VLPMs remains underexplored. The large gap between medical images and the web-originated text-image pairs used for pre-training makes it a challenging task. In this paper, we attempt to explore the potential of the object-level VLPM, the Grounded Language-Image Pre-training (GLIP) model, for zero-shot nuclei detection. Concretely, an automatic prompt design pipeline is devised based on the association-binding trait of VLPMs and the image-to-text VLPM BLIP, avoiding empirical manual prompt engineering. We further establish a self-training framework that uses the automatically designed prompts to generate preliminary results from GLIP as pseudo labels and refines the predicted boxes in an iterative manner. Our method achieves remarkable performance for label-free nuclei detection, surpassing other comparison methods. Foremost, our work demonstrates that VLPMs pre-trained on natural image-text pairs exhibit astonishing potential for downstream tasks in the medical field as well. Code will be released at https://github.com/wuyongjianCODE/VLPMNuD.
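
To make the self-training loop concrete, here is a minimal Python sketch of the iterative refinement described above. The callables glip_predict and train_detector are hypothetical placeholders standing in for GLIP zero-shot inference with the auto-designed prompts and detector training on pseudo labels; this illustrates the loop structure only, not the released VLPMNuD code.

    def self_training_rounds(glip_predict, train_detector, images, prompts, rounds=3):
        """glip_predict and train_detector are hypothetical callables (GLIP inference
        with the automatically designed prompts, and detector training on pseudo labels)."""
        pseudo_boxes = glip_predict(images, prompts)         # preliminary zero-shot detections as pseudo labels
        detector = None
        for _ in range(rounds):
            detector = train_detector(images, pseudo_boxes)  # fit a detector on the current pseudo labels
            pseudo_boxes = detector(images)                  # refined boxes become the next round's pseudo labels
        return detector, pseudo_boxes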

* This article has been accepted by MICCAI 2023, but has not been fully edited. Content may change prior to final publication. 

Cyclic Learning: Bridging Image-level Labels and Nuclei Instance Segmentation

Jun 05, 2023
Yang Zhou, Yongjian Wu, Zihua Wang, Bingzheng Wei, Maode Lai, Jianzhong Shou, Yubo Fan, Yan Xu


Nuclei instance segmentation on histopathology images is of great clinical value for disease analysis. Generally, fully-supervised algorithms for this task require pixel-wise manual annotations, which are especially time-consuming and laborious given the high nuclei density. To alleviate the annotation burden, we seek to solve the problem through image-level weakly supervised learning, which is underexplored for nuclei instance segmentation. Compared with most existing methods that use other weak annotations (scribble, point, etc.) for nuclei instance segmentation, our method is more labor-saving. The obstacle to using image-level annotations in nuclei instance segmentation is the lack of adequate location information, which leads to severe nuclei omission or overlaps. In this paper, we propose a novel image-level weakly supervised method, called cyclic learning, to solve this problem. Cyclic learning comprises a front-end classification task and a back-end semi-supervised instance segmentation task to benefit from multi-task learning (MTL). We utilize a deep learning classifier with interpretability as the front-end to convert image-level labels into sets of high-confidence pseudo masks and establish a semi-supervised architecture as the back-end to conduct nuclei instance segmentation under the supervision of these pseudo masks. Most importantly, cyclic learning is designed to circularly share knowledge between the front-end classifier and the back-end semi-supervised part, which allows the whole system to fully extract the underlying information from image-level labels and converge to a better optimum. Experiments on three datasets demonstrate the good generality of our method, which outperforms other image-level weakly supervised methods for nuclei instance segmentation and achieves comparable performance to fully-supervised methods.
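
The front-end/back-end cycle can be summarized in a short Python skeleton. All four callables below are hypothetical placeholders (the interpretable classifier, its pseudo-mask extraction, the semi-supervised segmenter, and the knowledge-sharing step back to the classifier); this is a sketch of the loop described above, not the released Cyclic code.

    def cyclic_learning(train_front_end, masks_from_interpretability,
                        train_back_end, share_knowledge_back,
                        images, image_level_labels, cycles=3):
        """Hypothetical callables sketching the circular knowledge sharing."""
        classifier = train_front_end(images, image_level_labels)            # front-end classification task
        segmenter = None
        for _ in range(cycles):
            pseudo_masks = masks_from_interpretability(classifier, images)  # image-level labels -> high-confidence masks
            segmenter = train_back_end(images, pseudo_masks)                # back-end semi-supervised instance segmentation
            classifier = share_knowledge_back(classifier, segmenter)        # circulate knowledge back to the front-end
        return segmenter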

* This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI https://doi.org/10.1109/TMI.2023.3275609, IEEE Transactions on Medical Imaging. Code: https://github.com/wuyongjianCODE/Cyclic 

Latent Feature Relation Consistency for Adversarial Robustness

Mar 29, 2023
Xingbin Liu, Huafeng Kuang, Hong Liu, Xianming Lin, Yongjian Wu, Rongrong Ji


Deep neural networks have been applied in many computer vision tasks and have achieved state-of-the-art performance. However, misclassification occurs when a DNN is presented with adversarial examples, which add human-imperceptible adversarial noise to natural examples. This limits the application of DNNs in security-critical fields. To alleviate this problem, we first conduct an empirical analysis of the latent features of both adversarial and natural examples and find that the similarity matrices of natural examples are more compact than those of adversarial examples. Motivated by this observation, we propose Latent Feature Relation Consistency (LFRC), which constrains the relations among adversarial examples in latent space to be consistent with those of natural examples. Importantly, LFRC is orthogonal to previous methods and can be easily combined with them to achieve further improvement. To demonstrate the effectiveness of LFRC, we conduct extensive experiments using different neural networks on benchmark datasets. For instance, LFRC brings a further 0.78% improvement compared to AT and a 1.09% improvement compared to TRADES against AutoAttack on CIFAR-10. Code is available at https://github.com/liuxingbin/LFRC.
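
A minimal PyTorch sketch of the relation-consistency idea, written from the description above rather than the released LFRC code: it builds pairwise cosine-similarity matrices over a batch of latent features and penalizes the gap between the adversarial and natural matrices.

    import torch
    import torch.nn.functional as F

    def relation_consistency_loss(nat_feats: torch.Tensor, adv_feats: torch.Tensor) -> torch.Tensor:
        """nat_feats, adv_feats: (batch, dim) latent features of natural and
        adversarial examples from the same backbone."""
        nat = F.normalize(nat_feats, dim=1)
        adv = F.normalize(adv_feats, dim=1)
        sim_nat = nat @ nat.t()     # (batch, batch) cosine-similarity matrix of natural examples
        sim_adv = adv @ adv.t()     # same relation structure for adversarial examples
        return F.mse_loss(sim_adv, sim_nat)

    # e.g. total_loss = adversarial_training_loss + lambda_relation * relation_consistency_loss(f_nat, f_adv)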

* Tech report 

CAT: Collaborative Adversarial Training

Mar 27, 2023
Xingbin Liu, Huafeng Kuang, Xianming Lin, Yongjian Wu, Rongrong Ji


Adversarial training can improve the robustness of neural networks. Previous methods focus on a single adversarial training strategy and do not consider the properties of models trained with different strategies. By revisiting previous methods, we find that different adversarial training methods yield distinct robustness on individual sample instances. For example, a sample instance can be correctly classified by a model trained using standard adversarial training (AT) but not by a model trained using TRADES, and vice versa. Based on this observation, we propose a collaborative adversarial training framework to improve the robustness of neural networks. Specifically, we use different adversarial training methods to train robust models and let the models exchange their knowledge during the training process. Collaborative Adversarial Training (CAT) can improve both robustness and accuracy. Extensive experiments on various networks and datasets validate the effectiveness of our method. CAT achieves state-of-the-art adversarial robustness without using any additional data on CIFAR-10 under the Auto-Attack benchmark. Code is available at https://github.com/liuxingbin/CAT.
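
As a rough illustration of the collaboration described above (an interpretation, not the authors' exact formulation), two models trained with different adversarial-training recipes can exchange knowledge through a KL term on each other's predictions for adversarial examples. The attack callables are hypothetical placeholders, e.g. a PGD step.

    import torch.nn.functional as F

    def collaborative_step(model_a, model_b, x, y, attack_a, attack_b, alpha=1.0):
        """attack_a / attack_b are hypothetical callables crafting adversarial
        examples w.r.t. each model (e.g. a PGD attack for AT, a KL-based attack for TRADES)."""
        logits_a = model_a(attack_a(model_a, x, y))
        logits_b = model_b(attack_b(model_b, x, y))

        ce_a = F.cross_entropy(logits_a, y)
        ce_b = F.cross_entropy(logits_b, y)

        # Knowledge exchange: each model matches its partner's (detached) distribution.
        kl_a = F.kl_div(F.log_softmax(logits_a, dim=1),
                        F.softmax(logits_b.detach(), dim=1), reduction="batchmean")
        kl_b = F.kl_div(F.log_softmax(logits_b, dim=1),
                        F.softmax(logits_a.detach(), dim=1), reduction="batchmean")
        return ce_a + alpha * kl_a, ce_b + alpha * kl_b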

* Tech report 

Attention Disturbance and Dual-Path Constraint Network for Occluded Person Re-Identification

Mar 20, 2023
Jiaer Xia, Lei Tan, Pingyang Dai, Mingbo Zhao, Yongjian Wu, Rongrong Ji


Occluded person re-identification (Re-ID) aims to address the potential occlusion problem when matching occluded or holistic pedestrians from different camera views. Many methods use the background as artificial occlusion and rely on attention networks to exclude noisy interference. However, the significant discrepancy between simple background occlusion and realistic occlusion can negatively impact the generalization of the network. To address this issue, we propose a novel transformer-based Attention Disturbance and Dual-Path Constraint Network (ADP) to enhance the generalization of attention networks. Firstly, to imitate real-world obstacles, we introduce an Attention Disturbance Mask (ADM) module that generates offensive noise, which can distract attention like a realistic occluder, as a more complex form of occlusion. Secondly, to fully exploit these complex occluded images, we develop a Dual-Path Constraint Module (DPC) that can obtain preferable supervision information from holistic images through dual-path interaction. With our proposed method, the network can effectively circumvent a wide variety of occlusions using the basic ViT baseline. Comprehensive experimental evaluations conducted on person re-ID benchmarks demonstrate the superiority of ADP over state-of-the-art methods.

* 10 pages, 4 figures 

Spectral Aware Softmax for Visible-Infrared Person Re-Identification

Feb 03, 2023
Lei Tan, Pingyang Dai, Qixiang Ye, Mingliang Xu, Yongjian Wu, Rongrong Ji


Visible-infrared person re-identification (VI-ReID) aims to match specific pedestrian images from different modalities. Although suffering from an extra modality discrepancy, existing methods still follow the softmax loss training paradigm, which is widely used in single-modality classification tasks. The softmax loss lacks an explicit penalty for the apparent modality gap, which adversely limits the performance upper bound of the VI-ReID task. In this paper, we propose the spectral-aware softmax (SA-Softmax) loss, which can fully explore the embedding space with the modality information and has clear interpretability. Specifically, the SA-Softmax loss utilizes an asynchronous optimization strategy based on the modality prototype instead of the synchronous optimization based on the identity prototype in the original softmax loss. To encourage a high overlap between the two modalities, SA-Softmax optimizes each sample with the prototype from the other spectrum. Based on observation and analysis of SA-Softmax, we further modify SA-Softmax with a Feature Mask and an Absolute-Similarity Term to alleviate ambiguous optimization during model training. Extensive experimental evaluations conducted on RegDB and SYSU-MM01 demonstrate the superior performance of SA-Softmax over state-of-the-art methods in this cross-modality setting.
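
The cross-spectrum prototype idea can be illustrated with a hedged PyTorch sketch, based only on the abstract above (the asynchronous optimization schedule, Feature Mask, and Absolute-Similarity Term are omitted): keep one prototype per identity per modality and score each sample against the prototypes of the opposite modality.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CrossSpectrumSoftmaxSketch(nn.Module):
        """One prototype per identity for each modality (index 0: visible, 1: infrared)."""
        def __init__(self, feat_dim: int, num_ids: int):
            super().__init__()
            self.prototypes = nn.Parameter(torch.randn(2, num_ids, feat_dim))

        def forward(self, feats, labels, modality):
            # feats: (B, D), labels: (B,), modality: (B,) with values in {0, 1}
            protos = F.normalize(self.prototypes[1 - modality], dim=-1)  # prototypes from the other spectrum
            feats = F.normalize(feats, dim=-1).unsqueeze(1)              # (B, 1, D)
            logits = (feats * protos).sum(-1)                            # cosine logits against cross-spectrum prototypes
            return F.cross_entropy(logits, labels)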


Exploring Invariant Representation for Visible-Infrared Person Re-Identification

Feb 02, 2023
Lei Tan, Yukang Zhang, Shengmei Shen, Yan Wang, Pingyang Dai, Xianming Lin, Yongjian Wu, Rongrong Ji


Cross-spectral person re-identification, which aims to associate identities with pedestrians across different spectra, faces the main challenge of modality discrepancy. In this paper, we address the problem at both the image level and the feature level in an end-to-end hybrid learning framework named robust feature mining network (RFM). In particular, we observe that the reflective intensity of the same surface in photos shot at different wavelengths can be related by a linear model. We further show that the variation of this linear factor across different surfaces is the main culprit behind the modality discrepancy. We integrate this reflection observation into image-level data augmentation by proposing the linear transformation generator (LTG). Moreover, at the feature level, we introduce a cross-center loss to explore a more compact intra-class distribution and modality-aware spatial attention to take advantage of textured regions more efficiently. Experimental results on two standard cross-spectral person re-identification datasets, i.e., RegDB and SYSU-MM01, demonstrate state-of-the-art performance.
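
One plausible reading of the cross-center loss, sketched in PyTorch purely for illustration (the paper may define it differently): for each identity present in both modalities within a batch, pull the visible and infrared feature centers toward each other to tighten the intra-class distribution.

    import torch

    def cross_center_loss_sketch(feats: torch.Tensor, labels: torch.Tensor, modality: torch.Tensor) -> torch.Tensor:
        """feats: (B, D); labels: (B,) identity ids; modality: (B,) with 0 = visible, 1 = infrared."""
        loss, count = feats.new_zeros(()), 0
        for pid in labels.unique():
            vis = (labels == pid) & (modality == 0)
            inf = (labels == pid) & (modality == 1)
            if vis.any() and inf.any():
                # squared distance between the two modality centers of this identity
                loss = loss + (feats[vis].mean(0) - feats[inf].mean(0)).pow(2).sum()
                count += 1
        return loss / max(count, 1)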


Unsupervised Domain Adaptation on Person Re-Identification via Dual-level Asymmetric Mutual Learning

Jan 29, 2023
Qiong Wu, Jiahan Li, Pingyang Dai, Qixiang Ye, Liujuan Cao, Yongjian Wu, Rongrong Ji


Unsupervised domain adaptation person re-identification (Re-ID) aims to identify pedestrian images in an unlabeled target domain with the help of an auxiliary labeled source-domain dataset. Many existing works attempt to recover reliable identity information by considering multiple homogeneous networks, and then use the generated labels to train the model in the target domain. However, these homogeneous networks identify people in approximate subspaces and equally exchange knowledge with each other or with their mean network to improve their ability, which inevitably limits the scope of available knowledge and leads them into the same mistakes. This paper proposes a Dual-level Asymmetric Mutual Learning method (DAML) to learn discriminative representations from a broader knowledge scope with diverse embedding spaces. Specifically, two heterogeneous networks mutually learn knowledge from asymmetric subspaces through pseudo-label generation in a hard distillation manner. The knowledge transfer between the two networks follows an asymmetric mutual learning scheme. The teacher network learns to identify both the target and source domains while adapting to the target-domain distribution based on the knowledge of the student. Meanwhile, the student network is trained on the target dataset and employs labels obtained through the teacher's knowledge as its ground truth. Extensive experiments on the Market-1501, CUHK-SYSU, and MSMT17 public datasets verify the superiority of DAML over state-of-the-art methods.


Towards Real-Time Panoptic Narrative Grounding by an End-to-End Grounding Network

Jan 09, 2023
Haowei Wang, Jiayi Ji, Yiyi Zhou, Yongjian Wu, Xiaoshuai Sun


Panoptic Narrative Grounding (PNG) is an emerging cross-modal grounding task that locates the target regions of an image corresponding to a text description. Existing approaches for PNG are mainly based on a two-stage paradigm, which is computationally expensive. In this paper, we propose a one-stage network for real-time PNG, termed the End-to-End Panoptic Narrative Grounding network (EPNG), which directly generates masks for referents. Specifically, we propose two innovative designs, i.e., Locality-Perceptive Attention (LPA) and a bidirectional Semantic Alignment Loss (SAL), to properly handle the many-to-many relationship between textual expressions and visual objects. LPA embeds local spatial priors into attention modeling, i.e., a pixel may belong to multiple masks at different scales, thereby improving segmentation. To help understand the complex semantic relationships, SAL introduces a bidirectional contrastive objective to regularize the semantic consistency between modalities. Extensive experiments on the PNG benchmark dataset demonstrate the effectiveness and efficiency of our method. Compared to the single-stage baseline, our method achieves a significant improvement of up to 9.4% accuracy. More importantly, our EPNG is 10 times faster than the two-stage model. Meanwhile, the generalization ability of EPNG is also validated by zero-shot experiments on other grounding tasks.
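
A hedged sketch of a bidirectional contrastive alignment, written as a one-to-one simplification of the many-to-many setting described above (not the actual SAL implementation): N matched region/phrase feature pairs are aligned in both directions with a symmetric InfoNCE-style objective.

    import torch
    import torch.nn.functional as F

    def bidirectional_alignment_sketch(region_feats, phrase_feats, temperature=0.07):
        """region_feats, phrase_feats: (N, D) features of N matched region/phrase pairs."""
        r = F.normalize(region_feats, dim=1)
        p = F.normalize(phrase_feats, dim=1)
        logits = r @ p.t() / temperature                    # (N, N) region-to-phrase similarities
        targets = torch.arange(r.size(0), device=r.device)  # the i-th region matches the i-th phrase
        # contrast in both directions: region -> phrase and phrase -> region
        return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))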

* 9 pages, 5 figures, accepted by AAAI23 

CycleTrans: Learning Neutral yet Discriminative Features for Visible-Infrared Person Re-Identification

Aug 21, 2022
Qiong Wu, Jiaer Xia, Pingyang Dai, Yiyi Zhou, Yongjian Wu, Rongrong Ji


Visible-infrared person re-identification (VI-ReID) is the task of matching the same individuals across the visible and infrared modalities. Its main challenge lies in the modality gap caused by cameras operating on different spectra. Existing VI-ReID methods mainly focus on learning general features across modalities, often at the expense of feature discriminability. To address this issue, we present a novel cycle-construction-based network for neutral yet discriminative feature learning, termed CycleTrans. Specifically, CycleTrans uses a lightweight Knowledge Capturing Module (KCM) to capture rich semantics from the modality-relevant feature maps according to pseudo queries. Afterwards, a Discrepancy Modeling Module (DMM) is deployed to transform these features into neutral ones according to modality-irrelevant prototypes. To ensure feature discriminability, two further KCMs are deployed for feature cycle construction. With cycle construction, our method can learn effective neutral features for visible and infrared images while preserving their salient semantics. Extensive experiments on the SYSU-MM01 and RegDB datasets validate the merits of CycleTrans against a flurry of state-of-the-art methods, with gains of +4.57% rank-1 on SYSU-MM01 and +2.2% rank-1 on RegDB.
