Yongchao Xu

Scale-aware Test-time Click Adaptation for Pulmonary Nodule and Mass Segmentation

Jul 28, 2023
Zhihao Li, Jiancheng Yang, Yongchao Xu, Li Zhang, Wenhui Dong, Bo Du

Pulmonary nodules and masses are crucial imaging features in lung cancer screening that require careful management in clinical diagnosis. Despite the success of deep learning-based medical image segmentation, robust performance across the wide range of nodule and mass sizes remains challenging. In this paper, we propose a multi-scale neural network with scale-aware test-time adaptation to address this challenge. Specifically, we introduce a Scale-aware Test-time Click Adaptation method that uses easily obtainable lesion clicks as test-time cues to enhance segmentation performance, particularly for large lesions. The proposed method can be seamlessly integrated into existing networks. Extensive experiments on both open-source and in-house datasets consistently demonstrate its effectiveness over several CNN- and Transformer-based segmentation methods. Our code is available at https://github.com/SplinterLi/SaTTCA
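
The scale-aware details are specific to the paper, but the click-driven adaptation loop can be sketched generically. Below is a minimal PyTorch sketch, assuming a binary segmentation model that maps a (1, C, H, W) image to (1, 1, H, W) logits; the loss, step count, and learning rate are illustrative choices, not the authors' exact procedure.

```python
import torch
import torch.nn.functional as F

def click_guided_tta(model, image, click_yx, n_steps=5, lr=1e-4):
    """Illustrative test-time adaptation: nudge the model so that its
    foreground probability at a user click is high. All names and
    hyper-parameters here are assumptions, not the paper's method."""
    model.train()  # a common choice is to adapt only normalization layers; all params here for brevity
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.Adam(params, lr=lr)
    y, x = click_yx
    for _ in range(n_steps):
        logits = model(image)
        # Encourage the clicked location to be predicted as lesion.
        click_loss = F.binary_cross_entropy_with_logits(
            logits[0, 0, y, x], torch.ones((), device=logits.device))
        optimizer.zero_grad()
        click_loss.backward()
        optimizer.step()
    model.eval()
    with torch.no_grad():
        return torch.sigmoid(model(image))
```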

* 11 pages, 3 figures, MICCAI 2023 

Not All Pixels Are Equal: Learning Pixel Hardness for Semantic Segmentation

May 15, 2023
Xin Xiao, Daiguo Zhou, Jiagao Hu, Yi Hu, Yongchao Xu

Semantic segmentation has recently witnessed great progress. Despite the impressive overall results, segmentation performance in some hard areas (e.g., small objects or thin parts) is still not promising. A straightforward solution is hard sample mining, which is widely used in object detection. Yet, most existing hard pixel mining strategies for semantic segmentation rely on a pixel's loss value, which tends to decrease during training. Intuitively, the pixel hardness for segmentation mainly depends on image structure and is expected to be stable. In this paper, we propose to learn pixel hardness for semantic segmentation, leveraging the hardness information contained in global and historical loss values. More precisely, we add a gradient-independent branch that learns a hardness level (HL) map by maximizing the hardness-weighted segmentation loss, which the segmentation head minimizes. This encourages large hardness values in difficult areas, leading to an appropriate and stable HL map. Despite its simplicity, the proposed method can be applied to most segmentation methods with no extra cost during inference and only marginal extra cost during training. Without bells and whistles, it achieves consistent and significant improvement (1.37% mIoU on average) over popular semantic segmentation methods on the Cityscapes dataset and demonstrates good generalization ability across domains. The source code is available at https://github.com/Menoly-xin/Hardness-Level-Learning .
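
As an illustration of the adversarial weighting idea, here is a minimal PyTorch sketch of a hardness-weighted loss pair, assuming a segmentation head producing `seg_logits`, an auxiliary hardness branch producing `hl_logits`, and integer `target` labels; the detach-based decoupling and the absence of any regularizer on the HL map are simplifications, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def hardness_losses(seg_logits, hl_logits, target):
    """Illustrative two-player loss: seg_logits (N, C, H, W), hl_logits
    (N, 1, H, W), target (N, H, W) with class indices."""
    per_pixel = F.cross_entropy(seg_logits, target, reduction="none")  # (N, H, W)
    hardness = torch.sigmoid(hl_logits).squeeze(1)                     # in (0, 1)

    # Segmentation head: minimize loss weighted by a *detached* hardness map.
    seg_loss = (hardness.detach() * per_pixel).mean()

    # Hardness branch: maximize the hardness-weighted loss of *detached*
    # per-pixel errors, i.e. minimize its negative. A real implementation
    # would also constrain the HL map to avoid a trivial all-ones solution.
    hl_loss = -(hardness * per_pixel.detach()).mean()
    return seg_loss, hl_loss
```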

Local Intensity Order Transformation for Robust Curvilinear Object Segmentation

Feb 25, 2022
Tianyi Shi, Nicolas Boutry, Yongchao Xu, Thierry Géraud

Segmentation of curvilinear structures is important in many applications, such as retinal blood vessel segmentation for early detection of vessel diseases and pavement crack segmentation for road condition evaluation and maintenance. Currently, deep learning-based methods achieve impressive performance on these tasks. Yet, most of them focus on finding powerful deep architectures while ignoring the inherent characteristics of curvilinear structures (e.g., that the curvilinear structure is darker than its context), which would allow a more robust representation. As a consequence, performance usually drops considerably in cross-dataset evaluation, which poses great challenges in practice. In this paper, we aim to improve generalizability by introducing a novel local intensity order transformation (LIOT). Specifically, we transform a grayscale image into a contrast-invariant four-channel image based on the intensity order between each pixel and its nearby pixels along the four (horizontal and vertical) directions. This yields a representation that preserves the inherent characteristics of the curvilinear structure while being robust to contrast changes. Cross-dataset evaluation on three retinal blood vessel segmentation datasets demonstrates that LIOT improves the generalizability of some state-of-the-art methods. Additionally, cross-dataset evaluation between retinal blood vessel segmentation and pavement crack segmentation shows that LIOT preserves the inherent characteristics of curvilinear structures across large appearance gaps. An implementation of the proposed method is available at https://github.com/TY-Shi/LIOT.
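
The transformation itself is simple enough to sketch. Below is an illustrative NumPy version that, for each of the four axis directions, counts how many of the nearest `n_neighbors` pixels along that direction are darker than the centre pixel; the count-based encoding and the wrap-around border handling via np.roll are assumptions here, and the reference implementation at the link above is authoritative.

```python
import numpy as np

def liot(gray, n_neighbors=8):
    """Illustrative LIOT: contrast-invariant 4-channel representation built
    from intensity-order comparisons along the four axis directions."""
    h, w = gray.shape
    out = np.zeros((4, h, w), dtype=np.uint8)
    shifts = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up
    for c, (dy, dx) in enumerate(shifts):
        for k in range(1, n_neighbors + 1):
            # np.roll wraps around the border; a real implementation would pad.
            shifted = np.roll(gray, (k * dy, k * dx), axis=(0, 1))
            out[c] += (shifted < gray).astype(np.uint8)
    return out  # values in [0, n_neighbors], invariant to monotone contrast changes
```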

* Accepted by IEEE TIP 

Advancing COVID-19 Diagnosis with Privacy-Preserving Collaboration in Artificial Intelligence

Nov 18, 2021
Xiang Bai, Hanchen Wang, Liya Ma, Yongchao Xu, Jiefeng Gan, Ziwei Fan, Fan Yang, Ke Ma, Jiehua Yang, Song Bai, Chang Shu, Xinyu Zou, Renhao Huang, Changzheng Zhang, Xiaowu Liu, Dandan Tu, Chuou Xu, Wenqing Zhang, Xi Wang, Anguo Chen, Yu Zeng, Dehua Yang, Ming-Wei Wang, Nagaraj Holalkere, Neil J. Halin, Ihab R. Kamel, Jia Wu, Xuehua Peng, Xiang Wang, Jianbo Shao, Pattanasak Mongkolwat, Jianjun Zhang, Weiyang Liu, Michael Roberts, Zhongzhao Teng, Lucian Beer, Lorena Escudero Sanchez, Evis Sala, Daniel Rubin, Adrian Weller, Joan Lasenby, Chuangsheng Zheng, Jianming Wang, Zhen Li, Carola-Bibiane Schönlieb, Tian Xia

Artificial intelligence (AI) provides a promising alternative for streamlining COVID-19 diagnosis. However, concerns surrounding security and trustworthiness impede the collection of large-scale representative medical data, posing a considerable challenge for training a well-generalised model for clinical practice. To address this, we launched the Unified CT-COVID AI Diagnostic Initiative (UCADI), in which the AI model can be distributedly trained and independently executed at each host institution under a federated learning (FL) framework without data sharing. Here we show that our FL model outperformed all the local models by a large margin (test sensitivity/specificity in China: 0.973/0.951; in the UK: 0.730/0.942), achieving performance comparable with a panel of professional radiologists. We further evaluated the model on hold-out data (collected from two additional hospitals not involved in the federated training) and heterogeneous data (acquired with contrast materials), provided visual explanations for the model's decisions, and analysed the trade-off between model performance and communication costs in the federated training process. Our study is based on 9,573 chest computed tomography scans (CTs) from 3,336 patients collected from 23 hospitals located in China and the UK. Collectively, our work advances the prospects of utilising federated learning for privacy-preserving AI in digital health.
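
As a reminder of how federated training avoids data sharing, the sketch below shows a generic FedAvg-style aggregation step in PyTorch: each hospital trains locally, and only model weights are averaged, weighted by local dataset size. UCADI's actual framework may differ in details such as scheduling or secure aggregation.

```python
import copy

def federated_average(client_state_dicts, client_sizes):
    """Illustrative FedAvg aggregation: average client model weights,
    weighted by local dataset size, so no imaging data leaves a hospital.
    Non-float buffers (e.g. num_batches_tracked) would need special handling."""
    total = float(sum(client_sizes))
    avg = copy.deepcopy(client_state_dicts[0])
    for key in avg:
        avg[key] = sum(sd[key].float() * (n / total)
                       for sd, n in zip(client_state_dicts, client_sizes))
    return avg
```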

* Nature Machine Intelligence 

VisDrone-CC2020: The Vision Meets Drone Crowd Counting Challenge Results

Jul 19, 2021
Dawei Du, Longyin Wen, Pengfei Zhu, Heng Fan, Qinghua Hu, Haibin Ling, Mubarak Shah, Junwen Pan, Ali Al-Ali, Amr Mohamed, Bakour Imene, Bin Dong, Binyu Zhang, Bouchali Hadia Nesma, Chenfeng Xu, Chenzhen Duan, Ciro Castiello, Corrado Mencar, Dingkang Liang, Florian Krüger, Gennaro Vessio, Giovanna Castellano, Jieru Wang, Junyu Gao, Khalid Abualsaud, Laihui Ding, Lei Zhao, Marco Cianciotta, Muhammad Saqib, Noor Almaadeed, Omar Elharrouss, Pei Lyu, Qi Wang, Shidong Liu, Shuang Qiu, Siyang Pan, Somaya Al-Maadeed, Sultan Daud Khan, Tamer Khattab, Tao Han, Thomas Golda, Wei Xu, Xiang Bai, Xiaoqing Xu, Xuelong Li, Yanyun Zhao, Ye Tian, Yingnan Lin, Yongchao Xu, Yuehan Yao, Zhenyu Xu, Zhijian Zhao, Zhipeng Luo, Zhiwei Wei, Zhiyuan Zhao

Crowd counting on drone platforms is an interesting topic in computer vision, which brings new challenges such as small object inference, background clutter and wide viewpoints. However, few algorithms focus on crowd counting in drone-captured data due to the lack of comprehensive datasets. To this end, we collected a large-scale dataset and organized the Vision Meets Drone Crowd Counting Challenge (VisDrone-CC2020) in conjunction with the 16th European Conference on Computer Vision (ECCV 2020) to promote development in the related fields. The collected dataset comprises 3,360 images, including 2,460 images for training and 900 images for testing. Specifically, we manually annotate persons with points in each video frame. Fourteen algorithms from 15 institutes were submitted to the VisDrone-CC2020 Challenge. We provide a detailed analysis of the evaluation results and conclude the challenge. More information can be found at the website: http://www.aiskyeye.com/.

* European Conference on Computer Vision. Springer, Cham, 2020: 675-691  
* The method description of A7 Multi-Scale Aware based SFANet (M-SFANet) is updated and missing references are added 

Affinity Space Adaptation for Semantic Segmentation Across Domains

Sep 26, 2020
Wei Zhou, Yukang Wang, Jiajia Chu, Jiehua Yang, Xiang Bai, Yongchao Xu

Semantic segmentation with dense pixel-wise annotation has achieved excellent performance thanks to deep learning. However, the generalization of semantic segmentation in the wild remains challenging. In this paper, we address the problem of unsupervised domain adaptation (UDA) in semantic segmentation. Motivated by the fact that the source and target domains share invariant semantic structures, we propose to exploit such invariance across domains by leveraging co-occurring patterns between pairwise pixels in the output of structured semantic segmentation. This differs from most existing approaches, which attempt to adapt domains based on individual pixel-wise information at the image, feature, or output level. Specifically, we perform domain adaptation on the affinity relationships between adjacent pixels, termed the affinity space, of the source and target domains. To this end, we develop two affinity space adaptation strategies: affinity space cleaning and adversarial affinity space alignment. Extensive experiments demonstrate that the proposed method achieves superior performance to some state-of-the-art methods on several challenging benchmarks for semantic segmentation across domains. The code is available at https://github.com/idealwei/ASANet.
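
A minimal sketch of what an affinity space can look like, assuming affinities are taken as the probability that a pixel and a shifted neighbour share the same class, computed from softmax predictions; the chosen offsets are illustrative, and the adversarial alignment of source/target affinities is not shown.

```python
import torch
import torch.nn.functional as F

def affinity_space(logits, offsets=((0, 1), (1, 0), (1, 1), (1, -1))):
    """Illustrative affinity map: for each spatial offset, the probability
    that a pixel and its shifted neighbour carry the same class."""
    prob = F.softmax(logits, dim=1)                     # (N, C, H, W)
    affinities = []
    for dy, dx in offsets:
        shifted = torch.roll(prob, shifts=(dy, dx), dims=(2, 3))
        affinities.append((prob * shifted).sum(dim=1))  # (N, H, W)
    return torch.stack(affinities, dim=1)               # (N, len(offsets), H, W)
```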

* Accepted by IEEE TIP 

Learning Directional Feature Maps for Cardiac MRI Segmentation

Jul 22, 2020
Feng Cheng, Cheng Chen, Yukang Wang, Heshui Shi, Yukun Cao, Dandan Tu, Changzheng Zhang, Yongchao Xu

Cardiac MRI segmentation plays a crucial role in clinical diagnosis for evaluating personalized cardiac performance parameters. Due to indistinct boundaries and heterogeneous intensity distributions in cardiac MRI, most existing methods still suffer from two challenges: inter-class indistinction and intra-class inconsistency. To tackle these two problems, we propose a novel method that exploits directional feature maps, which can simultaneously strengthen the differences between classes and the similarities within classes. Specifically, we perform cardiac segmentation and learn a direction field pointing away from the nearest cardiac tissue boundary to each pixel via a direction field (DF) module. Based on the learned direction field, we then propose a feature rectification and fusion (FRF) module to improve the original segmentation features and obtain the final segmentation. The proposed modules are simple yet effective and can be flexibly added to any existing segmentation network without excessively increasing time and space complexity. We evaluate the proposed method on the 2017 MICCAI Automated Cardiac Diagnosis Challenge (ACDC) dataset and a large-scale self-collected dataset, showing good segmentation performance and robust generalization ability.
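
For intuition, a ground-truth direction field of this kind can be derived from a segmentation mask with a Euclidean distance transform, as in the hedged sketch below; multi-class handling and other details of the DF module are simplified assumptions.

```python
import numpy as np
from scipy import ndimage

def direction_field(mask):
    """Illustrative ground truth: for every foreground pixel, a unit vector
    pointing from its nearest boundary pixel towards the pixel."""
    fg = mask.astype(bool)
    boundary = fg ^ ndimage.binary_erosion(fg)
    # Indices of the nearest boundary pixel for every location.
    _, (iy, ix) = ndimage.distance_transform_edt(~boundary, return_indices=True)
    yy, xx = np.mgrid[0:mask.shape[0], 0:mask.shape[1]]
    dy, dx = (yy - iy).astype(np.float32), (xx - ix).astype(np.float32)
    norm = np.maximum(np.hypot(dy, dx), 1e-6)
    field = np.stack([dy / norm, dx / norm]) * fg.astype(np.float32)
    return field  # (2, H, W), zero outside the segmented tissue
```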

* Accepted by MICCAI2020 

Super-BPD: Super Boundary-to-Pixel Direction for Fast Image Segmentation

May 30, 2020
Jianqiang Wan, Yang Liu, Donglai Wei, Xiang Bai, Yongchao Xu

Image segmentation is a fundamental vision task and a crucial step for many applications. In this paper, we propose a fast image segmentation method based on a novel super boundary-to-pixel direction (super-BPD) and a customized segmentation algorithm with super-BPD. Precisely, we define the BPD at each pixel as a two-dimensional unit vector pointing from its nearest boundary to the pixel. In the BPD, nearby pixels from different regions have opposite directions departing from each other, while adjacent pixels in the same region have directions pointing in the same direction or towards each other (i.e., around medial points). We make use of this property to partition an image into super-BPDs, which are novel informative superpixels with robust direction similarity for fast grouping into segmentation regions. Extensive experimental results on BSDS500 and Pascal Context demonstrate the accuracy and efficiency of the proposed super-BPD in segmenting images. In practice, the proposed super-BPD achieves comparable or superior performance to MCG while running at ~25 fps vs. 0.07 fps. Super-BPD also exhibits noteworthy transferability to unseen scenes. The code is publicly available at https://github.com/JianqiangWan/Super-BPD.
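
The "departing directions" cue can be turned into a simple merging test, sketched below for two adjacent (super-)pixels given their BPD unit vectors and the offset between them; the rule is an illustrative assumption, not the paper's exact grouping criterion.

```python
import numpy as np

def same_region(bpd_a, bpd_b, offset_ab):
    """Illustrative merging test: `bpd_a` and `bpd_b` are the BPD unit vectors
    of two adjacent (super-)pixels, `offset_ab` is the spatial offset from a to b.
    Pixels on opposite sides of a boundary have directions departing from each other."""
    off = np.asarray(offset_ab, dtype=float)
    off /= np.linalg.norm(off)
    # a departs from b if its direction points away from b, and vice versa.
    departing = np.dot(bpd_a, off) < 0 and np.dot(bpd_b, off) > 0
    return not departing
```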

* Accepted to CVPR 2020. 10 pages, 9 figures. Code available at https://github.com/JianqiangWan/Super-BPD 

Efficient Backbone Search for Scene Text Recognition

Mar 14, 2020
Hui Zhang, Quanming Yao, Mingkun Yang, Yongchao Xu, Xiang Bai

Scene text recognition (STR) is very challenging due to the diversity of text instances and the complexity of scenes. The community has paid increasing attention to boosting performance by improving the image pre-processing modules, such as rectification and deblurring, or the sequence translator. However, another critical module, i.e., the feature sequence extractor, has not been extensively explored. In this work, inspired by the success of neural architecture search (NAS), which can identify better architectures than human-designed ones, we propose automated STR (AutoSTR), which searches data-dependent backbones to boost text recognition performance. First, we design a domain-specific search space for STR, which contains both choices of operations and constraints on the downsampling path. Then, we propose a two-step search algorithm that decouples operations and the downsampling path for an efficient search in the given space. Experiments demonstrate that, by searching data-dependent backbones, AutoSTR can outperform state-of-the-art approaches on standard benchmarks with far fewer FLOPs and model parameters.
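
To make the notion of a constrained downsampling path concrete, the sketch below enumerates per-stage stride choices whose product matches a target overall stride; the stage count, stride options, and the 8x4 target are assumptions for text images, not AutoSTR's exact search space.

```python
from itertools import product

def candidate_downsampling_paths(n_stages=5, target_hw_stride=(8, 4),
                                 choices=((1, 1), (2, 1), (1, 2), (2, 2))):
    """Illustrative enumeration of a constrained downsampling path: each stage
    picks a (height, width) stride; the product over stages must match the target."""
    paths = []
    for path in product(choices, repeat=n_stages):
        h, w = 1, 1
        for sh, sw in path:
            h, w = h * sh, w * sw
        if (h, w) == target_hw_stride:
            paths.append(path)
    return paths
```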

AutoScale: Learning to Scale for Crowd Counting

Dec 20, 2019
Chenfeng Xu, Dingkang Liang, Yongchao Xu, Song Bai, Wei Zhan, Masayoshi Tomizuka, Xiang Bai

Crowd counting in images is a widely explored but challenging task. Though recent convolutional neural network (CNN) methods have achieved great progress, it is still difficult to accurately count, and even more so to precisely localize, people in very dense regions. A major issue is that dense regions usually consist of many instances of small size and thus exhibit very different density patterns compared with sparse regions. Localizing or detecting dense small objects is also very delicate. In this paper, instead of processing an image pyramid and aggregating multi-scale features, we propose a simple yet effective Learning to Scale (L2S) module to cope with significant scale variations in both regression and localization. Specifically, the L2S module automatically rescales dense regions into similar and reasonable scale levels. This alleviates the density pattern shift for density regression methods and facilitates the localization of small instances. Besides, we also introduce a novel distance label map combined with a customized adapted cross-entropy loss for precise person localization. Extensive experiments demonstrate that the proposed method, termed AutoScale, consistently improves upon state-of-the-art methods on both regression and localization benchmarks on three widely used datasets. The proposed AutoScale also demonstrates noteworthy transferability under cross-dataset validation.
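
A hedged sketch of the scaling idea: zoom a dense crop so that its average predicted density approaches a reference level before it is re-processed. The square-root rule, clamping range, and reference density are illustrative assumptions, not the exact L2S module.

```python
import torch
import torch.nn.functional as F

def rescale_dense_region(image, density, box, target_density=0.05):
    """Illustrative learning-to-scale step. `image` and `density` are
    (1, C, H, W) / (1, 1, H, W) tensors; `box` = (y0, y1, x0, x1)."""
    y0, y1, x0, x1 = box
    crop = image[:, :, y0:y1, x0:x1]
    mean_density = density[:, :, y0:y1, x0:x1].mean().clamp(min=1e-8)
    # Enlarging the crop by a factor s spreads the same count over s^2 pixels,
    # so match densities with a square-root rule.
    scale = torch.sqrt(mean_density / target_density).clamp(1.0, 4.0).item()
    new_size = (int(crop.shape[2] * scale), int(crop.shape[3] * scale))
    return F.interpolate(crop, size=new_size, mode="bilinear", align_corners=False)
```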

* CV 