Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yanwu Xu

South China University of Technology

Maximum Spatial Perturbation Consistency for Unpaired Image-to-Image Translation

Mar 29, 2022

Yanwu Xu, Shaoan Xie, Wenhao Wu, Kun Zhang, Mingming Gong, Kayhan Batmanghelich

Figure 1 for Maximum Spatial Perturbation Consistency for Unpaired Image-to-Image Translation

Figure 2 for Maximum Spatial Perturbation Consistency for Unpaired Image-to-Image Translation

Figure 3 for Maximum Spatial Perturbation Consistency for Unpaired Image-to-Image Translation

Figure 4 for Maximum Spatial Perturbation Consistency for Unpaired Image-to-Image Translation

Abstract:Unpaired image-to-image translation (I2I) is an ill-posed problem, as an infinite number of translation functions can map the source domain distribution to the target distribution. Therefore, much effort has been put into designing suitable constraints, e.g., cycle consistency (CycleGAN), geometry consistency (GCGAN), and contrastive learning-based constraints (CUTGAN), that help better pose the problem. However, these well-known constraints have limitations: (1) they are either too restrictive or too weak for specific I2I tasks; (2) these methods result in content distortion when there is a significant spatial variation between the source and target domains. This paper proposes a universal regularization technique called maximum spatial perturbation consistency (MSPC), which enforces a spatial perturbation function (T ) and the translation operator (G) to be commutative (i.e., TG = GT ). In addition, we introduce two adversarial training components for learning the spatial perturbation function. The first one lets T compete with G to achieve maximum perturbation. The second one lets G and T compete with discriminators to align the spatial variations caused by the change of object size, object distortion, background interruptions, etc. Our method outperforms the state-of-the-art methods on most I2I benchmarks. We also introduce a new benchmark, namely the front face to profile face dataset, to emphasize the underlying challenges of I2I for real-world applications. We finally perform ablation experiments to study the sensitivity of our method to the severity of spatial perturbation and its effectiveness for distribution alignment.

* CVPR 2022 accepted paper

Via

Access Paper or Ask Questions

REFUGE2 Challenge: Treasure for Multi-Domain Learning in Glaucoma Assessment

Feb 24, 2022

Huihui Fang, Fei Li, Huazhu Fu, Xu Sun, Xingxing Cao, Jaemin Son, Shuang Yu, Menglu Zhang, Chenglang Yuan, Cheng Bian(+18 more)

Figure 1 for REFUGE2 Challenge: Treasure for Multi-Domain Learning in Glaucoma Assessment

Figure 2 for REFUGE2 Challenge: Treasure for Multi-Domain Learning in Glaucoma Assessment

Figure 3 for REFUGE2 Challenge: Treasure for Multi-Domain Learning in Glaucoma Assessment

Figure 4 for REFUGE2 Challenge: Treasure for Multi-Domain Learning in Glaucoma Assessment

Abstract:Glaucoma is the second leading cause of blindness and is the leading cause of irreversible blindness disease in the world. Early screening for glaucoma in the population is significant. Color fundus photography is the most cost effective imaging modality to screen for ocular diseases. Deep learning network is often used in color fundus image analysis due to its powful feature extraction capability. However, the model training of deep learning method needs a large amount of data, and the distribution of data should be abundant for the robustness of model performance. To promote the research of deep learning in color fundus photography and help researchers further explore the clinical application signification of AI technology, we held a REFUGE2 challenge. This challenge released 2,000 color fundus images of four models, including Zeiss, Canon, Kowa and Topcon, which can validate the stabilization and generalization of algorithms on multi-domain. Moreover, three sub-tasks were designed in the challenge, including glaucoma classification, cup/optic disc segmentation, and macular fovea localization. These sub-tasks technically cover the three main problems of computer vision and clinicly cover the main researchs of glaucoma diagnosis. Over 1,300 international competitors joined the REFUGE2 challenge, 134 teams submitted more than 3,000 valid preliminary results, and 22 teams reached the final. This article summarizes the methods of some of the finalists and analyzes their results. In particular, we observed that the teams using domain adaptation strategies had high and robust performance on the dataset with multi-domain. This indicates that UDA and other multi-domain related researches will be the trend of deep learning field in the future, and our REFUGE2 datasets will play an important role in these researches.

* 28 pages, 20 figures

Via

Access Paper or Ask Questions

ADAM Challenge: Detecting Age-related Macular Degeneration from Fundus Images

Feb 18, 2022

Huihui Fang, Fei Li, Huazhu Fu, Xu Sun, Xingxing Cao, Fengbin Lin, Jaemin Son, Sunho Kim, Gwenole Quellec, Sarah Matta(+18 more)

Figure 1 for ADAM Challenge: Detecting Age-related Macular Degeneration from Fundus Images

Figure 2 for ADAM Challenge: Detecting Age-related Macular Degeneration from Fundus Images

Figure 3 for ADAM Challenge: Detecting Age-related Macular Degeneration from Fundus Images

Figure 4 for ADAM Challenge: Detecting Age-related Macular Degeneration from Fundus Images

Abstract:Age-related macular degeneration (AMD) is the leading cause of visual impairment among elderly in the world. Early detection of AMD is of great importance as the vision loss caused by AMD is irreversible and permanent. Color fundus photography is the most cost-effective imaging modality to screen for retinal disorders. \textcolor{red}{Recently, some algorithms based on deep learning had been developed for fundus image analysis and automatic AMD detection. However, a comprehensive annotated dataset and a standard evaluation benchmark are still missing.} To deal with this issue, we set up the Automatic Detection challenge on Age-related Macular degeneration (ADAM) for the first time, held as a satellite event of the ISBI 2020 conference. The ADAM challenge consisted of four tasks which cover the main topics in detecting AMD from fundus images, including classification of AMD, detection and segmentation of optic disc, localization of fovea, and detection and segmentation of lesions. The ADAM challenge has released a comprehensive dataset of 1200 fundus images with the category labels of AMD, the pixel-wise segmentation masks of the full optic disc and lesions (drusen, exudate, hemorrhage, scar, and other), as well as the location coordinates of the macular fovea. A uniform evaluation framework has been built to make a fair comparison of different models. During the ADAM challenge, 610 results were submitted for online evaluation, and finally, 11 teams participated in the onsite challenge. This paper introduces the challenge, dataset, and evaluation methods, as well as summarizes the methods and analyzes the results of the participating teams of each task. In particular, we observed that ensembling strategy and clinical prior knowledge can better improve the performances of the deep learning models.

* 29 pages, 17 figures

Via

Access Paper or Ask Questions

GAMMA Challenge:Glaucoma grAding from Multi-Modality imAges

Feb 16, 2022

Junde Wu, Huihui Fang, Fei Li, Huazhu Fu, Fengbin Lin, Jiongcheng Li, Lexing Huang, Qinji Yu, Sifan Song, Xingxing Xu(+19 more)

Figure 1 for GAMMA Challenge:Glaucoma grAding from Multi-Modality imAges

Figure 2 for GAMMA Challenge:Glaucoma grAding from Multi-Modality imAges

Figure 3 for GAMMA Challenge:Glaucoma grAding from Multi-Modality imAges

Figure 4 for GAMMA Challenge:Glaucoma grAding from Multi-Modality imAges

Abstract:Color fundus photography and Optical Coherence Tomography (OCT) are the two most cost-effective tools for glaucoma screening. Both two modalities of images have prominent biomarkers to indicate glaucoma suspected. Clinically, it is often recommended to take both of the screenings for a more accurate and reliable diagnosis. However, although numerous algorithms are proposed based on fundus images or OCT volumes in computer-aided diagnosis, there are still few methods leveraging both of the modalities for the glaucoma assessment. Inspired by the success of Retinal Fundus Glaucoma Challenge (REFUGE) we held previously, we set up the Glaucoma grAding from Multi-Modality imAges (GAMMA) Challenge to encourage the development of fundus \& OCT-based glaucoma grading. The primary task of the challenge is to grade glaucoma from both the 2D fundus images and 3D OCT scanning volumes. As part of GAMMA, we have publicly released a glaucoma annotated dataset with both 2D fundus color photography and 3D OCT volumes, which is the first multi-modality dataset for glaucoma grading. In addition, an evaluation framework is also established to evaluate the performance of the submitted methods. During the challenge, 1272 results were submitted, and finally, top-10 teams were selected to the final stage. We analysis their results and summarize their methods in the paper. Since all these teams submitted their source code in the challenge, a detailed ablation study is also conducted to verify the effectiveness of the particular modules proposed. We find many of the proposed techniques are practical for the clinical diagnosis of glaucoma. As the first in-depth study of fundus \& OCT multi-modality glaucoma grading, we believe the GAMMA Challenge will be an essential starting point for future research.

Via

Access Paper or Ask Questions

Opinions Vary? Diagnosis First!

Feb 14, 2022

Junde Wu, Huihui Fang, Binghong Wu, Dalu Yang, Yehui Yang, Yanwu Xu

Figure 1 for Opinions Vary? Diagnosis First!

Figure 2 for Opinions Vary? Diagnosis First!

Figure 3 for Opinions Vary? Diagnosis First!

Figure 4 for Opinions Vary? Diagnosis First!

Abstract:In medical image segmentation, images are usually annotated by several different clinical experts. This clinical routine helps to mitigate the personal bias. However, Computer Vision models often assume there has a unique ground-truth for each of the instance. This research gap between Computer Vision and medical routine is commonly existed but less explored by the current research.In this paper, we try to answer the following two questions: 1. How to learn an optimal combination of the multiple segmentation labels? and 2. How to estimate this segmentation mask from the raw image? We note that in clinical practice, the image segmentation mask usually exists as an auxiliary information for disease diagnosis. Adhering to this mindset, we propose a framework taking the diagnosis result as the gold standard, to estimate the segmentation mask upon the multi-rater segmentation labels, named DiFF (Diagnosis First segmentation Framework).DiFF is implemented by two novelty techniques. First, DFSim (Diagnosis First Simulation of gold label) is learned as an optimal combination of multi-rater segmentation labels for the disease diagnosis. Then, toward estimating DFSim mask from the raw image, we further propose T\&G Module (Take and Give Module) to instill the diagnosis knowledge into the segmentation network. The experiments show that compared with commonly used majority vote, the proposed DiFF is able to segment the masks with 6% improvement on diagnosis AUC score, which also outperforms various state-of-the-art multi-rater methods by a large margin.

Via

Access Paper or Ask Questions

CrossMoDA 2021 challenge: Benchmark of Cross-Modality Domain Adaptation techniques for Vestibular Schwnannoma and Cochlea Segmentation

Jan 08, 2022

Reuben Dorent, Aaron Kujawa, Marina Ivory, Spyridon Bakas, Nicola Rieke, Samuel Joutard, Ben Glocker, Jorge Cardoso, Marc Modat, Kayhan Batmanghelich(+30 more)

Figure 1 for CrossMoDA 2021 challenge: Benchmark of Cross-Modality Domain Adaptation techniques for Vestibular Schwnannoma and Cochlea Segmentation

Figure 2 for CrossMoDA 2021 challenge: Benchmark of Cross-Modality Domain Adaptation techniques for Vestibular Schwnannoma and Cochlea Segmentation

Figure 3 for CrossMoDA 2021 challenge: Benchmark of Cross-Modality Domain Adaptation techniques for Vestibular Schwnannoma and Cochlea Segmentation

Figure 4 for CrossMoDA 2021 challenge: Benchmark of Cross-Modality Domain Adaptation techniques for Vestibular Schwnannoma and Cochlea Segmentation

Abstract:Domain Adaptation (DA) has recently raised strong interests in the medical imaging community. While a large variety of DA techniques has been proposed for image segmentation, most of these techniques have been validated either on private datasets or on small publicly available datasets. Moreover, these datasets mostly addressed single-class problems. To tackle these limitations, the Cross-Modality Domain Adaptation (crossMoDA) challenge was organised in conjunction with the 24th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2021). CrossMoDA is the first large and multi-class benchmark for unsupervised cross-modality DA. The challenge's goal is to segment two key brain structures involved in the follow-up and treatment planning of vestibular schwannoma (VS): the VS and the cochleas. Currently, the diagnosis and surveillance in patients with VS are performed using contrast-enhanced T1 (ceT1) MRI. However, there is growing interest in using non-contrast sequences such as high-resolution T2 (hrT2) MRI. Therefore, we created an unsupervised cross-modality segmentation benchmark. The training set provides annotated ceT1 (N=105) and unpaired non-annotated hrT2 (N=105). The aim was to automatically perform unilateral VS and bilateral cochlea segmentation on hrT2 as provided in the testing set (N=137). A total of 16 teams submitted their algorithm for the evaluation phase. The level of performance reached by the top-performing teams is strikingly high (best median Dice - VS:88.4%; Cochleas:85.7%) and close to full supervision (median Dice - VS:92.5%; Cochleas:87.7%). All top-performing methods made use of an image-to-image translation approach to transform the source-domain images into pseudo-target-domain images. A segmentation network was then trained using these generated images and the manual annotations provided for the source image.

* Submitted to Medical Image Analysis

Via

Access Paper or Ask Questions

Unaligned Image-to-Image Translation by Learning to Reweight

Sep 24, 2021

Shaoan Xie, Mingming Gong, Yanwu Xu, Kun Zhang

Figure 1 for Unaligned Image-to-Image Translation by Learning to Reweight

Figure 2 for Unaligned Image-to-Image Translation by Learning to Reweight

Figure 3 for Unaligned Image-to-Image Translation by Learning to Reweight

Figure 4 for Unaligned Image-to-Image Translation by Learning to Reweight

Abstract:Unsupervised image-to-image translation aims at learning the mapping from the source to target domain without using paired images for training. An essential yet restrictive assumption for unsupervised image translation is that the two domains are aligned, e.g., for the selfie2anime task, the anime (selfie) domain must contain only anime (selfie) face images that can be translated to some images in the other domain. Collecting aligned domains can be laborious and needs lots of attention. In this paper, we consider the task of image translation between two unaligned domains, which may arise for various possible reasons. To solve this problem, we propose to select images based on importance reweighting and develop a method to learn the weights and perform translation simultaneously and automatically. We compare the proposed method with state-of-the-art image translation approaches and present qualitative and quantitative results on different tasks with unaligned domains. Extensive empirical evidence demonstrates the usefulness of the proposed problem formulation and the superiority of our method.

* ICCV2021

Via

Access Paper or Ask Questions

Progressive Hard-case Mining across Pyramid Levels in Object Detection

Sep 15, 2021

Binghong Wu, Yehui Yang, Dalu Yang, Junde Wu, Haifeng Huang, Lei Wang, Junwei Liu, Yanwu Xu

Figure 1 for Progressive Hard-case Mining across Pyramid Levels in Object Detection

Figure 2 for Progressive Hard-case Mining across Pyramid Levels in Object Detection

Figure 3 for Progressive Hard-case Mining across Pyramid Levels in Object Detection

Figure 4 for Progressive Hard-case Mining across Pyramid Levels in Object Detection

Abstract:In object detection, multi-level prediction (e.g., FPN, YOLO) and resampling skills (e.g., focal loss, ATSS) have drastically improved one-stage detector performance. However, how to improve the performance by optimizing the feature pyramid level-by-level remains unexplored. We find that, during training, the ratio of positive over negative samples varies across pyramid levels (\emph{level imbalance}), which is not addressed by current one-stage detectors. To mediate the influence of level imbalance, we propose a Unified Multi-level Optimization Paradigm (UMOP) consisting of two components: 1) an independent classification loss supervising each pyramid level with individual resampling considerations; 2) a progressive hard-case mining loss defining all losses across the pyramid levels without extra level-wise settings. With UMOP as a plug-and-play scheme, modern one-stage detectors can attain a ~1.5 AP improvement with fewer training iterations and no additional computation overhead. Our best model achieves 55.1 AP on COCO test-dev. Code is available at https://github.com/zimoqingfeng/UMOP.

Via

Access Paper or Ask Questions

Box-Adapt: Domain-Adaptive Medical Image Segmentation using Bounding BoxSupervision

Aug 28, 2021

Yanwu Xu, Mingming Gong, Shaoan Xie, Kayhan Batmanghelich

Figure 1 for Box-Adapt: Domain-Adaptive Medical Image Segmentation using Bounding BoxSupervision

Figure 2 for Box-Adapt: Domain-Adaptive Medical Image Segmentation using Bounding BoxSupervision

Figure 3 for Box-Adapt: Domain-Adaptive Medical Image Segmentation using Bounding BoxSupervision

Figure 4 for Box-Adapt: Domain-Adaptive Medical Image Segmentation using Bounding BoxSupervision

Abstract:Deep learning has achieved remarkable success in medicalimage segmentation, but it usually requires a large numberof images labeled with fine-grained segmentation masks, andthe annotation of these masks can be very expensive andtime-consuming. Therefore, recent methods try to use un-supervised domain adaptation (UDA) methods to borrow in-formation from labeled data from other datasets (source do-mains) to a new dataset (target domain). However, due tothe absence of labels in the target domain, the performance ofUDA methods is much worse than that of the fully supervisedmethod. In this paper, we propose a weakly supervised do-main adaptation setting, in which we can partially label newdatasets with bounding boxes, which are easier and cheaperto obtain than segmentation masks. Accordingly, we proposea new weakly-supervised domain adaptation method calledBox-Adapt, which fully explores the fine-grained segmenta-tion mask in the source domain and the weak bounding boxin the target domain. Our Box-Adapt is a two-stage methodthat first performs joint training on the source and target do-mains, and then conducts self-training with the pseudo-labelsof the target domain. We demonstrate the effectiveness of ourmethod in the liver segmentation task. Weakly supervised do-main adaptation

* IJCAI Workshop on Weakly Supervised Representation Learning, 2021

Via

Access Paper or Ask Questions

Weighing Features of Lung and Heart Regions for Thoracic Disease Classification

May 26, 2021

Jiansheng Fang, Yanwu Xu, Yitian Zhao, Yuguang Yan, Junling Liu, Jiang Liu

Figure 1 for Weighing Features of Lung and Heart Regions for Thoracic Disease Classification

Figure 2 for Weighing Features of Lung and Heart Regions for Thoracic Disease Classification

Figure 3 for Weighing Features of Lung and Heart Regions for Thoracic Disease Classification

Figure 4 for Weighing Features of Lung and Heart Regions for Thoracic Disease Classification

Abstract:Chest X-rays are the most commonly available and affordable radiological examination for screening thoracic diseases. According to the domain knowledge of screening chest X-rays, the pathological information usually lay on the lung and heart regions. However, it is costly to acquire region-level annotation in practice, and model training mainly relies on image-level class labels in a weakly supervised manner, which is highly challenging for computer-aided chest X-ray screening. To address this issue, some methods have been proposed recently to identify local regions containing pathological information, which is vital for thoracic disease classification. Inspired by this, we propose a novel deep learning framework to explore discriminative information from lung and heart regions. We design a feature extractor equipped with a multi-scale attention module to learn global attention maps from global images. To exploit disease-specific cues effectively, we locate lung and heart regions containing pathological information by a well-trained pixel-wise segmentation model to generate binarization masks. By introducing element-wise logical AND operator on the learned global attention maps and the binarization masks, we obtain local attention maps in which pixels are $1$ for lung and heart region and $0$ for other regions. By zeroing features of non-lung and heart regions in attention maps, we can effectively exploit their disease-specific cues in lung and heart regions. Compared to existing methods fusing global and local features, we adopt feature weighting to avoid weakening visual cues unique to lung and heart regions. Evaluated by the benchmark split on the publicly available chest X-ray14 dataset, the comprehensive experiments show that our method achieves superior performance compared to the state-of-the-art methods.

* 17 pages, 4 figures, BMC Medical Imaging

Via

Access Paper or Ask Questions