Boris Shirokikh

Redesigning Out-of-Distribution Detection on 3D Medical Images

Aug 07, 2023
Anton Vasiliuk, Daria Frolova, Mikhail Belyaev, Boris Shirokikh

Detecting out-of-distribution (OOD) samples for trusted medical image segmentation remains a significant challenge. The critical issue here is the lack of a strict definition of abnormal data, which often results in artificial problem settings without measurable clinical impact. In this paper, we redesign the OOD detection problem according to the specifics of volumetric medical imaging and related downstream tasks (e.g., segmentation). We propose using the downstream model's performance as a pseudometric between images to define abnormal samples. This approach enables us to weight different samples according to their performance impact without an explicit ID/OOD distinction. We incorporate this weighting into a new metric called Expected Performance Drop (EPD). EPD is our core contribution to the new problem design, allowing us to rank methods based on their clinical impact. We demonstrate the effectiveness of EPD-based evaluation in 11 CT and MRI OOD detection challenges.
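As a rough illustration of a performance-aware evaluation of this kind, the sketch below computes an EPD-style score under the assumption that EPD is the expected drop in downstream segmentation quality (e.g., Dice) on the test samples an OOD detector fails to reject; the function name, the rejection scheme, and the baseline are illustrative choices, not the paper's exact definition.

```python
import numpy as np

def expected_performance_drop(test_dice, ood_scores, reference_dice, rejection_rate=0.1):
    """Illustrative EPD-style metric: average drop in downstream Dice,
    counted only on test samples the OOD detector does NOT reject.

    test_dice      -- downstream segmentation quality per test sample
    ood_scores     -- detector's abnormality score per test sample (higher = more abnormal)
    reference_dice -- mean quality on in-distribution data, used as the baseline
    rejection_rate -- fraction of the most abnormal samples the detector flags
    """
    test_dice = np.asarray(test_dice, dtype=float)
    ood_scores = np.asarray(ood_scores, dtype=float)

    # Reject the top `rejection_rate` fraction of samples by abnormality score.
    n_reject = int(round(rejection_rate * len(ood_scores)))
    keep = np.argsort(ood_scores)[: len(ood_scores) - n_reject]

    # Performance drop relative to the in-distribution baseline, clipped at zero.
    drops = np.clip(reference_dice - test_dice[keep], 0.0, None)
    return drops.mean() if len(drops) else 0.0

# Toy usage: a detector that misses the degraded cases keeps a large expected drop.
rng = np.random.default_rng(0)
dice = np.r_[rng.uniform(0.80, 0.95, 90), rng.uniform(0.10, 0.40, 10)]
scores = rng.uniform(0, 1, 100)  # uninformative detector
print(expected_performance_drop(dice, scores, reference_dice=0.88))
```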

Limitations of Out-of-Distribution Detection in 3D Medical Image Segmentation

Jun 23, 2023
Anton Vasiliuk, Daria Frolova, Mikhail Belyaev, Boris Shirokikh

Deep Learning models perform unreliably when the data comes from a distribution different from the training one. In critical applications such as medical imaging, out-of-distribution (OOD) detection methods help to identify such data samples, preventing erroneous predictions. In this paper, we further investigate the effectiveness of OOD detection when applied to 3D medical image segmentation. We design several OOD challenges representing clinically occurring cases and show that none of the existing methods achieves acceptable performance. Methods not dedicated to segmentation severely fail in the designed setups; their best mean false positive rate at 95% true positive rate (FPR) is 0.59. Segmentation-dedicated methods still achieve suboptimal performance, with the best mean FPR of 0.31 (lower is better). To demonstrate this suboptimality, we develop a simple method called Intensity Histogram Features (IHF), which performs comparably or better in the same challenges, with a mean FPR of 0.25. Our findings highlight the limitations of the existing OOD detection methods on 3D medical images and present a promising avenue for improving them. To facilitate research in this area, we release the designed challenges as a publicly available benchmark and formulate practical criteria to test OOD detection generalization beyond the suggested benchmark. We also propose IHF as a solid baseline to contest emerging methods.

* This work has been submitted to the IEEE for possible publication. 10 pages, 5 figures, 5 tables 
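The benchmark numbers above are reported as the false positive rate at 95% true positive rate. A minimal sketch of that metric is given below, assuming OOD samples are the positive class and that higher detector scores mean "more abnormal"; the paper's exact thresholding convention may differ.

```python
import numpy as np

def fpr_at_95_tpr(id_scores, ood_scores):
    """FPR at 95% TPR, treating OOD as the positive class.
    Higher scores are assumed to mean 'more abnormal'."""
    id_scores = np.asarray(id_scores, dtype=float)
    ood_scores = np.asarray(ood_scores, dtype=float)

    # Threshold that keeps 95% of OOD samples above it (95% TPR).
    threshold = np.quantile(ood_scores, 0.05)

    # Fraction of in-distribution samples that also exceed the threshold.
    return float(np.mean(id_scores >= threshold))

# Toy usage: a weak detector with heavily overlapping score distributions.
rng = np.random.default_rng(0)
print(fpr_at_95_tpr(rng.normal(0.0, 1.0, 1000), rng.normal(0.5, 1.0, 1000)))
```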

Solving Sample-Level Out-of-Distribution Detection on 3D Medical Images

Dec 13, 2022
Daria Frolova, Anton Vasiliuk, Mikhail Belyaev, Boris Shirokikh

Deep Learning (DL) models tend to perform poorly when the data comes from a distribution different from the training one. In critical applications such as medical imaging, out-of-distribution (OOD) detection helps to identify such data samples, increasing the model's reliability. Recent works have developed DL-based OOD detection that achieves promising results on 2D medical images. However, scaling most of these approaches to 3D images is computationally intractable. Furthermore, the current 3D solutions struggle to achieve acceptable results in detecting even synthetic OOD samples. Such limited performance might indicate that DL often embeds large volumetric images inefficiently. We argue that using the intensity histogram of the original CT or MRI scan as an embedding is descriptive enough to run OOD detection. Therefore, we propose a histogram-based method that requires no DL and achieves almost perfect results in this domain. Our proposal is supported in two ways. First, we evaluate the performance on publicly available datasets, where our method scores 1.0 AUROC in most setups. Second, we place second in the Medical Out-of-Distribution challenge without fine-tuning or exploiting task-specific knowledge. Carefully discussing the limitations, we conclude that our method solves sample-level OOD detection on 3D medical images in the current setting.

* 20 pages, 3 figures, submitted to Computerized Medical Imaging and Graphics 
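To make the histogram-as-embedding idea concrete, here is a minimal sketch that turns each 3D scan into a normalized intensity histogram and scores test scans by their distance to the nearest training histogram, evaluated with AUROC. The bin count, intensity range, L1 distance, and nearest-neighbor scoring are illustrative assumptions rather than the method's exact configuration.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def histogram_embedding(volume, bins=64, value_range=(-1000.0, 1000.0)):
    """Normalized intensity histogram of a 3D scan, used as a compact embedding.
    The bin count and intensity range are illustrative choices (e.g., CT HU units)."""
    hist, _ = np.histogram(volume.ravel(), bins=bins, range=value_range)
    return hist / max(hist.sum(), 1)

def ood_scores(train_volumes, test_volumes, bins=64):
    """Abnormality score = distance from a test histogram to its nearest training histogram."""
    train = np.stack([histogram_embedding(v, bins) for v in train_volumes])
    test = np.stack([histogram_embedding(v, bins) for v in test_volumes])
    # Pairwise L1 distances; the minimum over the training set is the score.
    dists = np.abs(test[:, None, :] - train[None, :, :]).sum(-1)
    return dists.min(axis=1)

# Toy usage: in-distribution volumes vs. volumes with shifted intensities.
rng = np.random.default_rng(0)
id_train = [rng.normal(0, 200, size=(32, 64, 64)) for _ in range(20)]
id_test = [rng.normal(0, 200, size=(32, 64, 64)) for _ in range(10)]
ood_test = [rng.normal(300, 200, size=(32, 64, 64)) for _ in range(10)]

scores = ood_scores(id_train, id_test + ood_test)
labels = np.r_[np.zeros(10), np.ones(10)]  # 1 = OOD
print("AUROC:", roc_auc_score(labels, scores))
```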

Exploring Structure-Wise Uncertainty for 3D Medical Image Segmentation

Nov 01, 2022
Anton Vasiliuk, Daria Frolova, Mikhail Belyaev, Boris Shirokikh

When applying a Deep Learning model to medical images, it is crucial to estimate the model's uncertainty. Voxel-wise uncertainty is a useful visual marker for human experts and can be used to improve the model's voxel-wise output, such as segmentation. Moreover, uncertainty provides a solid foundation for out-of-distribution (OOD) detection, improving model performance at the image-wise level. However, one of the most frequent tasks in medical imaging is the segmentation of distinct, local structures such as tumors or lesions. Here, structure-wise uncertainty allows operations that are more precise than image-wise ones and more semantically aware than voxel-wise ones, yet how to produce uncertainty for individual structures remains poorly explored. We propose a framework to measure structure-wise uncertainty and evaluate the impact of OOD data on model performance. With it, we identify the best uncertainty estimation (UE) method to improve the segmentation quality. The proposed framework is tested on three datasets with the tumor segmentation task: LIDC-IDRI, LiTS, and a private dataset with multiple brain metastases cases.
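One simple way to obtain structure-wise uncertainty, sketched below, is to average the voxel-wise uncertainty map over each connected component of the predicted segmentation; this aggregation is an assumption for illustration and not necessarily the one used in the proposed framework.

```python
import numpy as np
from scipy import ndimage

def structure_wise_uncertainty(pred_mask, voxel_uncertainty):
    """Aggregate voxel-wise uncertainty over each predicted structure.

    pred_mask         -- binary 3D segmentation (e.g., predicted lesions)
    voxel_uncertainty -- per-voxel uncertainty map of the same shape
    Returns a list of (structure_id, voxel_count, mean_uncertainty)."""
    labeled, n_structures = ndimage.label(pred_mask)
    results = []
    for structure_id in range(1, n_structures + 1):
        voxels = labeled == structure_id
        results.append((structure_id, int(voxels.sum()),
                        float(voxel_uncertainty[voxels].mean())))
    return results

# Toy usage: two predicted lesions, the second one with higher uncertainty.
mask = np.zeros((32, 32, 32), dtype=bool)
mask[2:6, 2:6, 2:6] = True
mask[20:24, 20:24, 20:24] = True
uncertainty = np.zeros_like(mask, dtype=float)
uncertainty[2:6, 2:6, 2:6] = 0.05
uncertainty[20:24, 20:24, 20:24] = 0.6
print(structure_wise_uncertainty(mask, uncertainty))
```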

Neglectable effect of brain MRI data preprocessing for tumor segmentation

Apr 11, 2022
Ekaterina Kondrateva, Polina Druzhinina, Alexandra Dalechina, Boris Shirokikh, Mikhail Belyaev, Anvar Kurmukov

Magnetic resonance imaging (MRI) data is heterogeneous due to differences in device manufacturers, scanning protocols, and inter-subject variability. A conventional way to mitigate MR image heterogeneity is to apply preprocessing transformations such as anatomy alignment, voxel resampling, signal intensity equalization, image denoising, and localization of regions of interest (ROI). Although the preprocessing pipeline standardizes image appearance, its influence on the quality of image segmentation and other downstream tasks performed with deep neural networks (DNNs) has never been rigorously studied. Here we report a comprehensive study of multimodal MRI brain cancer image segmentation on the open-source TCIA-GBM dataset. Our results demonstrate that most popular standardization steps add no value to network performance; moreover, preprocessing can hamper model performance. We suggest that image intensity normalization approaches do not contribute to model accuracy because of the reduction of signal variance with image standardization. We also show that the contribution of skull-stripping to data preprocessing is almost negligible if measured in terms of clinically relevant metrics. Finally, we show that the only essential transformation for accurate analysis is the unification of voxel spacing across the dataset. In contrast, anatomy alignment in the form of non-rigid atlas registration is not necessary, and most intensity equalization steps do not improve model performance.
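Since the study singles out voxel-spacing unification as the one essential preprocessing step, the sketch below shows how such resampling can be done with scipy; the 1 mm isotropic target spacing is an illustrative choice.

```python
import numpy as np
from scipy import ndimage

def resample_to_spacing(volume, spacing, new_spacing=(1.0, 1.0, 1.0), order=3):
    """Resample a 3D volume to a common voxel spacing (in mm).

    spacing     -- current voxel spacing along each axis
    new_spacing -- target spacing shared across the dataset
    order       -- spline interpolation order (use 0 for label masks)"""
    zoom_factors = np.asarray(spacing, float) / np.asarray(new_spacing, float)
    return ndimage.zoom(volume, zoom_factors, order=order)

# Toy usage: a scan with 1 x 1 x 3 mm voxels resampled to isotropic 1 mm.
volume = np.random.default_rng(0).normal(size=(64, 64, 20))
resampled = resample_to_spacing(volume, spacing=(1.0, 1.0, 3.0))
print(volume.shape, "->", resampled.shape)  # (64, 64, 20) -> (64, 64, 60)
```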

Adaptation to CT Reconstruction Kernels by Enforcing Cross-domain Feature Maps Consistency

Mar 28, 2022
Stanislav Shimovolos, Andrey Shushko, Mikhail Belyaev, Boris Shirokikh

Deep learning methods provide significant assistance in analyzing coronavirus disease (COVID-19) in chest computed tomography (CT) images, including identification, severity assessment, and segmentation. Although earlier methods address the lack of data and specific annotations, the current goal is to build a robust algorithm for clinical use given the larger pool of available data. With larger datasets, the domain shift problem arises, affecting the performance of methods on unseen data. One of the critical sources of domain shift in CT images is the difference in reconstruction kernels used to generate images from the raw data (sinograms). In this paper, we show a decrease in the COVID-19 segmentation quality of a model trained on smooth and tested on sharp reconstruction kernels. Furthermore, we compare several domain adaptation approaches to tackle the problem, such as task-specific augmentation and unsupervised adversarial learning. Finally, we propose an unsupervised adaptation method, called F-Consistency, that outperforms the previous approaches. Our method exploits a set of unlabeled CT image pairs which differ only in reconstruction kernels within every pair. It enforces the similarity of the network's hidden representations (feature maps) by minimizing the mean squared error (MSE) between paired feature maps. We show that our method achieves a 0.64 Dice Score on the test dataset with unseen sharp kernels, compared to the 0.56 Dice Score of the baseline model. Moreover, F-Consistency scores a 0.80 Dice Score between predictions on the paired images, which almost doubles the baseline score of 0.46 and surpasses the other methods. We also show that F-Consistency generalizes better to unseen kernels and to data without the specific semantic content, e.g., the presence of COVID-19 lesions.
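A minimal PyTorch-style sketch of the consistency term as described, i.e., the MSE between feature maps of paired smooth- and sharp-kernel images, is shown below; which layers are matched and how the term is weighted against the supervised segmentation loss are details the paper specifies and this sketch does not.

```python
import torch
import torch.nn.functional as F

def f_consistency_loss(features_smooth, features_sharp):
    """Consistency term: MSE between hidden feature maps of paired images that
    differ only in the reconstruction kernel (smooth vs. sharp)."""
    return sum(F.mse_loss(fs, fh) for fs, fh in zip(features_smooth, features_sharp))

# Toy usage with random "feature maps" standing in for two forward passes
# of the same segmentation network on a paired smooth/sharp CT image.
torch.manual_seed(0)
feats_smooth = [torch.randn(2, 16, 32, 32, 32), torch.randn(2, 32, 16, 16, 16)]
feats_sharp = [f + 0.1 * torch.randn_like(f) for f in feats_smooth]

# The total objective would combine the supervised segmentation loss on labeled
# smooth-kernel data with a weighted consistency term on the unlabeled pairs.
loss = f_consistency_loss(feats_smooth, feats_sharp)
print(loss.item())
```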

CrossMoDA 2021 challenge: Benchmark of Cross-Modality Domain Adaptation techniques for Vestibular Schwannoma and Cochlea Segmentation

Jan 08, 2022
Reuben Dorent, Aaron Kujawa, Marina Ivory, Spyridon Bakas, Nicola Rieke, Samuel Joutard, Ben Glocker, Jorge Cardoso, Marc Modat, Kayhan Batmanghelich, Arseniy Belkov, Maria Baldeon Calisto, Jae Won Choi, Benoit M. Dawant, Hexin Dong, Sergio Escalera, Yubo Fan, Lasse Hansen, Mattias P. Heinrich, Smriti Joshi, Victoriya Kashtanova, Hyeon Gyu Kim, Satoshi Kondo, Christian N. Kruse, Susana K. Lai-Yuen, Hao Li, Han Liu, Buntheng Ly, Ipek Oguz, Hyungseob Shin, Boris Shirokikh, Zixian Su, Guotai Wang, Jianghao Wu, Yanwu Xu, Kai Yao, Li Zhang, Sebastien Ourselin, Jonathan Shapey, Tom Vercauteren

Domain Adaptation (DA) has recently raised strong interest in the medical imaging community. While a large variety of DA techniques has been proposed for image segmentation, most of these techniques have been validated either on private datasets or on small publicly available datasets. Moreover, these datasets mostly addressed single-class problems. To tackle these limitations, the Cross-Modality Domain Adaptation (crossMoDA) challenge was organised in conjunction with the 24th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2021). CrossMoDA is the first large and multi-class benchmark for unsupervised cross-modality DA. The challenge's goal is to segment two key brain structures involved in the follow-up and treatment planning of vestibular schwannoma (VS): the VS and the cochleas. Currently, diagnosis and surveillance in patients with VS are performed using contrast-enhanced T1 (ceT1) MRI. However, there is growing interest in using non-contrast sequences such as high-resolution T2 (hrT2) MRI. Therefore, we created an unsupervised cross-modality segmentation benchmark. The training set provides annotated ceT1 (N=105) and unpaired non-annotated hrT2 (N=105) scans. The aim was to automatically perform unilateral VS and bilateral cochlea segmentation on hrT2 as provided in the testing set (N=137). A total of 16 teams submitted their algorithms for the evaluation phase. The level of performance reached by the top-performing teams is strikingly high (best median Dice: VS 88.4%, cochleas 85.7%) and close to full supervision (median Dice: VS 92.5%, cochleas 87.7%). All top-performing methods made use of an image-to-image translation approach to transform the source-domain images into pseudo-target-domain images. A segmentation network was then trained using these generated images and the manual annotations provided for the source images.

* Submitted to Medical Image Analysis 

Systematic Clinical Evaluation of A Deep Learning Method for Medical Image Segmentation: Radiosurgery Application

Aug 21, 2021
Boris Shirokikh, Alexandra Dalechina, Alexey Shevtsov, Egor Krivov, Valery Kostjuchenko, Amayak Durgaryan, Mikhail Galkin, Andrey Golanov, Mikhail Belyaev

We systematically evaluate a Deep Learning (DL) method in a 3D medical image segmentation task. Our segmentation method is integrated into the radiosurgery treatment process and directly impacts the clinical workflow. With our method, we address the relative drawbacks of manual segmentation: high inter-rater contouring variability and the high time consumption of the contouring process. The main extension over the existing evaluations is the careful and detailed analysis that could be further generalized to other medical image segmentation tasks. Firstly, we analyze the changes in the inter-rater detection agreement: the segmentation model reduces the ratio of detection disagreements from 0.162 to 0.085 (p < 0.05). Secondly, we show that the model improves the inter-rater contouring agreement from 0.845 to 0.871 surface Dice Score (p < 0.05). Thirdly, we show that the model accelerates the delineation process by a factor of 1.6 to 2.0 (p < 0.05). Finally, we design the setup of the clinical experiment to either exclude or estimate the evaluation biases, thus preserving the significance of the results. Besides the clinical evaluation, we also summarize the intuitions and practical ideas for building an efficient DL-based model for 3D medical image segmentation.
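The contouring-agreement results are reported in surface Dice Score. Below is a simplified voxel-based approximation of that metric using scipy distance transforms; the reference implementation weights surface elements by their area, so the numbers produced here are only indicative.

```python
import numpy as np
from scipy import ndimage

def surface_dice(gt, pred, tolerance_mm=1.0, spacing=(1.0, 1.0, 1.0)):
    """Surface Dice at a tolerance: the fraction of the two segmentation surfaces
    that lie within `tolerance_mm` of each other (voxel-based approximation)."""
    def surface(mask):
        eroded = ndimage.binary_erosion(mask)
        return mask & ~eroded

    gt_surf, pred_surf = surface(gt.astype(bool)), surface(pred.astype(bool))

    # Distance from every voxel to the nearest surface voxel of the other mask.
    dist_to_gt = ndimage.distance_transform_edt(~gt_surf, sampling=spacing)
    dist_to_pred = ndimage.distance_transform_edt(~pred_surf, sampling=spacing)

    close_pred = (dist_to_gt[pred_surf] <= tolerance_mm).sum()
    close_gt = (dist_to_pred[gt_surf] <= tolerance_mm).sum()
    total = pred_surf.sum() + gt_surf.sum()
    return float(close_pred + close_gt) / max(total, 1)

# Toy usage: a predicted sphere shifted by one voxel against the ground truth.
z, y, x = np.ogrid[:48, :48, :48]
gt = (z - 24) ** 2 + (y - 24) ** 2 + (x - 24) ** 2 <= 10 ** 2
pred = (z - 24) ** 2 + (y - 25) ** 2 + (x - 24) ** 2 <= 10 ** 2
print(surface_dice(gt, pred, tolerance_mm=1.0))
```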

Anatomy of Domain Shift Impact on U-Net Layers in MRI Segmentation

Jul 10, 2021
Ivan Zakazov, Boris Shirokikh, Alexey Chernyavskiy, Mikhail Belyaev

Domain Adaptation (DA) methods are widely used in medical image segmentation tasks to tackle the problem of differently distributed train (source) and test (target) data. We consider the supervised DA task with a limited number of annotated samples from the target domain. It corresponds to one of the most relevant clinical setups: building a sufficiently accurate model on the minimum possible amount of annotated data. Existing methods mostly fine-tune specific layers of a pretrained Convolutional Neural Network (CNN). However, there is no consensus on which layers are better to fine-tune, e.g., the first layers for images with low-level domain shift or the deeper layers for images with high-level domain shift. To this end, we propose SpotTUnet, a CNN architecture that automatically chooses the layers which should be optimally fine-tuned. More specifically, on the target domain, our method additionally learns a policy that indicates whether a specific layer should be fine-tuned or reused from the pretrained network. First, we show that our method performs at the same level as the best of the inflexible fine-tuning methods even under extreme scarcity of annotated data. Second, we show that the SpotTUnet policy provides a layer-wise visualization of the domain shift impact on the network, which could be further used to develop robust domain generalization methods. To extensively evaluate SpotTUnet performance, we use a publicly available dataset of brain MR images (CC359) characterized by explicit domain shift. We release a reproducible experimental pipeline.

* Accepted for MICCAI-2021 conference 
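To illustrate the fine-tune-or-reuse policy at the level of a single layer, here is a SpotTune-style sketch in PyTorch that mixes a frozen pretrained block with its fine-tunable copy through a learned gate relaxed with Gumbel-softmax; the actual SpotTUnet architecture, policy parameterization, and regularization differ in detail.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedBlock(nn.Module):
    """Runs a frozen pretrained block and its fine-tunable copy in parallel and
    mixes their outputs with a learned per-layer policy (relaxed to be differentiable)."""

    def __init__(self, pretrained_block):
        super().__init__()
        self.tuned = copy.deepcopy(pretrained_block)  # fine-tuned on the target domain
        self.frozen = pretrained_block                # kept as pretrained
        for p in self.frozen.parameters():
            p.requires_grad_(False)
        self.policy_logits = nn.Parameter(torch.zeros(2))  # [reuse, fine-tune]

    def forward(self, x):
        # Gumbel-softmax gives an (almost) binary, differentiable choice per step.
        gate = F.gumbel_softmax(self.policy_logits, tau=1.0, hard=True)
        return gate[0] * self.frozen(x) + gate[1] * self.tuned(x)

# Toy usage: wrap one convolutional block of a pretrained network.
block = nn.Sequential(nn.Conv3d(8, 8, 3, padding=1), nn.ReLU())
gated = GatedBlock(block)
out = gated(torch.randn(1, 8, 16, 16, 16))
print(out.shape, gated.policy_logits)
```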

First U-Net Layers Contain More Domain Specific Information Than The Last Ones

Aug 17, 2020
Boris Shirokikh, Ivan Zakazov, Alexey Chernyavskiy, Irina Fedulova, Mikhail Belyaev

The appearance of MRI scans significantly depends on the scanning protocol and, consequently, on the data-collecting institution. These variations between clinical sites result in dramatic drops in CNN segmentation quality on unseen domains. Many of the recently proposed MRI domain adaptation methods operate with the last CNN layers to suppress domain shift. At the same time, the core manifestation of MRI variability is a considerable diversity of image intensities. We hypothesize that these differences can be eliminated by modifying the first layers rather than the last ones. To validate this simple idea, we conducted a set of experiments with brain MRI scans from six domains. Our results demonstrate that 1) domain shift may deteriorate quality even for a simple brain extraction segmentation task (the surface Dice Score drops from 0.85-0.89 to as low as 0.09); 2) fine-tuning the first layers significantly outperforms fine-tuning the last layers in almost all supervised domain adaptation setups. Moreover, fine-tuning the first layers is a better strategy than fine-tuning the whole network if the amount of annotated data from the new domain is strictly limited.

* Accepted to DART workshop at MICCAI-2020 
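A minimal sketch of the winning strategy, i.e., fine-tuning only the first layers of a pretrained network while freezing the rest, is given below; the layer-name prefixes and the toy model are placeholders, not the U-Net used in the paper.

```python
import torch.nn as nn

def prepare_first_layers_finetuning(model, first_layer_names=("init_conv", "enc1")):
    """Freeze all parameters except those in the first (earliest) layers,
    which are then fine-tuned on the new domain."""
    for name, param in model.named_parameters():
        param.requires_grad = any(name.startswith(prefix) for prefix in first_layer_names)
    return [p for p in model.parameters() if p.requires_grad]

# Toy usage with a stand-in for a U-Net encoder/decoder.
model = nn.Sequential()
model.add_module("init_conv", nn.Conv3d(1, 8, 3, padding=1))
model.add_module("enc1", nn.Conv3d(8, 16, 3, padding=1))
model.add_module("enc2", nn.Conv3d(16, 32, 3, padding=1))
model.add_module("head", nn.Conv3d(32, 1, 1))

trainable = prepare_first_layers_finetuning(model)
print(sum(p.numel() for p in trainable), "trainable parameters")
# An optimizer would then be built only over `trainable`, e.g. torch.optim.SGD(trainable, lr=1e-3).
```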