Multi-modality cardiac imaging plays a key role in the management of patients with cardiovascular diseases. It allows a combination of complementary anatomical, morphological and functional information, increases diagnostic accuracy, and improves the efficacy of cardiovascular interventions and clinical outcomes. Fully-automated processing and quantitative analysis of multi-modality cardiac images could have a direct impact on clinical research and evidence-based patient management. However, these require overcoming significant challenges, including inter-modality misalignment and finding optimal methods to integrate information from different modalities. This paper aims to provide a comprehensive review of multi-modality imaging in cardiology, the computing methods, the validation strategies, the related clinical workflows and future perspectives. For the computing methodologies, we focus on three tasks, i.e., registration, fusion and segmentation, which generally involve multi-modality imaging data, \textit{either combining information from different modalities or transferring information across modalities}. The review highlights that multi-modality cardiac imaging data has the potential for wide applicability in the clinic, such as trans-aortic valve implantation guidance, myocardial viability assessment, and catheter ablation therapy and its patient selection. Nevertheless, many challenges remain unsolved, such as missing modalities, the combination of imaging and non-imaging data, and uniform analysis and representation of different modalities. There is also work to do in defining how the well-developed techniques fit into clinical workflows and how much additional and relevant information they introduce. These problems are likely to remain an active field of research in the future.
Partially-supervised learning can be challenging for segmentation due to the lack of supervision for unlabeled structures, and methods that directly apply fully-supervised learning could lead to incompatibility, meaning that the ground truth is not in the solution set of the optimization problem given the loss function. To address the challenge, we propose a deep compatible learning (DCL) framework, which trains a single multi-label segmentation network using images with only partial structures annotated. We first formulate the partially-supervised segmentation as an optimization problem compatible with missing labels, and prove its compatibility. Then, we equip the model with a conditional segmentation strategy, to propagate labels from multiple partially-annotated images to the target. Additionally, we propose a dual learning strategy, which learns two opposite mappings of label propagation simultaneously, to provide substantial supervision for unlabeled structures. The two strategies are formulated into compatible forms, termed conditional compatibility and dual compatibility, respectively. We show that this framework is generally applicable to conventional loss functions. The approach attains significant performance improvement over existing methods, especially when only a small training dataset is available. Results on three segmentation tasks show that the proposed framework can achieve performance matching fully-supervised models.
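To illustrate the notion of compatibility described above, the following minimal sketch contrasts a naive cross-entropy with a marginal variant for a single pixel. It is an illustration under stated assumptions, not the DCL formulation itself: the function names `naive_ce` and `marginal_ce`, the convention that class 0 is background, and the choice to sum the probabilities of all unannotated classes are hypothetical.

```python
import math

EPS = 1e-12  # guards against log(0)

def naive_ce(probs, label):
    """Standard cross-entropy for one pixel.

    Assumption: pixels without an annotation (label is None) are
    treated as background, i.e. class 0.
    """
    target = 0 if label is None else label
    return -math.log(max(probs[target], EPS))

def marginal_ce(probs, label, annotated):
    """Cross-entropy on a marginal probability.

    For a pixel without an annotation, the target event is
    "any class that was not annotated in this image", so the
    probabilities of those classes are summed before the log.
    """
    if label is not None:
        p = probs[label]
    else:
        p = sum(q for c, q in enumerate(probs) if c not in annotated)
    return -math.log(max(p, EPS))

# A pixel that truly belongs to class 2, in an image where only
# class 1 was annotated, so the pixel carries no label.
truth = [0.0, 0.0, 1.0]   # the ground-truth prediction
annotated = {1}

naive = naive_ce(truth, None)                  # large penalty
marginal = marginal_ce(truth, None, annotated)  # zero penalty
```

In this toy case the ground-truth prediction is heavily penalized by the naive loss but attains zero loss under the marginal variant, which is the sense in which the latter keeps the ground truth inside the solution set.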
Distributed learning has shown great potential in medical image analysis. It allows the use of multi-center training data with privacy protection. However, data distributions in local centers can vary from each other due to different imaging vendors and annotation protocols. Such variation degrades the performance of learning-based methods. To mitigate the influence, two groups of methods have been proposed for different aims, i.e., the global methods and the personalized methods. The former aim to improve the performance of a single global model for all test data from unseen centers (known as generic data), while the latter target multiple models, one for each center (whose data are denoted as local data). However, little research has been done to achieve both goals simultaneously. In this work, we propose a new framework of distributed learning that bridges the gap between the two groups, and improves the performance for both generic and local data. Specifically, our method decouples the predictions for generic data and local data, via distribution-conditioned adaptation matrices. Results on multi-center left atrial (LA) MRI segmentation showed that our method achieved superior performance over existing methods on both generic and local data. Our code is available at https://github.com/key1589745/decouple_predict
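The idea of decoupling generic and local predictions through an adaptation matrix can be sketched as follows. This is a hedged illustration only: it assumes that the adaptation matrix is a row-stochastic K x K matrix applied to the generic class probabilities of one pixel, which the abstract does not state. The function name `adapt_prediction` and the example matrix are hypothetical and are not taken from the released code.

```python
def adapt_prediction(generic_probs, adaptation):
    """Map generic class probabilities to center-specific ones.

    Assumption: adaptation[i][j] is the probability that generic
    class i is expressed as class j under one center's distribution,
    so every row sums to 1 and the output is again a distribution.
    """
    k = len(generic_probs)
    return [
        sum(generic_probs[i] * adaptation[i][j] for i in range(k))
        for j in range(k)
    ]

# Generic prediction for one pixel (background, left atrium).
generic = [0.7, 0.3]

# Hypothetical matrix for a center whose protocol shifts some
# probability mass between the two classes.
center_matrix = [
    [0.9, 0.1],
    [0.2, 0.8],
]

local = adapt_prediction(generic, center_matrix)
```

With an identity matrix the local prediction equals the generic one, so under this assumption a single shared network can serve unseen centers while each known center keeps its own lightweight correction.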
Although supervised deep learning has achieved promising performance in medical image segmentation, many methods cannot generalize well to unseen data, limiting their real-world applicability. To address this problem, we propose a deep learning-based Bayesian framework, which jointly models image and label statistics, utilizing the domain-irrelevant contour of a medical image for segmentation. Specifically, we first decompose an image into components of contour and basis. Then, we model the expected label as a variable only related to the contour. Finally, we develop a variational Bayesian framework to infer the posterior distributions of these variables, including the contour, the basis, and the label. The framework is implemented with neural networks, and is thus referred to as deep Bayesian segmentation (BayeSeg). Results on the task of cross-sequence cardiac MRI segmentation show that our method sets a new state of the art for model generalizability. In particular, the BayeSeg model trained with LGE MRI generalized well to T2 images and outperformed other models by large margins, i.e., over 0.47 in terms of average Dice. Our code is available at https://zmiclab.github.io/projects.html.
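For reference, the Dice score used to report the results above is the standard overlap measure 2|A∩B| / (|A| + |B|) between a predicted and a reference region. A minimal sketch on flattened label maps follows. The function name `dice_score` and the convention of returning 1.0 when both regions are empty are assumptions made for illustration.

```python
def dice_score(pred, target, label):
    """Dice overlap for one class on two flattened label maps.

    pred, target: sequences of integer class labels, same length.
    label: the class whose overlap is measured.
    """
    pred_mask = [v == label for v in pred]
    target_mask = [v == label for v in target]
    intersection = sum(p and t for p, t in zip(pred_mask, target_mask))
    denominator = sum(pred_mask) + sum(target_mask)
    if denominator == 0:
        # Assumed convention: two empty regions agree perfectly.
        return 1.0
    return 2.0 * intersection / denominator

# Four pixels; the prediction marks one extra pixel as class 1.
prediction = [0, 1, 1, 0]
reference = [0, 1, 0, 0]
score = dice_score(prediction, reference, label=1)  # 2 * 1 / (2 + 1)
```

An average Dice, as quoted in the abstract, would be the mean of this value over the structures of interest and the test cases.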
Previous methods for multimodal groupwise registration typically require highly specialized similarity metrics with limited applicability. In this work, we instead propose a general framework which formulates groupwise registration as a procedure of hierarchical Bayesian inference. Here, the imaging process of multimodal medical images, including shape transition and appearance variation, is characterized by a disentangled variational auto-encoder. To this end, we propose a novel variational posterior and network architecture that facilitate joint learning of the common structural representation and the desired spatial correspondences. The performance of the proposed model was validated on two publicly available multimodal datasets, i.e., BrainWeb and MS-CMR of the heart. Results have demonstrated the efficacy of our framework in realizing multimodal groupwise registration in an end-to-end fashion.
Cardiac segmentation is an essential step for the diagnosis of cardiovascular diseases. However, pixel-wise dense labeling is both costly and time-consuming. Scribble, as a form of sparse annotation, is more accessible than full annotations. However, it is particularly challenging to train a segmentation network with weak supervision from scribbles. To tackle this problem, we propose a new scribble-guided method for cardiac segmentation, based on the Positive-Unlabeled (PU) learning framework and global consistency regularization, termed ShapePU. To leverage unlabeled pixels via PU learning, we first present an Expectation-Maximization (EM) algorithm to estimate the proportion of each class in the unlabeled pixels. Given the estimated ratios, we then introduce marginal probability maximization to identify the classes of unlabeled pixels. To exploit shape knowledge, we apply cutout operations to training images, and penalize inconsistent segmentation results. Evaluated on two open datasets, i.e., ACDC and MSCMRseg, our scribble-supervised ShapePU surpassed the fully supervised approach by 1.4% and 9.8% in average Dice, respectively, and outperformed the state-of-the-art weakly supervised and PU learning methods by large margins. Our code is available at https://github.com/BWGZK/ShapePU.
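The step of estimating class proportions with EM can be sketched with the classic prior re-estimation scheme, in which classifier posteriors are re-weighted by the ratio of new to old priors and then averaged. This is a generic textbook procedure used here for illustration, not necessarily the exact algorithm of ShapePU. The function name `estimate_class_priors`, the fixed iteration count, and the toy posteriors are assumptions.

```python
def estimate_class_priors(posteriors, train_priors, n_iter=50):
    """Estimate class proportions among unlabeled pixels with EM.

    posteriors: one list of class probabilities per unlabeled pixel,
                as predicted under the priors in train_priors.
    train_priors: class proportions assumed by the classifier
                  (all entries must be positive).
    """
    k = len(train_priors)
    priors = list(train_priors)
    for _ in range(n_iter):
        totals = [0.0] * k
        for p in posteriors:
            # E-step: re-weight the posterior by new prior / old prior,
            # then normalize so it is a distribution again.
            weighted = [p[c] * priors[c] / train_priors[c] for c in range(k)]
            norm = sum(weighted)
            for c in range(k):
                totals[c] += weighted[c] / norm
        # M-step: the new prior is the mean adjusted posterior.
        priors = [t / len(posteriors) for t in totals]
    return priors

# Six unlabeled pixels, two classes, classifier trained with equal priors.
pixel_posteriors = [
    [0.9, 0.1], [0.8, 0.2], [0.3, 0.7],
    [0.2, 0.8], [0.1, 0.9], [0.15, 0.85],
]
ratios = estimate_class_priors(pixel_posteriors, [0.5, 0.5])
```

The estimated ratios always sum to one. In this toy example most pixels lean towards the second class, so its estimated proportion ends up above one half.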
Modeling statistics of image priors is useful for image super-resolution, but it has received little attention in the large body of work on deep learning-based methods. In this work, we propose a Bayesian image restoration framework, where natural image statistics are modeled with the combination of smoothness and sparsity priors. Concretely, we first consider an ideal image as the sum of a smoothness component and a sparsity residual, and model real image degradation including blurring, downscaling, and noise corruption. Then, we develop a variational Bayesian approach to infer their posteriors. Finally, we implement the variational approach for single image super-resolution (SISR) using deep neural networks, and propose an unsupervised training strategy. The experiments on three image restoration tasks, \textit{i.e.,} ideal SISR, realistic SISR, and real-world SISR, demonstrate that our method has superior model generalizability against varying noise levels and degradation kernels and is effective in unsupervised SISR. The code and resulting models are released via \url{https://zmiclab.github.io/projects.html}.
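The degradation process named above (blurring, downscaling, and noise corruption) can be sketched in one dimension as follows. This is a simplified illustration under stated assumptions, not the paper's model: the function name `degrade`, the replicate-border handling, the sliding-window form of the blur, and passing the noise in explicitly are all choices made for this example.

```python
def degrade(signal, kernel, scale, noise):
    """Apply blur, downscaling and additive noise to a 1-D signal.

    signal: list of floats, the ideal high-resolution signal.
    kernel: odd-length blur kernel (applied as a sliding window,
            which equals convolution for symmetric kernels).
    scale:  integer downscaling factor (keep every scale-th sample).
    noise:  list of floats added to the downscaled samples.
    """
    half = len(kernel) // 2
    n = len(signal)
    blurred = []
    for i in range(n):
        acc = 0.0
        for j, weight in enumerate(kernel):
            # Assumed border handling: repeat the edge sample.
            idx = min(max(i + j - half, 0), n - 1)
            acc += weight * signal[idx]
        blurred.append(acc)
    downscaled = blurred[::scale]
    return [value + eps for value, eps in zip(downscaled, noise)]

# A step edge, blurred by a small symmetric kernel, halved in
# resolution, with no noise so the output is easy to follow.
high_res = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
low_res = degrade(high_res, [0.25, 0.5, 0.25], scale=2, noise=[0.0] * 4)
```

Super-resolution is the inverse problem of recovering `high_res` from `low_res`. Varying the kernel and the noise level in such a model corresponds to the varying degradations against which generalizability is evaluated.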
Curating a large set of fully annotated training data can be costly, especially for the tasks of medical image segmentation. Scribble, a weaker form of annotation, is more obtainable in practice, but training segmentation models from the limited supervision of scribbles is still challenging. To address the difficulties, we propose a new framework for scribble learning-based medical image segmentation, which is composed of mix augmentation and cycle consistency and is thus referred to as CycleMix. For augmentation of supervision, CycleMix adopts the mixup strategy with a dedicated design of random occlusion, to perform increments and decrements of scribbles. For regularization of supervision, CycleMix intensifies the training objective with consistency losses to penalize inconsistent segmentation, which results in significant improvement of segmentation performance. Results on two open datasets, i.e., ACDC and MSCMRseg, showed that the proposed method achieved encouraging performance, demonstrating comparable or even better accuracy than the fully-supervised methods. The code and expert-made scribble annotations for MSCMRseg are publicly available at https://github.com/BWGZK/CycleMix.
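The mixup strategy mentioned above blends two training samples with a mixing coefficient. A minimal sketch of plain mixup on flattened images and per-pixel soft labels follows. It shows only the generic mixup operation, not the random occlusion or the consistency losses of CycleMix, and the function name `mixup` and the toy values are assumptions.

```python
def mixup(image_a, image_b, label_a, label_b, lam):
    """Blend two samples with coefficient lam in [0, 1].

    image_a, image_b: flattened images (lists of floats, same length).
    label_a, label_b: per-pixel soft labels for one class
                      (lists of floats, same length as the images).
    Returns the mixed image and the mixed label map.
    """
    mixed_image = [lam * a + (1.0 - lam) * b for a, b in zip(image_a, image_b)]
    mixed_label = [lam * a + (1.0 - lam) * b for a, b in zip(label_a, label_b)]
    return mixed_image, mixed_label

# Two tiny "images" with complementary scribbles on one class.
img_a, scribble_a = [1.0, 0.0], [1.0, 0.0]
img_b, scribble_b = [0.0, 1.0], [0.0, 1.0]
mixed_img, mixed_scribble = mixup(img_a, img_b, scribble_a, scribble_b, lam=0.25)
```

Because images and labels are blended with the same coefficient, a network trained on the mixed pair is encouraged to produce predictions that are consistent with the corresponding blend of the two annotations.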
Multi-atlas segmentation (MAS) is a promising framework for medical image segmentation. Generally, MAS methods register multiple atlases, i.e., medical images with corresponding labels, to a target image, and the transformed atlas labels are then combined to generate the target segmentation via label fusion schemes. Many conventional MAS methods employ atlases from the same modality as the target image. However, atlases of the same modality may be limited in number or even unavailable in many clinical applications. Besides, conventional MAS methods suffer from the computational burden of the registration and label fusion procedures. In this work, we design a novel cross-modality MAS framework, which uses available atlases from a certain modality to segment a target image from another modality. To boost the computational efficiency of the framework, both the image registration and the label fusion are achieved by well-designed deep neural networks. For the atlas-to-target image registration, we propose a bi-directional registration network (BiRegNet), which can efficiently align images from different modalities. For the label fusion, we design a similarity estimation network (SimNet), which estimates the fusion weight of each atlas by measuring its similarity to the target image. SimNet can learn multi-scale information for similarity estimation to improve the performance of label fusion. The proposed framework was evaluated on the left ventricle and liver segmentation tasks using the MM-WHS and CHAOS datasets, respectively. Results have shown that the framework is effective for cross-modality MAS in both registration and label fusion. The code will be released publicly on \url{https://github.com/NanYoMy/cmmas} once the manuscript is accepted.
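The label fusion step described above, in which each registered atlas votes with a weight reflecting its similarity to the target, can be sketched as weighted voting. This is a generic illustration rather than the proposed networks: the weights are given as plain numbers instead of being estimated by SimNet, one global weight per atlas is assumed, and the function name `weighted_label_fusion` is hypothetical.

```python
def weighted_label_fusion(atlas_labels, weights, num_classes):
    """Fuse warped atlas label maps by weighted voting.

    atlas_labels: one flattened integer label map per atlas,
                  already registered to the target image.
    weights:      one non-negative fusion weight per atlas.
    num_classes:  number of distinct labels.
    Returns the fused label map (ties go to the lowest class index).
    """
    n_pixels = len(atlas_labels[0])
    fused = []
    for p in range(n_pixels):
        votes = [0.0] * num_classes
        for labels, weight in zip(atlas_labels, weights):
            votes[labels[p]] += weight
        fused.append(max(range(num_classes), key=lambda c: votes[c]))
    return fused

# Three atlases, three pixels; the first atlas is most similar
# to the target and therefore receives the largest weight.
warped_labels = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 1],
]
fused = weighted_label_fusion(warped_labels, [0.6, 0.3, 0.1], num_classes=2)
```

For the middle pixel, unweighted majority voting would pick label 0 (two votes against one), whereas the weighted vote follows the most similar atlas and picks label 1. This is the effect that similarity-based fusion weights are meant to achieve.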